Re: [RFC PATCH 0/8] Convert avocado tests to normal Python unittests
On Wed, Jul 17, 2024 at 9:32 AM Thomas Huth wrote: > > There is the pycotap dependency to produce TAP from pytest, but that's > > probably something small enough to be vendored. > > The next version is only depending on pycotap now. I'm installing it in the > venv there that we also install when running the old avocado tests. Not sure > whether that's the best solution, though. Would it be OK to have it in > python/wheels/ instead? Yes, and you can probably move it to the same group as meson; it's ridiculously small (5k) and it is indeed used _with_ meson. Then you don't need any change in "configure". Paolo
[PULL 04/20] disas: Fix build against Capstone v6
From: Gustavo Romero Capstone v6 made major changes, such as renaming for AArch64, which broke programs using the old headers, like QEMU. However, Capstone v6 provides the CAPSTONE_AARCH64_COMPAT_HEADER compatibility definition allowing to build against v6 with the old definitions, so fix the QEMU build using it. We can lift that definition and switch to the new naming once our supported distros have Capstone v6 in place. Signed-off-by: Gustavo Romero Suggested-by: Peter Maydell Reviewed-by: Richard Henderson Link: https://lore.kernel.org/r/20240715213943.1210355-1-gustavo.rom...@linaro.org Signed-off-by: Paolo Bonzini --- include/disas/capstone.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/disas/capstone.h b/include/disas/capstone.h index e29068dd977..a11985151d3 100644 --- a/include/disas/capstone.h +++ b/include/disas/capstone.h @@ -3,6 +3,7 @@ #ifdef CONFIG_CAPSTONE +#define CAPSTONE_AARCH64_COMPAT_HEADER #include #else -- 2.45.2
[PULL 08/20] docs: Update description of 'user=username' for '-run-with'
From: Boqiao Fu The description of '-runas' and '-run-with' didn't explain that QEMU will use setuid/setgid to implement the option, so the user might get confused if using 'elevateprivileges=deny' as well. Since '-runas' is going to be deprecated and replaced by '-run-with' in the coming qemu9.1, add the message there. Signed-off-by: Boqiao Fu Link: https://lore.kernel.org/r/cafrhj6j9umk+hmzl+w+ke1yorcolpgbpuvvdku55sdxyigx...@mail.gmail.com Signed-off-by: Paolo Bonzini --- qemu-options.hx | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index ad6521ef5e7..694fa37f284 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -5024,8 +5024,11 @@ SRST in combination with -runas. ``user=username`` or ``user=uid:gid`` can be used to drop root privileges -by switching to the specified user (via username) or user and group -(via uid:gid) immediately before starting guest execution. +before starting guest execution. QEMU will use the ``setuid`` and ``setgid`` +system calls to switch to the specified identity. Note that the +``user=username`` syntax will also apply the full set of supplementary +groups for the user, whereas the ``user=uid:gid`` will use only the +``gid`` group. ERST #endif -- 2.45.2
[PULL 02/20] Revert "qemu-char: do not operate on sources from finalize callbacks"
From: Sergey Dyasli This reverts commit 2b316774f60291f57ca9ecb6a9f0712c532cae34. After 038b4217884c ("Revert "chardev: use a child source for qio input source"") we've been observing the "iwp->src == NULL" assertion triggering periodically during the initial capabilities querying by libvirtd. One of possible backtraces: Thread 1 (Thread 0x7f16cd4f0700 (LWP 43858)): 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 1 0x7f16c6c21e65 in __GI_abort () at abort.c:79 2 0x7f16c6c21d39 in __assert_fail_base at assert.c:92 3 0x7f16c6c46e86 in __GI___assert_fail (assertion=assertion@entry=0x562e9bcdaadd "iwp->src == NULL", file=file@entry=0x562e9bcdaac8 "../chardev/char-io.c", line=line@entry=99, function=function@entry=0x562e9bcdab10 <__PRETTY_FUNCTION__.20549> "io_watch_poll_finalize") at assert.c:101 4 0x562e9ba20c2c in io_watch_poll_finalize (source=) at ../chardev/char-io.c:99 5 io_watch_poll_finalize (source=) at ../chardev/char-io.c:88 6 0x7f16c904aae0 in g_source_unref_internal () from /lib64/libglib-2.0.so.0 7 0x7f16c904baf9 in g_source_destroy_internal () from /lib64/libglib-2.0.so.0 8 0x562e9ba20db0 in io_remove_watch_poll (source=0x562e9d6720b0) at ../chardev/char-io.c:147 9 remove_fd_in_watch (chr=chr@entry=0x562e9d5f3800) at ../chardev/char-io.c:153 10 0x562e9ba23ffb in update_ioc_handlers (s=0x562e9d5f3800) at ../chardev/char-socket.c:592 11 0x562e9ba2072f in qemu_chr_fe_set_handlers_full at ../chardev/char-fe.c:279 12 0x562e9ba207a9 in qemu_chr_fe_set_handlers at ../chardev/char-fe.c:304 13 0x562e9ba2ca75 in monitor_qmp_setup_handlers_bh (opaque=0x562e9d4c2c60) at ../monitor/qmp.c:509 14 0x562e9bb6222e in aio_bh_poll (ctx=ctx@entry=0x562e9d4c2f20) at ../util/async.c:216 15 0x562e9bb4de0a in aio_poll (ctx=0x562e9d4c2f20, blocking=blocking@entry=true) at ../util/aio-posix.c:722 16 0x562e9b99dfaa in iothread_run (opaque=0x562e9d4c26f0) at ../iothread.c:63 17 0x562e9bb505a4 in qemu_thread_start (args=0x562e9d4c7ea0) at ../util/qemu-thread-posix.c:543 18 0x7f16c70081ca in start_thread (arg=) at pthread_create.c:479 19 0x7f16c6c398d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 io_remove_watch_poll(), which makes sure that iwp->src is NULL, calls g_source_destroy() which finds that iwp->src is not NULL in the finalize callback. This can only happen if another thread has managed to trigger io_watch_poll_prepare() callback in the meantime. Move iwp->src destruction back to the finalize callback to prevent the described race, and also remove the stale comment. The deadlock glib bug was fixed back in 2010 by b35820285668 ("gmain: move finalization of GSource outside of context lock"). Suggested-by: Paolo Bonzini Signed-off-by: Sergey Dyasli Link: https://lore.kernel.org/r/20240712092659.216206-1-sergey.dya...@nutanix.com Signed-off-by: Paolo Bonzini --- chardev/char-io.c | 19 +-- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/chardev/char-io.c b/chardev/char-io.c index dab77b112e3..3be17b51ca5 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -87,16 +87,12 @@ static gboolean io_watch_poll_dispatch(GSource *source, GSourceFunc callback, static void io_watch_poll_finalize(GSource *source) { -/* - * Due to a glib bug, removing the last reference to a source - * inside a finalize callback causes recursive locking (and a - * deadlock). This is not a problem inside other callbacks, - * including dispatch callbacks, so we call io_remove_watch_poll - * to remove this source. At this point, iwp->src must - * be NULL, or we would leak it. - */ IOWatchPoll *iwp = io_watch_poll_from_source(source); -assert(iwp->src == NULL); +if (iwp->src) { +g_source_destroy(iwp->src); +g_source_unref(iwp->src); +iwp->src = NULL; +} } static GSourceFuncs io_watch_poll_funcs = { @@ -139,11 +135,6 @@ static void io_remove_watch_poll(GSource *source) IOWatchPoll *iwp; iwp = io_watch_poll_from_source(source); -if (iwp->src) { -g_source_destroy(iwp->src); -g_source_unref(iwp->src); -iwp->src = NULL; -} g_source_destroy(>parent); } -- 2.45.2
[PULL 16/20] target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl
From: Richard Henderson Disconnect mmu index computation from the current pl as stored in env->hflags. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-2-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/cpu.h | 11 ++- target/i386/cpu.c | 27 --- 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index c43ac01c794..1e121acef54 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2445,15 +2445,8 @@ static inline bool is_mmu_index_32(int mmu_index) return mmu_index & 1; } -static inline int cpu_mmu_index_kernel(CPUX86State *env) -{ -int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1; -int mmu_index_base = -!(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : -((env->hflags & HF_CPL_MASK) < 3 && (env->eflags & AC_MASK)) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX; - -return mmu_index_base + mmu_index_32; -} +int x86_mmu_index_pl(CPUX86State *env, unsigned pl); +int cpu_mmu_index_kernel(CPUX86State *env); #define CC_DST (env->cc_dst) #define CC_SRC (env->cc_src) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index c05765eeafc..4688d140c2d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -8122,18 +8122,39 @@ static bool x86_cpu_has_work(CPUState *cs) return x86_cpu_pending_interrupt(cs, cs->interrupt_request) != 0; } -static int x86_cpu_mmu_index(CPUState *cs, bool ifetch) +int x86_mmu_index_pl(CPUX86State *env, unsigned pl) { -CPUX86State *env = cpu_env(cs); int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 0 : 1; int mmu_index_base = -(env->hflags & HF_CPL_MASK) == 3 ? MMU_USER64_IDX : +pl == 3 ? MMU_USER64_IDX : !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : (env->eflags & AC_MASK) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX; return mmu_index_base + mmu_index_32; } +static int x86_cpu_mmu_index(CPUState *cs, bool ifetch) +{ +CPUX86State *env = cpu_env(cs); +return x86_mmu_index_pl(env, env->hflags & HF_CPL_MASK); +} + +static int x86_mmu_index_kernel_pl(CPUX86State *env, unsigned pl) +{ +int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1; +int mmu_index_base = +!(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : +(pl < 3 && (env->eflags & AC_MASK) + ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX); + +return mmu_index_base + mmu_index_32; +} + +int cpu_mmu_index_kernel(CPUX86State *env) +{ +return x86_mmu_index_kernel_pl(env, env->hflags & HF_CPL_MASK); +} + static void x86_disas_set_info(CPUState *cs, disassemble_info *info) { X86CPU *cpu = X86_CPU(cs); -- 2.45.2
[PULL 10/20] hpet: fix HPET_TN_SETVAL for high 32-bits of the comparator
Commit 3787324101b ("hpet: Fix emulation of HPET_TN_SETVAL (Jan Kiszka)", 2009-04-17) applied the fix only to the low 32-bits of the comparator, but it should be done for the high bits as well. Otherwise, the high 32-bits of the comparator cannot be written and they remain fixed to 0x. Co-developed-by: TaiseiIto Signed-off-by: TaiseiIto Signed-off-by: Paolo Bonzini --- hw/timer/hpet.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index ad881448bf3..4cb5393c0b5 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -554,6 +554,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr, timer->period = (timer->period & 0xULL) | new_val; } +/* + * FIXME: on a 64-bit write, HPET_TN_SETVAL should apply to the + * high bits part as well. + */ timer->config &= ~HPET_TN_SETVAL; if (hpet_enabled(s)) { hpet_set_timer(timer); @@ -564,7 +568,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr, if (!timer_is_periodic(timer) || (timer->config & HPET_TN_SETVAL)) { timer->cmp = (timer->cmp & 0xULL) | new_val << 32; -} else { +} +if (timer_is_periodic(timer)) { /* * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value @@ -572,12 +577,12 @@ static void hpet_ram_write(void *opaque, hwaddr addr, new_val = MIN(new_val, ~0u >> 1); timer->period = (timer->period & 0xULL) | new_val << 32; -} -timer->config &= ~HPET_TN_SETVAL; -if (hpet_enabled(s)) { -hpet_set_timer(timer); -} -break; +} +timer->config &= ~HPET_TN_SETVAL; +if (hpet_enabled(s)) { +hpet_set_timer(timer); +} +break; case HPET_TN_ROUTE: timer->fsb = (timer->fsb & 0xULL) | new_val; break; -- 2.45.2
[PULL 15/20] target/i386/tcg: Reorg push/pop within seg_helper.c
From: Richard Henderson Interrupts and call gates should use accesses with the DPL as the privilege level. While computing the applicable MMU index is easy, the harder thing is how to plumb it in the code. One possibility could be to add a single argument to the PUSH* macros for the privilege level, but this is repetitive and risks confusion between the involved privilege levels. Another possibility is to pass both CPL and DPL, and adjusting both PUSH* and POP* to use specific privilege levels (instead of using cpu_{ld,st}*_data). This makes the code more symmetric. However, a more complicated but much nicer approach is to use a structure to contain the stack parameters, env, unwind return address, and rewrite the macros into functions. The struct provides an easy home for the MMU index as well. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-4-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 481 +++ 1 file changed, 259 insertions(+), 222 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index b985382d704..b6902ca3fba 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -28,6 +28,68 @@ #include "helper-tcg.h" #include "seg_helper.h" +#ifdef TARGET_X86_64 +#define SET_ESP(val, sp_mask) \ +do {\ +if ((sp_mask) == 0x) { \ +env->regs[R_ESP] = (env->regs[R_ESP] & ~0x) | \ +((val) & 0x); \ +} else if ((sp_mask) == 0xLL) { \ +env->regs[R_ESP] = (uint32_t)(val); \ +} else {\ +env->regs[R_ESP] = (val); \ +} \ +} while (0) +#else +#define SET_ESP(val, sp_mask) \ +do {\ +env->regs[R_ESP] = (env->regs[R_ESP] & ~(sp_mask)) |\ +((val) & (sp_mask));\ +} while (0) +#endif + +/* XXX: use mmu_index to have proper DPL support */ +typedef struct StackAccess +{ +CPUX86State *env; +uintptr_t ra; +target_ulong ss_base; +target_ulong sp; +target_ulong sp_mask; +} StackAccess; + +static void pushw(StackAccess *sa, uint16_t val) +{ +sa->sp -= 2; +cpu_stw_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->ra); +} + +static void pushl(StackAccess *sa, uint32_t val) +{ +sa->sp -= 4; +cpu_stl_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->ra); +} + +static uint16_t popw(StackAccess *sa) +{ +uint16_t ret = cpu_lduw_data_ra(sa->env, +sa->ss_base + (sa->sp & sa->sp_mask), +sa->ra); +sa->sp += 2; +return ret; +} + +static uint32_t popl(StackAccess *sa) +{ +uint32_t ret = cpu_ldl_data_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->ra); +sa->sp += 4; +return ret; +} + int get_pg_mode(CPUX86State *env) { int pg_mode = 0; @@ -559,68 +621,19 @@ int exception_has_error_code(int intno) return 0; } -#ifdef TARGET_X86_64 -#define SET_ESP(val, sp_mask) \ -do {\ -if ((sp_mask) == 0x) { \ -env->regs[R_ESP] = (env->regs[R_ESP] & ~0x) | \ -((val) & 0x); \ -} else if ((sp_mask) == 0xLL) { \ -env->regs[R_ESP] = (uint32_t)(val); \ -} else {\ -env->regs[R_ESP] = (val); \ -} \ -} while (0) -#else -#define SET_ESP(val, sp_mask) \ -do {\ -env->regs[R_ESP] = (env->regs[R_ESP] & ~(sp_mask)) |\ -((val) & (sp_mask));\ -} while (0) -#endif - -/* XXX: add a is_user flag to have proper security support */ -#define PUSHW_RA(ssp, sp, sp_mask, val, ra) \ -{\ -sp -= 2
[PULL 03/20] cpu: Free queued CPU work
From: Akihiko Odaki Running qemu-system-aarch64 -M virt -nographic and terminating it will result in a LeakSanitizer error due to remaining queued CPU work so free it. Signed-off-by: Akihiko Odaki Link: https://lore.kernel.org/r/20240714-cpu-v1-1-19c2f8de2...@daynix.com Signed-off-by: Paolo Bonzini --- include/hw/core/cpu.h | 6 ++ cpu-common.c | 11 +++ hw/core/cpu-common.c | 1 + 3 files changed, 18 insertions(+) diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index a2c8536943f..8e6466c1dda 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -1000,6 +1000,12 @@ void cpu_resume(CPUState *cpu); */ void cpu_remove_sync(CPUState *cpu); +/** + * free_queued_cpu_work() - free all items on CPU work queue + * @cpu: The CPU which work queue to free. + */ +void free_queued_cpu_work(CPUState *cpu); + /** * process_queued_cpu_work() - process all items on CPU work queue * @cpu: The CPU which work queue to process. diff --git a/cpu-common.c b/cpu-common.c index ce78273af59..7ae136f98ca 100644 --- a/cpu-common.c +++ b/cpu-common.c @@ -331,6 +331,17 @@ void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, queue_work_on_cpu(cpu, wi); } +void free_queued_cpu_work(CPUState *cpu) +{ +while (!QSIMPLEQ_EMPTY(>work_list)) { +struct qemu_work_item *wi = QSIMPLEQ_FIRST(>work_list); +QSIMPLEQ_REMOVE_HEAD(>work_list, node); +if (wi->free) { +g_free(wi); +} +} +} + void process_queued_cpu_work(CPUState *cpu) { struct qemu_work_item *wi; diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c index b19e1fdacf2..d2e3e4570ab 100644 --- a/hw/core/cpu-common.c +++ b/hw/core/cpu-common.c @@ -281,6 +281,7 @@ static void cpu_common_finalize(Object *obj) g_free(cpu->plugin_state); } #endif +free_queued_cpu_work(cpu); g_array_free(cpu->gdb_regs, TRUE); qemu_lockcnt_destroy(>in_ioctl_lock); qemu_mutex_destroy(>work_mutex); -- 2.45.2
[PULL 13/20] target/i386/tcg: Allow IRET from user mode to user mode with SMAP
This fixes a bug wherein i386/tcg assumed an interrupt return using the IRET instruction was always returning from kernel mode to either kernel mode or user mode. This assumption is violated when IRET is used as a clever way to restore thread state, as for example in the dotnet runtime. There, IRET returns from user mode to user mode. This bug is that stack accesses from IRET and RETF, as well as accesses to the parameters in a call gate, are normal data accesses using the current CPL. This manifested itself as a page fault in the guest Linux kernel due to SMAP preventing the access. This bug appears to have been in QEMU since the beginning. Analyzed-by: Robert R. Henry Co-developed-by: Robert R. Henry Signed-off-by: Robert R. Henry Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 19d6b41a589..224e73e9ed0 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -594,13 +594,13 @@ int exception_has_error_code(int intno) #define POPW_RA(ssp, sp, sp_mask, val, ra) \ {\ -val = cpu_lduw_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ +val = cpu_lduw_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 2; \ } #define POPL_RA(ssp, sp, sp_mask, val, ra) \ { \ -val = (uint32_t)cpu_ldl_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ +val = (uint32_t)cpu_ldl_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 4;\ } @@ -847,7 +847,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, #define POPQ_RA(sp, val, ra)\ { \ -val = cpu_ldq_kernel_ra(env, sp, ra); \ +val = cpu_ldq_data_ra(env, sp, ra); \ sp += 8;\ } @@ -1797,18 +1797,18 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, PUSHL_RA(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC()); PUSHL_RA(ssp, sp, sp_mask, env->regs[R_ESP], GETPC()); for (i = param_count - 1; i >= 0; i--) { -val = cpu_ldl_kernel_ra(env, old_ssp + -((env->regs[R_ESP] + i * 4) & - old_sp_mask), GETPC()); +val = cpu_ldl_data_ra(env, + old_ssp + ((env->regs[R_ESP] + i * 4) & old_sp_mask), + GETPC()); PUSHL_RA(ssp, sp, sp_mask, val, GETPC()); } } else { PUSHW_RA(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC()); PUSHW_RA(ssp, sp, sp_mask, env->regs[R_ESP], GETPC()); for (i = param_count - 1; i >= 0; i--) { -val = cpu_lduw_kernel_ra(env, old_ssp + - ((env->regs[R_ESP] + i * 2) & - old_sp_mask), GETPC()); +val = cpu_lduw_data_ra(env, + old_ssp + ((env->regs[R_ESP] + i * 2) & old_sp_mask), + GETPC()); PUSHW_RA(ssp, sp, sp_mask, val, GETPC()); } } -- 2.45.2
[PULL 07/20] qemu/timer: Add host ticks function for LoongArch
From: Song Gao Signed-off-by: Song Gao Link: https://lore.kernel.org/r/20240716031500.4193498-1-gaos...@loongson.cn Signed-off-by: Paolo Bonzini --- include/qemu/timer.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 5ce83c79112..fa56ec9481d 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -1016,6 +1016,15 @@ static inline int64_t cpu_get_host_ticks(void) return val; } +#elif defined(__loongarch64) +static inline int64_t cpu_get_host_ticks(void) +{ +uint64_t val; + +asm volatile("rdtime.d %0, $zero" : "=r"(val)); +return val; +} + #else /* The host CPU doesn't have an easily accessible cycle counter. Just return a monotonically increasing value. This will be -- 2.45.2
[PULL 11/20] target/i386/tcg: fix POP to memory in long mode
In long mode, POP to memory will write a full 64-bit value. However, the call to gen_writeback() in gen_POP will use MO_32 because the decoding table is incorrect. The bug was latent until commit aea49fbb01a ("target/i386: use gen_writeback() within gen_POP()", 2024-06-08), and then became visible because gen_op_st_v now receives op->ot instead of the "ot" returned by gen_pop_T0. Analyzed-by: Clément Chigot Fixes: 5e9e21bcc4d ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07) Tested-by: Clément Chigot Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 2 +- target/i386/tcg/emit.c.inc | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 0d846c32c22..d2da1d396d5 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -1717,7 +1717,7 @@ static const X86OpEntry opcodes_root[256] = { [0x8C] = X86_OP_ENTRYwr(MOV, E,v, S,w, op0_Mw), [0x8D] = X86_OP_ENTRYwr(LEA, G,v, M,v, nolea), [0x8E] = X86_OP_ENTRYwr(MOV, S,w, E,w), -[0x8F] = X86_OP_GROUPw(group1A, E,v), +[0x8F] = X86_OP_GROUPw(group1A, E,d64), [0x98] = X86_OP_ENTRY1(CBW,0,v), /* rAX */ [0x99] = X86_OP_ENTRYwr(CWD, 2,v, 0,v), /* rDX, rAX */ diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index fc7477833bc..016dce81464 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -2788,6 +2788,7 @@ static void gen_POP(DisasContext *s, X86DecodedInsn *decode) X86DecodedOp *op = >op[0]; MemOp ot = gen_pop_T0(s); +assert(ot >= op->ot); if (op->has_ea || op->unit == X86_OP_SEG) { /* NOTE: order is important for MMU exceptions */ gen_writeback(s, decode, 0, s->T0); -- 2.45.2
[PULL 12/20] target/i386/tcg: Remove SEG_ADDL
From: Richard Henderson This truncation is now handled by MMU_*32_IDX. The introduction of MMU_*32_IDX in fact applied correct 32-bit wraparound to 16-bit accesses with a high segment base (e.g. big real mode or vm86 mode), which did not use SEG_ADDL. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-3-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index aee3d19f29b..19d6b41a589 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -579,10 +579,6 @@ int exception_has_error_code(int intno) } while (0) #endif -/* in 64-bit machines, this can overflow. So this segment addition macro - * can be used to trim the value to 32-bit whenever needed */ -#define SEG_ADDL(ssp, sp, sp_mask) ((uint32_t)((ssp) + (sp & (sp_mask - /* XXX: add a is_user flag to have proper security support */ #define PUSHW_RA(ssp, sp, sp_mask, val, ra) \ {\ @@ -593,7 +589,7 @@ int exception_has_error_code(int intno) #define PUSHL_RA(ssp, sp, sp_mask, val, ra) \ { \ sp -= 4;\ -cpu_stl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), (uint32_t)(val), ra); \ +cpu_stl_kernel_ra(env, (ssp) + (sp & (sp_mask)), (val), ra); \ } #define POPW_RA(ssp, sp, sp_mask, val, ra) \ @@ -604,7 +600,7 @@ int exception_has_error_code(int intno) #define POPL_RA(ssp, sp, sp_mask, val, ra) \ { \ -val = (uint32_t)cpu_ldl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), ra); \ +val = (uint32_t)cpu_ldl_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 4;\ } -- 2.45.2
[PULL 18/20] target/i386/tcg: check for correct busy state before switching to a new task
This step is listed in the Intel manual: "Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return)". The AMD manual lists the same operation under the "Preventing recursion" paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor checks the busy bit in the IRET case. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 5 + 1 file changed, 5 insertions(+) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 8a6d92b3583..a5d5ce61f59 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -369,6 +369,11 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, old_tss_limit_max = 43; } +/* new TSS must be busy iff the source is an IRET instruction */ +if (!!(e2 & DESC_TSS_BUSY_MASK) != (source == SWITCH_TSS_IRET)) { +raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); +} + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ -- 2.45.2
[PULL 20/20] target/i386/tcg: save current task state before loading new one
This is how the steps are ordered in the manual. EFLAGS.NT is overwritten after the fact in the saved image. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 85 +++- 1 file changed, 45 insertions(+), 40 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 36d2f089cae..aac092a356b 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -389,6 +389,42 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, access_prepare_mmu(, env, tss_base, tss_limit, MMU_DATA_LOAD, mmu_index, retaddr); +/* save the current state in the old TSS */ +old_eflags = cpu_compute_eflags(env); +if (old_type & 8) { +/* 32 bit */ +access_stl(, env->tr.base + 0x20, next_eip); +access_stl(, env->tr.base + 0x24, old_eflags); +access_stl(, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]); +access_stl(, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]); +access_stl(, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]); +access_stl(, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]); +access_stl(, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]); +access_stl(, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]); +access_stl(, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]); +access_stl(, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]); +for (i = 0; i < 6; i++) { +access_stw(, env->tr.base + (0x48 + i * 4), + env->segs[i].selector); +} +} else { +/* 16 bit */ +access_stw(, env->tr.base + 0x0e, next_eip); +access_stw(, env->tr.base + 0x10, old_eflags); +access_stw(, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]); +access_stw(, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]); +access_stw(, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]); +access_stw(, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]); +access_stw(, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]); +access_stw(, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]); +access_stw(, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]); +access_stw(, env->tr.base + (0x12 + 7 * 2), env->regs[R_EDI]); +for (i = 0; i < 4; i++) { +access_stw(, env->tr.base + (0x22 + i * 2), + env->segs[i].selector); +} +} + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ @@ -428,49 +464,16 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, if (source == SWITCH_TSS_JMP || source == SWITCH_TSS_IRET) { tss_set_busy(env, env->tr.selector, 0, retaddr); } -old_eflags = cpu_compute_eflags(env); + if (source == SWITCH_TSS_IRET) { old_eflags &= ~NT_MASK; +if (old_type & 8) { +access_stl(, env->tr.base + 0x24, old_eflags); +} else { +access_stw(, env->tr.base + 0x10, old_eflags); + } } -/* save the current state in the old TSS */ -if (old_type & 8) { -/* 32 bit */ -access_stl(, env->tr.base + 0x20, next_eip); -access_stl(, env->tr.base + 0x24, old_eflags); -access_stl(, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]); -access_stl(, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]); -access_stl(, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]); -access_stl(, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]); -access_stl(, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]); -access_stl(, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]); -access_stl(, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]); -access_stl(, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]); -for (i = 0; i < 6; i++) { -access_stw(, env->tr.base + (0x48 + i * 4), - env->segs[i].selector); -} -} else { -/* 16 bit */ -access_stw(, env->tr.base + 0x0e, next_eip); -access_stw(, env->tr.base + 0x10, old_eflags); -access_stw(, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]); -access_stw(, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]); -access_stw(, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]); -access_stw(, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]); -access_stw(, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]); -access_stw(, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]); -access_stw(, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]);
[PULL 01/20] i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT
From: Michael Roth Currently if the 'legacy-vm-type' property of the sev-guest object is 'on', QEMU will attempt to use the newer KVM_SEV_INIT2 kernel interface in conjunction with the newer KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM KVM VM types. This can lead to measurement changes if, for instance, an SEV guest was created on a host that originally had an older kernel that didn't support KVM_SEV_INIT2, but is booted on the same host later on after the host kernel was upgraded. Instead, if legacy-vm-type is 'off', QEMU should fail if the KVM_SEV_INIT2 interface is not provided by the current host kernel. Modify the fallback handling accordingly. In the future, VMSA features and other flags might be added to QEMU which will require legacy-vm-type to be 'off' because they will rely on the newer KVM_SEV_INIT2 interface. It may be difficult to convey to users what values of legacy-vm-type are compatible with which features/options, so as part of this rework, switch legacy-vm-type to a tri-state OnOffAuto option. 'auto' in this case will automatically switch to using the newer KVM_SEV_INIT2, but only if it is required to make use of new VMSA features or other options only available via KVM_SEV_INIT2. Defining 'auto' in this way would avoid inadvertantly breaking compatibility with older kernels since it would only be used in cases where users opt into newer features that are only available via KVM_SEV_INIT2 and newer kernels, and provide better default behavior than the legacy-vm-type=off behavior that was previously in place, so make it the default for 9.1+ machine types. Cc: Daniel P. Berrangé Cc: Paolo Bonzini cc: k...@vger.kernel.org Signed-off-by: Michael Roth Reviewed-by: Daniel P. Berrangé Link: https://lore.kernel.org/r/20240710041005.83720-1-michael.r...@amd.com Signed-off-by: Paolo Bonzini --- qapi/qom.json | 18 ++ hw/i386/pc.c | 2 +- target/i386/sev.c | 87 +++ 3 files changed, 84 insertions(+), 23 deletions(-) diff --git a/qapi/qom.json b/qapi/qom.json index 8e75a419c30..7eccd2e14e2 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -924,12 +924,16 @@ # @handle: SEV firmware handle (default: 0) # # @legacy-vm-type: Use legacy KVM_SEV_INIT KVM interface for creating the VM. -# The newer KVM_SEV_INIT2 interface syncs additional vCPU -# state when initializing the VMSA structures, which will -# result in a different guest measurement. Set this to -# maintain compatibility with older QEMU or kernel versions -# that rely on legacy KVM_SEV_INIT behavior. -# (default: false) (since 9.1) +# The newer KVM_SEV_INIT2 interface, from Linux >= 6.10, syncs +# additional vCPU state when initializing the VMSA structures, +# which will result in a different guest measurement. Set +# this to 'on' to force compatibility with older QEMU or kernel +# versions that rely on legacy KVM_SEV_INIT behavior. 'auto' +# will behave identically to 'on', but will automatically +# switch to using KVM_SEV_INIT2 if the user specifies any +# additional options that require it. If set to 'off', QEMU +# will require KVM_SEV_INIT2 unconditionally. +# (default: off) (since 9.1) # # Since: 2.12 ## @@ -939,7 +943,7 @@ '*session-file': 'str', '*policy': 'uint32', '*handle': 'uint32', -'*legacy-vm-type': 'bool' } } +'*legacy-vm-type': 'OnOffAuto' } } ## # @SevSnpGuestProperties: diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 4fbc5774708..c74931d577a 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -83,7 +83,7 @@ GlobalProperty pc_compat_9_0[] = { { TYPE_X86_CPU, "x-amd-topoext-features-only", "false" }, { TYPE_X86_CPU, "x-l1-cache-per-thread", "false" }, { TYPE_X86_CPU, "guest-phys-bits", "0" }, -{ "sev-guest", "legacy-vm-type", "true" }, +{ "sev-guest", "legacy-vm-type", "on" }, { TYPE_X86_CPU, "legacy-multi-node", "on" }, }; const size_t pc_compat_9_0_len = G_N_ELEMENTS(pc_compat_9_0); diff --git a/target/i386/sev.c b/target/i386/sev.c index 2ba5f517228..a1157c0ede6 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -144,7 +144,7 @@ struct SevGuestState { uint32_t policy; char *dh_cert_file; char *session_file; -bool legacy_vm_type; +OnOffAuto legacy_vm_type; }; struct SevSnpGuestState { @@ -1369,6 +1369,17 @@ sev_vm_state_change(void *opaque, bool running, RunState state) } } +/* + * This helper is to examine sev-guest properties and determine if any options + * have been set which rely on the newer KVM_SEV_INIT2
[PULL 17/20] target/i386/tcg: Compute MMU index once
Add the MMU index to the StackAccess struct, so that it can be cached or (in the next patch) computed from information that is not in CPUX86State. Co-developed-by: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index b6902ca3fba..8a6d92b3583 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -56,36 +56,37 @@ typedef struct StackAccess target_ulong ss_base; target_ulong sp; target_ulong sp_mask; +int mmu_index; } StackAccess; static void pushw(StackAccess *sa, uint16_t val) { sa->sp -= 2; -cpu_stw_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), - val, sa->ra); +cpu_stw_mmuidx_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->mmu_index, sa->ra); } static void pushl(StackAccess *sa, uint32_t val) { sa->sp -= 4; -cpu_stl_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), - val, sa->ra); +cpu_stl_mmuidx_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->mmu_index, sa->ra); } static uint16_t popw(StackAccess *sa) { -uint16_t ret = cpu_lduw_data_ra(sa->env, -sa->ss_base + (sa->sp & sa->sp_mask), -sa->ra); +uint16_t ret = cpu_lduw_mmuidx_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->mmu_index, sa->ra); sa->sp += 2; return ret; } static uint32_t popl(StackAccess *sa) { -uint32_t ret = cpu_ldl_data_ra(sa->env, - sa->ss_base + (sa->sp & sa->sp_mask), - sa->ra); +uint32_t ret = cpu_ldl_mmuidx_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->mmu_index, sa->ra); sa->sp += 4; return ret; } @@ -677,6 +678,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; +sa.mmu_index = cpu_mmu_index_kernel(env); if (type == 5) { /* task gate */ @@ -858,12 +860,12 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, static void pushq(StackAccess *sa, uint64_t val) { sa->sp -= 8; -cpu_stq_kernel_ra(sa->env, sa->sp, val, sa->ra); +cpu_stq_mmuidx_ra(sa->env, sa->sp, val, sa->mmu_index, sa->ra); } static uint64_t popq(StackAccess *sa) { -uint64_t ret = cpu_ldq_data_ra(sa->env, sa->sp, sa->ra); +uint64_t ret = cpu_ldq_mmuidx_ra(sa->env, sa->sp, sa->mmu_index, sa->ra); sa->sp += 8; return ret; } @@ -982,6 +984,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; +sa.mmu_index = cpu_mmu_index_kernel(env); sa.sp_mask = -1; sa.ss_base = 0; if (dpl < cpl || ist != 0) { @@ -1116,6 +1119,7 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int, sa.sp = env->regs[R_ESP]; sa.sp_mask = 0x; sa.ss_base = env->segs[R_SS].base; +sa.mmu_index = cpu_mmu_index_kernel(env); if (is_int) { old_eip = next_eip; @@ -1579,6 +1583,7 @@ void helper_lcall_real(CPUX86State *env, uint32_t new_cs, uint32_t new_eip, sa.sp = env->regs[R_ESP]; sa.sp_mask = get_sp_mask(env->segs[R_SS].flags); sa.ss_base = env->segs[R_SS].base; +sa.mmu_index = cpu_mmu_index_kernel(env); if (shift) { pushl(, env->segs[R_CS].selector); @@ -1618,6 +1623,7 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, sa.env = env; sa.ra = GETPC(); +sa.mmu_index = cpu_mmu_index_kernel(env); if (e2 & DESC_S_MASK) { if (!(e2 & DESC_CS_MASK)) { @@ -1905,6 +1911,7 @@ void helper_iret_real(CPUX86State *env, int shift) sa.env = env; sa.ra = GETPC(); +sa.mmu_index = x86_mmu_index_pl(env, 0); sa.sp_mask = 0x; /* : use SS segment size? */ sa.sp = env->regs[R_ESP]; sa.ss_base = env->segs[R_SS].base; @@ -1976,8 +1983,11 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, target_ulong new_eip, new_esp; StackAccess sa; +cpl = env->hflags & HF_CPL_MASK; + sa.env = env; sa.ra = retaddr; +sa.mmu_index = x86_mmu_index_pl(env, cpl); #ifdef TARGET_X86_64 if (shift == 2) { @@ -2032,7 +2042,6 @@
[PULL 06/20] scsi: fix regression and honor bootindex again for legacy drives
From: Fiona Ebner Commit 3089637461 ("scsi: Don't ignore most usb-storage properties") removed the call to object_property_set_int() and thus the 'set' method for the bootindex property was also not called anymore. Here that method is device_set_bootindex() (as configured by scsi_dev_instance_init() -> device_add_bootindex_property()) which as a side effect registers the device via add_boot_device_path(). As reported by a downstream user [0], the bootindex property did not have the desired effect anymore for legacy drives. Fix the regression by explicitly calling the add_boot_device_path() function after checking that the bootindex is not yet used (to avoid add_boot_device_path() calling exit()). [0]: https://forum.proxmox.com/threads/149772/post-679433 Cc: qemu-sta...@nongnu.org Fixes: 3089637461 ("scsi: Don't ignore most usb-storage properties") Suggested-by: Kevin Wolf Signed-off-by: Fiona Ebner Link: https://lore.kernel.org/r/20240710152529.1737407-1-f.eb...@proxmox.com Signed-off-by: Paolo Bonzini --- hw/scsi/scsi-bus.c | 9 + 1 file changed, 9 insertions(+) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index 9e40b0c920b..53eff5dd3d6 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -384,6 +384,7 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk, DeviceState *dev; SCSIDevice *s; DriveInfo *dinfo; +Error *local_err = NULL; if (blk_is_sg(blk)) { driver = "scsi-generic"; @@ -403,6 +404,14 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk, s = SCSI_DEVICE(dev); s->conf = *conf; +check_boot_index(conf->bootindex, _err); +if (local_err) { +object_unparent(OBJECT(dev)); +error_propagate(errp, local_err); +return NULL; +} +add_boot_device_path(conf->bootindex, dev, NULL); + qdev_prop_set_uint32(dev, "scsi-id", unit); if (object_property_find(OBJECT(dev), "removable")) { qdev_prop_set_bit(dev, "removable", removable); -- 2.45.2
[PULL 14/20] target/i386/tcg: use PUSHL/PUSHW for error code
Do not pre-decrement esp, let the macros subtract the appropriate operand size. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 224e73e9ed0..b985382d704 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -670,22 +670,20 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, } shift = switch_tss(env, intno * 8, e1, e2, SWITCH_TSS_CALL, old_eip); if (has_error_code) { -uint32_t mask; - /* push the error code */ if (env->segs[R_SS].flags & DESC_B_MASK) { -mask = 0x; +sp_mask = 0x; } else { -mask = 0x; +sp_mask = 0x; } -esp = (env->regs[R_ESP] - (2 << shift)) & mask; -ssp = env->segs[R_SS].base + esp; +esp = env->regs[R_ESP]; +ssp = env->segs[R_SS].base; if (shift) { -cpu_stl_kernel(env, ssp, error_code); +PUSHL(ssp, esp, sp_mask, error_code); } else { -cpu_stw_kernel(env, ssp, error_code); +PUSHW(ssp, esp, sp_mask, error_code); } -SET_ESP(esp, mask); +SET_ESP(esp, sp_mask); } return; } -- 2.45.2
[PULL 09/20] hpet: fix clamping of period
When writing a new period, the clamping should use a maximum value rather tyhan a bit mask. Also, when writing the high bits new_val is shifted right by 32, so the maximum allowed period should also be shifted right. Signed-off-by: Paolo Bonzini --- hw/timer/hpet.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index 01efe4885db..ad881448bf3 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -548,7 +548,9 @@ static void hpet_ram_write(void *opaque, hwaddr addr, * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value */ -new_val &= (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1; +if (timer->config & HPET_TN_32BIT) { +new_val = MIN(new_val, ~0u >> 1); +} timer->period = (timer->period & 0xULL) | new_val; } @@ -567,7 +569,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr, * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value */ -new_val &= (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1; +new_val = MIN(new_val, ~0u >> 1); timer->period = (timer->period & 0xULL) | new_val << 32; } -- 2.45.2
[PULL 19/20] target/i386/tcg: use X86Access for TSS access
This takes care of probing the vaddr range in advance, and is also faster because it avoids repeated TLB lookups. It also matches the Intel manual better, as it says "Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory"; note however that it's not clear how the processor checks for segment descriptors, and this check is not included in the AMD manual. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 110 ++- 1 file changed, 58 insertions(+), 52 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index a5d5ce61f59..36d2f089cae 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -27,6 +27,7 @@ #include "exec/log.h" #include "helper-tcg.h" #include "seg_helper.h" +#include "access.h" #ifdef TARGET_X86_64 #define SET_ESP(val, sp_mask) \ @@ -313,14 +314,15 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, uint32_t e1, uint32_t e2, int source, uint32_t next_eip, uintptr_t retaddr) { -int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, v1, v2, i; +int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, i; target_ulong tss_base; uint32_t new_regs[8], new_segs[6]; uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap; uint32_t old_eflags, eflags_mask; SegmentCache *dt; -int index; +int mmu_index, index; target_ulong ptr; +X86Access old, new; type = (e2 >> DESC_TYPE_SHIFT) & 0xf; LOG_PCALL("switch_tss: sel=0x%04x type=%d src=%d\n", tss_selector, type, @@ -374,35 +376,45 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); } +/* X86Access avoids memory exceptions during the task switch */ +mmu_index = cpu_mmu_index_kernel(env); +access_prepare_mmu(, env, env->tr.base, old_tss_limit_max, + MMU_DATA_STORE, mmu_index, retaddr); + +if (source == SWITCH_TSS_CALL) { +/* Probe for future write of parent task */ +probe_access(env, tss_base, 2, MMU_DATA_STORE, + mmu_index, retaddr); +} +access_prepare_mmu(, env, tss_base, tss_limit, + MMU_DATA_LOAD, mmu_index, retaddr); + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ -new_cr3 = cpu_ldl_kernel_ra(env, tss_base + 0x1c, retaddr); -new_eip = cpu_ldl_kernel_ra(env, tss_base + 0x20, retaddr); -new_eflags = cpu_ldl_kernel_ra(env, tss_base + 0x24, retaddr); +new_cr3 = access_ldl(, tss_base + 0x1c); +new_eip = access_ldl(, tss_base + 0x20); +new_eflags = access_ldl(, tss_base + 0x24); for (i = 0; i < 8; i++) { -new_regs[i] = cpu_ldl_kernel_ra(env, tss_base + (0x28 + i * 4), -retaddr); +new_regs[i] = access_ldl(, tss_base + (0x28 + i * 4)); } for (i = 0; i < 6; i++) { -new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x48 + i * 4), - retaddr); +new_segs[i] = access_ldw(, tss_base + (0x48 + i * 4)); } -new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x60, retaddr); -new_trap = cpu_ldl_kernel_ra(env, tss_base + 0x64, retaddr); +new_ldt = access_ldw(, tss_base + 0x60); +new_trap = access_ldl(, tss_base + 0x64); } else { /* 16 bit */ new_cr3 = 0; -new_eip = cpu_lduw_kernel_ra(env, tss_base + 0x0e, retaddr); -new_eflags = cpu_lduw_kernel_ra(env, tss_base + 0x10, retaddr); +new_eip = access_ldw(, tss_base + 0x0e); +new_eflags = access_ldw(, tss_base + 0x10); for (i = 0; i < 8; i++) { -new_regs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x12 + i * 2), retaddr); +new_regs[i] = access_ldw(, tss_base + (0x12 + i * 2)); } for (i = 0; i < 4; i++) { -new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x22 + i * 2), - retaddr); +new_segs[i] = access_ldw(, tss_base + (0x22 + i * 2)); } -new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x2a, retaddr); +new_ldt = access_ldw(, tss_base + 0x2a); new_segs[R_FS] = 0; new_segs[R_GS] = 0; new_trap = 0; @@ -412,16 +424,6 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, chapters 12.2.5 and 13.2.4 on how to implement TSS Trap bit */ (void)new_trap; -/* NOTE: we must avoid memory exceptions during the task switch,
[PULL 00/20] i386, bugfix changes for QEMU 9.1 soft freeze
The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f: Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging (2024-07-16 06:45:23 +1000) are available in the Git repository at: https://gitlab.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 6a079f2e68e1832ebca0e7d64bc31ffebde9b2dd: target/i386/tcg: save current task state before loading new one (2024-07-16 18:18:25 +0200) * target/i386/tcg: fixes for seg_helper.c * SEV: Don't allow automatic fallback to legacy KVM_SEV_INIT, but also don't use it by default * scsi: honor bootindex again for legacy drives * hpet, utils, scsi, build, cpu: miscellaneous bugfixes Akihiko Odaki (1): cpu: Free queued CPU work Boqiao Fu (1): docs: Update description of 'user=username' for '-run-with' Fiona Ebner (2): hw/scsi/lsi53c895a: bump instruction limit in scripts processing to fix regression scsi: fix regression and honor bootindex again for legacy drives Gustavo Romero (1): disas: Fix build against Capstone v6 Michael Roth (1): i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT Paolo Bonzini (9): hpet: fix clamping of period hpet: fix HPET_TN_SETVAL for high 32-bits of the comparator target/i386/tcg: fix POP to memory in long mode target/i386/tcg: Allow IRET from user mode to user mode with SMAP target/i386/tcg: use PUSHL/PUSHW for error code target/i386/tcg: Compute MMU index once target/i386/tcg: check for correct busy state before switching to a new task target/i386/tcg: use X86Access for TSS access target/i386/tcg: save current task state before loading new one Richard Henderson (3): target/i386/tcg: Remove SEG_ADDL target/i386/tcg: Reorg push/pop within seg_helper.c target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl Sergey Dyasli (1): Revert "qemu-char: do not operate on sources from finalize callbacks" Song Gao (1): qemu/timer: Add host ticks function for LoongArch qapi/qom.json| 18 +- include/disas/capstone.h | 1 + include/hw/core/cpu.h| 6 + include/qemu/timer.h | 9 + target/i386/cpu.h| 11 +- chardev/char-io.c| 19 +- cpu-common.c | 11 + hw/core/cpu-common.c | 1 + hw/i386/pc.c | 2 +- hw/scsi/lsi53c895a.c | 2 +- hw/scsi/scsi-bus.c | 9 + hw/timer/hpet.c | 25 +- target/i386/cpu.c| 27 +- target/i386/sev.c| 87 - target/i386/tcg/seg_helper.c | 662 +-- target/i386/tcg/decode-new.c.inc | 2 +- target/i386/tcg/emit.c.inc | 1 + qemu-options.hx | 7 +- 18 files changed, 535 insertions(+), 365 deletions(-) -- 2.45.2
[PULL 05/20] hw/scsi/lsi53c895a: bump instruction limit in scripts processing to fix regression
From: Fiona Ebner Commit 9876359990 ("hw/scsi/lsi53c895a: add timer to scripts processing") reduced the maximum allowed instruction count by a factor of 100 all the way down to 100. This causes the "Check Point R81.20 Gaia" appliance [0] to fail to boot after fully finishing the installation via the appliance's web interface (there is already one reboot before that). With a limit of 150, the appliance still fails to boot, while with a limit of 200, it works. Bump to 500 to fix the regression and be on the safe side. Originally reported in the Proxmox community forum[1]. [0]: https://support.checkpoint.com/results/download/124397 [1]: https://forum.proxmox.com/threads/149772/post-683459 Cc: qemu-sta...@nongnu.org Fixes: 9876359990 ("hw/scsi/lsi53c895a: add timer to scripts processing") Signed-off-by: Fiona Ebner Acked-by: Sven Schnelle Link: https://lore.kernel.org/r/20240715131403.223239-1-f.eb...@proxmox.com Signed-off-by: Paolo Bonzini --- hw/scsi/lsi53c895a.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c index eb9828dd5ef..f1935e53280 100644 --- a/hw/scsi/lsi53c895a.c +++ b/hw/scsi/lsi53c895a.c @@ -188,7 +188,7 @@ static const char *names[] = { #define LSI_TAG_VALID (1 << 16) /* Maximum instructions to process. */ -#define LSI_MAX_INSN100 +#define LSI_MAX_INSN500 typedef struct lsi_request { SCSIRequest *req; -- 2.45.2
Re: [RFC PATCH 0/8] Convert avocado tests to normal Python unittests
Il mar 16 lug 2024, 20:10 Daniel P. Berrangé ha scritto: > On Tue, Jul 16, 2024 at 08:03:54PM +0200, Paolo Bonzini wrote: > > Il mar 16 lug 2024, 18:45 John Snow ha scritto: > > > > > My only ask is that we keep the tests running in the custom venv > > > environment we set up at build time > > > > > > > Yes, they do, however pytest should also be added to pythondeps.toml if > we > > go this way. > > Done in this patch: > > https://lists.nongnu.org/archive/html/qemu-devel/2024-07/msg03596.html That adds pycotap, not pytest. > Yep, that's the part that I am a bit more doubtful about. > > Pulling & caching VM images isn't much more than a URL download to > a local file, not very complex in python. Assuming that's what you > are refering to, then it is already done in this patch: > > https://lists.nongnu.org/archive/html/qemu-devel/2024-07/msg03598.html I think there are also compressed assets that have to be passed through gzip/xzip/zstd. I am worried that Thomas's patches do 90% of the job but that is not a good estimation of what's left. Paolo > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org-o- > https://www.instagram.com/dberrange :| > >
Re: [RFC PATCH 0/8] Convert avocado tests to normal Python unittests
Il mar 16 lug 2024, 18:45 John Snow ha scritto: > My only ask is that we keep the tests running in the custom venv > environment we set up at build time > Yes, they do, however pytest should also be added to pythondeps.toml if we go this way. If we move to pytest, it's possible we can eliminate that funkiness, which > would be a win. > There is the pycotap dependency to produce TAP from pytest, but that's probably something small enough to be vendored. And also it depends on what the dependencies would be for the assets framework. I'm also not so sure about recreating all of the framework that pulls vm > images on demand, that sounds like it'd be a lot of work, but maybe I'm > wrong about that. > Yep, that's the part that I am a bit more doubtful about. Paolo Tacit ACK from me on this project in general, provided we are still using > the configure venv. > > >> Thomas >> >> >> Ani Sinha (1): >> tests/pytest: add pytest to the meson build system >> >> Thomas Huth (7): >> tests/pytest: Add base classes for the upcoming pytest-based tests >> tests/pytest: Convert some simple avocado tests into pytests >> tests/pytest: Convert info_usernet and version test with small >> adjustments >> tests_pytest: Implement fetch_asset() method for downloading assets >> tests/pytest: Convert some tests that download files via fetch_asset() >> tests/pytest: Add a function for extracting files from an archive >> tests/pytest: Convert avocado test that needed avocado.utils.archive >> >> tests/Makefile.include| 4 +- >> tests/meson.build | 1 + >> tests/pytest/meson.build | 74 >> tests/pytest/qemu_pytest/__init__.py | 362 ++ >> tests/pytest/qemu_pytest/utils.py | 21 + >> .../test_arm_canona1100.py} | 16 +- >> .../test_cpu_queries.py} | 2 +- >> .../test_empty_cpu_model.py} | 2 +- >> .../test_info_usernet.py} | 6 +- >> .../test_machine_arm_n8x0.py} | 20 +- >> .../test_machine_avr6.py} | 7 +- >> .../test_machine_loongarch.py}| 11 +- >> .../test_machine_mips_loongson3v.py} | 19 +- >> .../test_mem_addr_space.py} | 3 +- >> .../test_ppc_bamboo.py} | 18 +- >> .../version.py => pytest/test_version.py} | 8 +- >> .../test_virtio_version.py} | 2 +- >> 17 files changed, 502 insertions(+), 74 deletions(-) >> create mode 100644 tests/pytest/meson.build >> create mode 100644 tests/pytest/qemu_pytest/__init__.py >> create mode 100644 tests/pytest/qemu_pytest/utils.py >> rename tests/{avocado/machine_arm_canona1100.py => >> pytest/test_arm_canona1100.py} (74%) >> rename tests/{avocado/cpu_queries.py => pytest/test_cpu_queries.py} (96%) >> rename tests/{avocado/empty_cpu_model.py => >> pytest/test_empty_cpu_model.py} (94%) >> rename tests/{avocado/info_usernet.py => pytest/test_info_usernet.py} >> (91%) >> rename tests/{avocado/machine_arm_n8x0.py => >> pytest/test_machine_arm_n8x0.py} (71%) >> rename tests/{avocado/machine_avr6.py => pytest/test_machine_avr6.py} >> (91%) >> rename tests/{avocado/machine_loongarch.py => >> pytest/test_machine_loongarch.py} (89%) >> rename tests/{avocado/machine_mips_loongson3v.py => >> pytest/test_machine_mips_loongson3v.py} (59%) >> rename tests/{avocado/mem-addr-space-check.py => >> pytest/test_mem_addr_space.py} (99%) >> rename tests/{avocado/ppc_bamboo.py => pytest/test_ppc_bamboo.py} (75%) >> rename tests/{avocado/version.py => pytest/test_version.py} (82%) >> rename tests/{avocado/virtio_version.py => >> pytest/test_virtio_version.py} (99%) >> >> -- >> 2.45.2 >> >> >>
Re: [PATCH v1 00/11] Convert avocado tests to normal Python unittests
Il mar 16 lug 2024, 13:26 Thomas Huth ha scritto: > The Avocado v88 that we use in QEMU is already on a life support > system: It is not supported by upstream anymore, and with the latest > versions of Python, it won't work anymore since it depends on the > "imp" module that has been removed in Python 3.12. > > There have been several attempts to update the test suite in QEMU > to a newer version of Avocado, but so far no attempt has successfully > been merged yet. > I think we should take another look at that. Avocado 92 should work, though I am not sure if it's also depending on "imp", and if I recall correctly the problem with more recent version was more that it conflicted with distros that didn't have it packaged. That should also not be a problem anymore with the pythondeps.toml mechanism. Additionally, the whole "make check" test suite in QEMU is using the > meson test runner nowadays, so running the python-based tests via the > Avocodo test runner looks and feels quite like an oddball, requiring > the users to deal with the knowledge of multiple test runners in > parallel (e.g. the timeout settings work completely differently). > This is true. Paolo > So instead of trying to update the python-based test suite in QEMU > to a newer version of Avocado, we should maybe try to better integrate > it with the meson test runner instead. Indeed most tests work quite > nicely without the Avocado framework already, as you can see with > this patch series - it does not convert all tests, just a subset so > far, but this already proves that many tests only need small modifi- > cations to work without Avocado. > > Only tests that use the LinuxTest / LinuxDistro and LinuxSSHMixIn > classes (e.g. based on cloud-init images or using SSH) really depend > on the Avocado framework, so we'd need a solution for those if we > want to continue using them. One solution might be to simply use the > required functions from avocado.utils for these tests, and still run > them via the meson test runner instead, but that needs some further > investigation that will be done later. > > > Now if you want to try out these patches: Apply the patches, then > recompile and then run: > > make check-functional > > You can also run single targets e.g. with: > > make check-functional-ppc > > You can also run the tests without any test runner now by > setting the PYTHONPATH environment variable to the "python" folder > of your source tree, and by specifying the build directory via > QEMU_BUILD_ROOT (if autodetection fails) and by specifying the > QEMU binary via QEMU_TEST_QEMU_BINARY. For example: > > export PYTHONPATH=$HOME/qemu/python > export QEMU_TEST_QEMU_BINARY=qemu-system-x86_64 > export PYTHONPATH=$HOME/qemu/build > ~/qemu/tests/functional/test_virtio_version.py > > The logs of the tests can be found in the build directory under > tests/functional/ - console log and general logs will > be put in separate files there. > > Still to be done: Update the documentation for this new test framework. > > RFC -> v1: > - Now using pycotap for running the tests instead of "pytest" > - Change the name from "tests/pytest" to "tests/functional" accordingly > - Make it possible to run the tests directly > - Use Python's urllib instead of wget for downloading > - Lots of makefile / meson integration improvements > - Converted more tests > - Update MAINTAINERS file accordingly > - Added a patch to run check-functional in the gitlab-CI > - ... lots of other changes I forgot about ... in fact, I changed so > many things that I also did not dare to pick up the Reviewed-bys > from the RFC > > Thomas Huth (11): > tests/functional: Add base classes for the upcoming pytest-based tests > tests/functional: Convert simple avocado tests into standalone python > tests > tests/functional: Convert avocado tests that just need a small > adjustment > tests/functional: Add python-based tests to the meson build system > tests/functional: Implement fetch_asset() method for downloading > assets > tests/functional: Convert some tests that download files via > fetch_asset() > tests/functional: Add a function for extracting files from an archive > tests/functional: Convert some avocado tests that needed > avocado.utils.archive > tests/functional: Set up logging > tests/functional: Convert the s390x avocado tests into standalone > tests > gitlab-ci: Add "check-functional" to the build tests > > MAINTAINERS | 22 +- > .gitlab-ci.d/buildtest-template.yml | 3 +- > .gitlab-ci.d/buildtest.yml| 60 +-- > pythondeps.toml | 3 +- > tests/Makefile.include| 18 +- > tests/functional/meson.build | 112 + > tests/functional/qemu_test/__init__.py| 384 ++ > tests/functional/qemu_test/utils.py | 28 ++ > .../test_arm_canona1100.py}
[PATCH] target/i386: do not crash if microvm guest uses SGX CPUID leaves
sgx_epc_get_section assumes a PC platform is in use: bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size) { PCMachineState *pcms = PC_MACHINE(qdev_get_machine()); However, sgx_epc_get_section is called by CPUID regardless of whether SGX state has been initialized or which platform is in use. Check whether the machine has the right QOM class and if not behave as if there are no EPC sections. Fixes: 1dec2e1f19f ("i386: Update SGX CPUID info according to hardware/KVM/user input", 2021-09-30) Cc: qemu-sta...@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2142 Signed-off-by: Paolo Bonzini --- hw/i386/sgx.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c index de76397bcfb..25b2055d653 100644 --- a/hw/i386/sgx.c +++ b/hw/i386/sgx.c @@ -266,10 +266,12 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict) bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size) { -PCMachineState *pcms = PC_MACHINE(qdev_get_machine()); +PCMachineState *pcms = +(PCMachineState *)object_dynamic_cast(qdev_get_machine(), + TYPE_PC_MACHINE); SGXEPCDevice *epc; -if (pcms->sgx_epc.size == 0 || pcms->sgx_epc.nr_sections <= section_nr) { +if (!pcms || pcms->sgx_epc.size == 0 || pcms->sgx_epc.nr_sections <= section_nr) { return true; } -- 2.45.2
Re: [PATCH v4 0/7] util: Introduce qemu_get_runtime_dir()
On Tue, Jul 16, 2024 at 2:46 PM Akihiko Odaki wrote: > > On 2024/07/16 19:43, Paolo Bonzini wrote: > > On Tue, Jul 16, 2024 at 11:56 AM Daniel P. Berrangé > > wrote: > >> > >> On Tue, Jul 16, 2024 at 11:06:57AM +0300, Michael Tokarev wrote: > >>> 16.07.2024 10:27, Akihiko Odaki wrote: > >>>> qemu_get_runtime_dir() returns a dynamically allocated directory path > >>>> that is appropriate for storing runtime files. It corresponds to "run" > >>>> directory in Unix. > >>> > >>> Since runtime dir is always used with a filename within, how about > >>> > >>>char *qemu_get_runtime_path(const char *filename) > >>> > >>> which return RUNTIME_DIR/filename instead of just RUNTIME_DIR ? > >> > >> Yeah, I agree, every single caller of the function goes on to call > >> g_build_filename with the result. The helper should just be building > >> the filename itself. > > > > That would mean using variable arguments and g_build_filename_valist(). > > We can't prepend an element to va_list. You could do it in two steps, with g_build_filename(runtime_dir, first) followed by g_build_filename_valist(result, ap); doing these steps only if if first != NULL of course. But I agree that leaving the concatenation in the caller is not particularly worse, and makes qemu_get_runtime_dir() more readable. Paolo Paolo
Re: [PATCH v2] hw/timer/hpet: Fix wrong HPET interrupts
On 7/13/24 13:54, TaiseiIto wrote: Before this commit, there are 3 problems about HPET timer interrupts. First, HPET periodic timers cause a too early interrupt before HPET main counter value reaches a value written its comparator value register. Second, disabled HPET timers whose comparator value register is not 0x cause wrong interrupts. Third, enabled HPET timers whose comparator value register is 0x don't cause any interrupts. About the first one, for example, an HPET driver writes 0x to an HPET periodic timer comparator value register. As a result, the register becomes 0x because writing to the higher 32 bits of the register doesn't affect itself in periodic mode. (see "case HPET_TN_CMP + 4" of "hpet_ram_write" function.) And "timer->period" which means interrupt period in periodic mode becomes 0x. Next, the HPET driver sets the HPET_CFG_ENABLE flag to start the main counter. The comparator value register (0x) indicate the next interrupt time. The period (0x) is added to the comparator value register at "hpet_timer" function because "hpet_time_after64" function returns true when the main counter is small. So, the first interrupt is planned when the main counter is 0x5554, but the first interrupt should occur when the main counter is 0x. To solve this problem, I fix "case HPET_TN_CMP + 4" of "hpet_ram_write" function to ensure that writings to higher 32 bits of a comparator value register reflect itself even if in periodic mode. About the other two problems, it was decided by comparator value whether each timer is enabled, but it should be decided by "timer_enabled" function which confirm "HPET_TN_ENABLE" flag. To solve these problems, I fix the code to decide correctly whether each timer is enabled. After this commit, the 3 problems are solved. First, HPET periodic timers cause the first interrupt when the main counter value reaches a value written its comparator value register. Second, disabled HPET timers never cause any interrupt. Third, enabled HPET timers cause interrupts correctly even if an HPET driver writes 0x to its comparator value register. Signed-off-by: TaiseiIto --- Changes in v2: - Reflect writings to higher 32 bits of a comparator value register rather than clearing these bits. - Fix wrong indents. - Link to v1: https://lore.kernel.org/qemu-devel/ty0pr0101mb4285838139bc56dec3d1ccfda4...@ty0pr0101mb4285.apcprd01.prod.exchangelabs.com/ hw/timer/hpet.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index 01efe4885d..4b6352e257 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -552,6 +552,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr, timer->period = (timer->period & 0xULL) | new_val; } +/* + * FIXME: on a 64-bit write, HPET_TN_SETVAL should apply to the + * high bits part as well. + */ timer->config &= ~HPET_TN_SETVAL; if (hpet_enabled(s)) { hpet_set_timer(timer); @@ -562,20 +566,22 @@ static void hpet_ram_write(void *opaque, hwaddr addr, if (!timer_is_periodic(timer) || (timer->config & HPET_TN_SETVAL)) { timer->cmp = (timer->cmp & 0xULL) | new_val << 32; -} else { +} +if (timer_is_periodic(timer)) { /* * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value */ -new_val &= (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1; +new_val = MIN(new_val, ~0u >> 1); +timer->cmp = (timer->cmp & 0xULL) | new_val << 32; This seems wrong to me. The comparator must be reset using HPET_TN_SETVAL, otherwise it just keeps running. From the specification: "If software wants to change the periodic rate, it should write a new value to the comparator value register. At the point when the timer’s comparator indicates a match, this new value will be added to derive the next matching point. So as to avoid race conditions where the new value is written just as a match occurs, either the main counter should be halted or the comparator disabled when the new periodic rate is written". The sentence "at the point when the timer’s comparator indicates a match" indicates that the comparator register is not written. I suspect you're hitting the other issue, and HPET_TN_SETVAL is cleared incorrectly by a 64-bit write. I'll try sending out a patch to fix that. -if ((>timer[i])->cmp != ~0ULL) { +if (hpet_enabled(s)) { hpet_set_timer(>timer[i]); }
Re: [PATCH v4 0/7] util: Introduce qemu_get_runtime_dir()
On Tue, Jul 16, 2024 at 11:56 AM Daniel P. Berrangé wrote: > > On Tue, Jul 16, 2024 at 11:06:57AM +0300, Michael Tokarev wrote: > > 16.07.2024 10:27, Akihiko Odaki wrote: > > > qemu_get_runtime_dir() returns a dynamically allocated directory path > > > that is appropriate for storing runtime files. It corresponds to "run" > > > directory in Unix. > > > > Since runtime dir is always used with a filename within, how about > > > > char *qemu_get_runtime_path(const char *filename) > > > > which return RUNTIME_DIR/filename instead of just RUNTIME_DIR ? > > Yeah, I agree, every single caller of the function goes on to call > g_build_filename with the result. The helper should just be building > the filename itself. That would mean using variable arguments and g_build_filename_valist(). Paolo
[PATCH 2/2] hpet: fix HPET_TN_SETVAL for high 32-bits of the comparator
Commit 3787324101b ("hpet: Fix emulation of HPET_TN_SETVAL (Jan Kiszka)", 2009-04-17) applied the fix only to the low 32-bits of the comparator, but it should be done for the high bits as well. Otherwise, the high 32-bits of the comparator cannot be written and they remain fixed to 0x. Co-developed-by: TaiseiIto Signed-off-by: TaiseiIto Signed-off-by: Paolo Bonzini --- hw/timer/hpet.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index 16be1278d09..85fb2c07ae3 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -554,6 +554,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr, timer->period = (timer->period & 0xULL) | new_val; } +/* + * FIXME: on a 64-bit write, HPET_TN_SETVAL should apply to the + * high bits part as well. + */ timer->config &= ~HPET_TN_SETVAL; if (hpet_enabled(s)) { hpet_set_timer(timer); @@ -564,7 +568,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr, if (!timer_is_periodic(timer) || (timer->config & HPET_TN_SETVAL)) { timer->cmp = (timer->cmp & 0xULL) | new_val << 32; -} else { +} +if (timer_is_periodic(timer)) { /* * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value @@ -572,12 +577,12 @@ static void hpet_ram_write(void *opaque, hwaddr addr, new_val = MIN(new_val, ~0u >> 1); timer->period = (timer->period & 0xULL) | new_val << 32; -} -timer->config &= ~HPET_TN_SETVAL; -if (hpet_enabled(s)) { -hpet_set_timer(timer); -} -break; +} +timer->config &= ~HPET_TN_SETVAL; +if (hpet_enabled(s)) { +hpet_set_timer(timer); +} +break; case HPET_TN_ROUTE: timer->fsb = (timer->fsb & 0xULL) | new_val; break; -- 2.45.2
[PATCH 0/2] first batch of hpet fixes
Extracted from the patch that TaiseiIto tested. While not sufficient to fix their problems, this is a step in the right direction. Paolo Bonzini (2): hpet: fix clamping of period hpet: fix HPET_TN_SETVAL for high 32-bits of the comparator hw/timer/hpet.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) -- 2.45.2
[PATCH 1/2] hpet: fix clamping of period
When writing a new period, the clamping should use a maximum value rather than a bit mask. Also, when writing the high bits new_val is shifted right by 32, so the maximum allowed period should also be shifted right. Signed-off-by: Paolo Bonzini --- hw/timer/hpet.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index 01efe4885db..16be1278d09 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -548,7 +548,9 @@ static void hpet_ram_write(void *opaque, hwaddr addr, * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value */ -new_val &= (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1; +if (timer->config & HPET_TN_32_BIT) { +new_val = MIN(new_val, ~0u >> 1); +} timer->period = (timer->period & 0xULL) | new_val; } @@ -567,7 +569,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr, * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value */ -new_val &= (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1; +new_val = MIN(new_val, ~0u >> 1); timer->period = (timer->period & 0xULL) | new_val << 32; } -- 2.45.2
Re: [PATCH] Manpage: Update description of 'user=username' for '-run-with'
> Manpage: the description of '-runs' didn't show this parameter will use > setuid, so the customer might get confused when 'elevateprivileges=deny' is > used. Since '-runas' is going to be deprecated and replaced by this > parameter in the coming qemu9.1, add the message here. Queued, thanks. I modified the patch a bit to explain how setgid and setgroups are used in addition to setuid: diff --git a/qemu-options.hx b/qemu-options.hx index ad6521ef5e7..694fa37f284 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -5024,8 +5024,11 @@ SRST in combination with -runas. ``user=username`` or ``user=uid:gid`` can be used to drop root privileges -by switching to the specified user (via username) or user and group -(via uid:gid) immediately before starting guest execution. +before starting guest execution. QEMU will use the ``setuid`` and ``setgid`` +system calls to switch to the specified identity. Note that the +``user=username`` syntax will also apply the full set of supplementary +groups for the user, whereas the ``user=uid:gid`` will use only the +``gid`` group. Paolo
Re: [PATCH] qemu/timer: Add host ticks function for LoongArch
Queued, thanks. Paolo
Re: [PATCH] scsi: fix regression and honor bootindex again for legacy drives
Queued, thanks. Paolo
Re: [PATCH] hw/scsi/lsi53c895a: bump instruction limit in scripts processing to fix regression
Queued, thanks. Paolo
Re: [PATCH] disas: Fix build against Capstone v6
Queued, thanks. Paolo
Re: [PATCH] meson: Use -fno-sanitize=function when available
Queued, thanks. Paolo
Re: [PATCH] cpu: Free queued CPU work
Queued, thanks. Paolo
Re: [PATCH v4 0/7] util: Introduce qemu_get_runtime_dir()
Queued, thanks. Paolo
[PULL 02/13] target/i386/tcg: Remove SEG_ADDL
From: Richard Henderson This truncation is now handled by MMU_*32_IDX. The introduction of MMU_*32_IDX in fact applied correct 32-bit wraparound to 16-bit accesses with a high segment base (e.g. big real mode or vm86 mode), which did not use SEG_ADDL. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-3-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index aee3d19f29b..19d6b41a589 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -579,10 +579,6 @@ int exception_has_error_code(int intno) } while (0) #endif -/* in 64-bit machines, this can overflow. So this segment addition macro - * can be used to trim the value to 32-bit whenever needed */ -#define SEG_ADDL(ssp, sp, sp_mask) ((uint32_t)((ssp) + (sp & (sp_mask - /* XXX: add a is_user flag to have proper security support */ #define PUSHW_RA(ssp, sp, sp_mask, val, ra) \ {\ @@ -593,7 +589,7 @@ int exception_has_error_code(int intno) #define PUSHL_RA(ssp, sp, sp_mask, val, ra) \ { \ sp -= 4;\ -cpu_stl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), (uint32_t)(val), ra); \ +cpu_stl_kernel_ra(env, (ssp) + (sp & (sp_mask)), (val), ra); \ } #define POPW_RA(ssp, sp, sp_mask, val, ra) \ @@ -604,7 +600,7 @@ int exception_has_error_code(int intno) #define POPL_RA(ssp, sp, sp_mask, val, ra) \ { \ -val = (uint32_t)cpu_ldl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), ra); \ +val = (uint32_t)cpu_ldl_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 4;\ } -- 2.45.2
[PULL 03/13] target/i386/tcg: Allow IRET from user mode to user mode with SMAP
This fixes a bug wherein i386/tcg assumed an interrupt return using the IRET instruction was always returning from kernel mode to either kernel mode or user mode. This assumption is violated when IRET is used as a clever way to restore thread state, as for example in the dotnet runtime. There, IRET returns from user mode to user mode. This bug is that stack accesses from IRET and RETF, as well as accesses to the parameters in a call gate, are normal data accesses using the current CPL. This manifested itself as a page fault in the guest Linux kernel due to SMAP preventing the access. This bug appears to have been in QEMU since the beginning. Analyzed-by: Robert R. Henry Co-developed-by: Robert R. Henry Signed-off-by: Robert R. Henry Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 19d6b41a589..224e73e9ed0 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -594,13 +594,13 @@ int exception_has_error_code(int intno) #define POPW_RA(ssp, sp, sp_mask, val, ra) \ {\ -val = cpu_lduw_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ +val = cpu_lduw_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 2; \ } #define POPL_RA(ssp, sp, sp_mask, val, ra) \ { \ -val = (uint32_t)cpu_ldl_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ +val = (uint32_t)cpu_ldl_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 4;\ } @@ -847,7 +847,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, #define POPQ_RA(sp, val, ra)\ { \ -val = cpu_ldq_kernel_ra(env, sp, ra); \ +val = cpu_ldq_data_ra(env, sp, ra); \ sp += 8;\ } @@ -1797,18 +1797,18 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, PUSHL_RA(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC()); PUSHL_RA(ssp, sp, sp_mask, env->regs[R_ESP], GETPC()); for (i = param_count - 1; i >= 0; i--) { -val = cpu_ldl_kernel_ra(env, old_ssp + -((env->regs[R_ESP] + i * 4) & - old_sp_mask), GETPC()); +val = cpu_ldl_data_ra(env, + old_ssp + ((env->regs[R_ESP] + i * 4) & old_sp_mask), + GETPC()); PUSHL_RA(ssp, sp, sp_mask, val, GETPC()); } } else { PUSHW_RA(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC()); PUSHW_RA(ssp, sp, sp_mask, env->regs[R_ESP], GETPC()); for (i = param_count - 1; i >= 0; i--) { -val = cpu_lduw_kernel_ra(env, old_ssp + - ((env->regs[R_ESP] + i * 2) & - old_sp_mask), GETPC()); +val = cpu_lduw_data_ra(env, + old_ssp + ((env->regs[R_ESP] + i * 2) & old_sp_mask), + GETPC()); PUSHW_RA(ssp, sp, sp_mask, val, GETPC()); } } -- 2.45.2
[PULL 10/13] target/i386/tcg: use X86Access for TSS access
This takes care of probing the vaddr range in advance, and is also faster because it avoids repeated TLB lookups. It also matches the Intel manual better, as it says "Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory"; note however that it's not clear how the processor checks for segment descriptors, and this check is not included in the AMD manual. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 110 ++- 1 file changed, 58 insertions(+), 52 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 0242f9d8b58..fea08a2ba0f 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -27,6 +27,7 @@ #include "exec/log.h" #include "helper-tcg.h" #include "seg_helper.h" +#include "access.h" #ifdef TARGET_X86_64 #define SET_ESP(val, sp_mask) \ @@ -313,14 +314,15 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, uint32_t e1, uint32_t e2, int source, uint32_t next_eip, uintptr_t retaddr) { -int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, v1, v2, i; +int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, i; target_ulong tss_base; uint32_t new_regs[8], new_segs[6]; uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap; uint32_t old_eflags, eflags_mask; SegmentCache *dt; -int index; +int mmu_index, index; target_ulong ptr; +X86Access old, new; type = (e2 >> DESC_TYPE_SHIFT) & 0xf; LOG_PCALL("switch_tss: sel=0x%04x type=%d src=%d\n", tss_selector, type, @@ -374,35 +376,45 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); } +/* X86Access avoids memory exceptions during the task switch */ +mmu_index = cpu_mmu_index_kernel(env); +access_prepare_mmu(, env, env->tr.base, old_tss_limit_max, + MMU_DATA_STORE, mmu_index, retaddr); + +if (source == SWITCH_TSS_CALL) { +/* Probe for future write of parent task */ +probe_access(env, tss_base, 2, MMU_DATA_STORE, + mmu_index, retaddr); +} +access_prepare_mmu(, env, tss_base, tss_limit, + MMU_DATA_LOAD, mmu_index, retaddr); + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ -new_cr3 = cpu_ldl_kernel_ra(env, tss_base + 0x1c, retaddr); -new_eip = cpu_ldl_kernel_ra(env, tss_base + 0x20, retaddr); -new_eflags = cpu_ldl_kernel_ra(env, tss_base + 0x24, retaddr); +new_cr3 = access_ldl(, tss_base + 0x1c); +new_eip = access_ldl(, tss_base + 0x20); +new_eflags = access_ldl(, tss_base + 0x24); for (i = 0; i < 8; i++) { -new_regs[i] = cpu_ldl_kernel_ra(env, tss_base + (0x28 + i * 4), -retaddr); +new_regs[i] = access_ldl(, tss_base + (0x28 + i * 4)); } for (i = 0; i < 6; i++) { -new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x48 + i * 4), - retaddr); +new_segs[i] = access_ldw(, tss_base + (0x48 + i * 4)); } -new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x60, retaddr); -new_trap = cpu_ldl_kernel_ra(env, tss_base + 0x64, retaddr); +new_ldt = access_ldw(, tss_base + 0x60); +new_trap = access_ldl(, tss_base + 0x64); } else { /* 16 bit */ new_cr3 = 0; -new_eip = cpu_lduw_kernel_ra(env, tss_base + 0x0e, retaddr); -new_eflags = cpu_lduw_kernel_ra(env, tss_base + 0x10, retaddr); +new_eip = access_ldw(, tss_base + 0x0e); +new_eflags = access_ldw(, tss_base + 0x10); for (i = 0; i < 8; i++) { -new_regs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x12 + i * 2), retaddr); +new_regs[i] = access_ldw(, tss_base + (0x12 + i * 2)); } for (i = 0; i < 4; i++) { -new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x22 + i * 2), - retaddr); +new_segs[i] = access_ldw(, tss_base + (0x22 + i * 2)); } -new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x2a, retaddr); +new_ldt = access_ldw(, tss_base + 0x2a); new_segs[R_FS] = 0; new_segs[R_GS] = 0; new_trap = 0; @@ -412,16 +424,6 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, chapters 12.2.5 and 13.2.4 on how to implement TSS Trap bit */ (void)new_trap; -/* NOTE: we must avoid memory exceptions during the task switch,
[PULL 12/13] i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT
From: Michael Roth Currently if the 'legacy-vm-type' property of the sev-guest object is 'on', QEMU will attempt to use the newer KVM_SEV_INIT2 kernel interface in conjunction with the newer KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM KVM VM types. This can lead to measurement changes if, for instance, an SEV guest was created on a host that originally had an older kernel that didn't support KVM_SEV_INIT2, but is booted on the same host later on after the host kernel was upgraded. Instead, if legacy-vm-type is 'off', QEMU should fail if the KVM_SEV_INIT2 interface is not provided by the current host kernel. Modify the fallback handling accordingly. In the future, VMSA features and other flags might be added to QEMU which will require legacy-vm-type to be 'off' because they will rely on the newer KVM_SEV_INIT2 interface. It may be difficult to convey to users what values of legacy-vm-type are compatible with which features/options, so as part of this rework, switch legacy-vm-type to a tri-state OnOffAuto option. 'auto' in this case will automatically switch to using the newer KVM_SEV_INIT2, but only if it is required to make use of new VMSA features or other options only available via KVM_SEV_INIT2. Defining 'auto' in this way would avoid inadvertantly breaking compatibility with older kernels since it would only be used in cases where users opt into newer features that are only available via KVM_SEV_INIT2 and newer kernels, and provide better default behavior than the legacy-vm-type=off behavior that was previously in place, so make it the default for 9.1+ machine types. Cc: Daniel P. Berrangé Cc: Paolo Bonzini cc: k...@vger.kernel.org Signed-off-by: Michael Roth Reviewed-by: Daniel P. Berrangé Link: https://lore.kernel.org/r/20240710041005.83720-1-michael.r...@amd.com Signed-off-by: Paolo Bonzini --- qapi/qom.json | 18 ++ hw/i386/pc.c | 2 +- target/i386/sev.c | 87 +++ 3 files changed, 84 insertions(+), 23 deletions(-) diff --git a/qapi/qom.json b/qapi/qom.json index 8e75a419c30..7eccd2e14e2 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -924,12 +924,16 @@ # @handle: SEV firmware handle (default: 0) # # @legacy-vm-type: Use legacy KVM_SEV_INIT KVM interface for creating the VM. -# The newer KVM_SEV_INIT2 interface syncs additional vCPU -# state when initializing the VMSA structures, which will -# result in a different guest measurement. Set this to -# maintain compatibility with older QEMU or kernel versions -# that rely on legacy KVM_SEV_INIT behavior. -# (default: false) (since 9.1) +# The newer KVM_SEV_INIT2 interface, from Linux >= 6.10, syncs +# additional vCPU state when initializing the VMSA structures, +# which will result in a different guest measurement. Set +# this to 'on' to force compatibility with older QEMU or kernel +# versions that rely on legacy KVM_SEV_INIT behavior. 'auto' +# will behave identically to 'on', but will automatically +# switch to using KVM_SEV_INIT2 if the user specifies any +# additional options that require it. If set to 'off', QEMU +# will require KVM_SEV_INIT2 unconditionally. +# (default: off) (since 9.1) # # Since: 2.12 ## @@ -939,7 +943,7 @@ '*session-file': 'str', '*policy': 'uint32', '*handle': 'uint32', -'*legacy-vm-type': 'bool' } } +'*legacy-vm-type': 'OnOffAuto' } } ## # @SevSnpGuestProperties: diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 4fbc5774708..c74931d577a 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -83,7 +83,7 @@ GlobalProperty pc_compat_9_0[] = { { TYPE_X86_CPU, "x-amd-topoext-features-only", "false" }, { TYPE_X86_CPU, "x-l1-cache-per-thread", "false" }, { TYPE_X86_CPU, "guest-phys-bits", "0" }, -{ "sev-guest", "legacy-vm-type", "true" }, +{ "sev-guest", "legacy-vm-type", "on" }, { TYPE_X86_CPU, "legacy-multi-node", "on" }, }; const size_t pc_compat_9_0_len = G_N_ELEMENTS(pc_compat_9_0); diff --git a/target/i386/sev.c b/target/i386/sev.c index 2ba5f517228..a1157c0ede6 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -144,7 +144,7 @@ struct SevGuestState { uint32_t policy; char *dh_cert_file; char *session_file; -bool legacy_vm_type; +OnOffAuto legacy_vm_type; }; struct SevSnpGuestState { @@ -1369,6 +1369,17 @@ sev_vm_state_change(void *opaque, bool running, RunState state) } } +/* + * This helper is to examine sev-guest properties and determine if any options + * have been set which rely on the newer KVM_SEV_INIT2
[PULL 11/13] target/i386/tcg: save current task state before loading new one
This is how the steps are ordered in the manual. EFLAGS.NT is overwritten after the fact in the saved image. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 85 +++- 1 file changed, 45 insertions(+), 40 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index fea08a2ba0f..c0641a79c70 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -389,6 +389,42 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, access_prepare_mmu(, env, tss_base, tss_limit, MMU_DATA_LOAD, mmu_index, retaddr); +/* save the current state in the old TSS */ +old_eflags = cpu_compute_eflags(env); +if (old_type & 8) { +/* 32 bit */ +access_stl(, env->tr.base + 0x20, next_eip); +access_stl(, env->tr.base + 0x24, old_eflags); +access_stl(, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]); +access_stl(, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]); +access_stl(, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]); +access_stl(, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]); +access_stl(, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]); +access_stl(, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]); +access_stl(, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]); +access_stl(, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]); +for (i = 0; i < 6; i++) { +access_stw(, env->tr.base + (0x48 + i * 4), + env->segs[i].selector); +} +} else { +/* 16 bit */ +access_stw(, env->tr.base + 0x0e, next_eip); +access_stw(, env->tr.base + 0x10, old_eflags); +access_stw(, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]); +access_stw(, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]); +access_stw(, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]); +access_stw(, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]); +access_stw(, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]); +access_stw(, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]); +access_stw(, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]); +access_stw(, env->tr.base + (0x12 + 7 * 2), env->regs[R_EDI]); +for (i = 0; i < 4; i++) { +access_stw(, env->tr.base + (0x22 + i * 2), + env->segs[i].selector); +} +} + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ @@ -428,49 +464,16 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, if (source == SWITCH_TSS_JMP || source == SWITCH_TSS_IRET) { tss_set_busy(env, env->tr.selector, 0, retaddr); } -old_eflags = cpu_compute_eflags(env); + if (source == SWITCH_TSS_IRET) { old_eflags &= ~NT_MASK; +if (old_type & 8) { +access_stl(, env->tr.base + 0x24, old_eflags); +} else { +access_stw(, env->tr.base + 0x10, old_eflags); + } } -/* save the current state in the old TSS */ -if (old_type & 8) { -/* 32 bit */ -access_stl(, env->tr.base + 0x20, next_eip); -access_stl(, env->tr.base + 0x24, old_eflags); -access_stl(, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]); -access_stl(, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]); -access_stl(, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]); -access_stl(, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]); -access_stl(, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]); -access_stl(, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]); -access_stl(, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]); -access_stl(, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]); -for (i = 0; i < 6; i++) { -access_stw(, env->tr.base + (0x48 + i * 4), - env->segs[i].selector); -} -} else { -/* 16 bit */ -access_stw(, env->tr.base + 0x0e, next_eip); -access_stw(, env->tr.base + 0x10, old_eflags); -access_stw(, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]); -access_stw(, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]); -access_stw(, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]); -access_stw(, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]); -access_stw(, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]); -access_stw(, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]); -access_stw(, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]);
[PULL 00/13] target/i386 changes for 2024-07-12
The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9: Merge tag 'pull-target-arm-20240711' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11 12:00:00 -0700) are available in the Git repository at: https://gitlab.com/bonzini/qemu.git tags/for-upstream-i386 for you to fetch changes up to cdcadf9ee9efef96323e0b88fccff589f06fc0ee: i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT (2024-07-12 15:35:54 +0200) * target/i386/tcg: fixes for seg_helper.c * SEV: Don't allow automatic fallback to legacy KVM_SEV_INIT, but also don't use it by default Michael Roth (1): i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT Paolo Bonzini (8): target/i386/tcg: fix POP to memory in long mode target/i386/tcg: Allow IRET from user mode to user mode with SMAP target/i386/tcg: use PUSHL/PUSHW for error code target/i386/tcg: Compute MMU index once target/i386/tcg: Use DPL-level accesses for interrupts and call gates target/i386/tcg: check for correct busy state before switching to a new task target/i386/tcg: use X86Access for TSS access target/i386/tcg: save current task state before loading new one Richard Henderson (3): target/i386/tcg: Remove SEG_ADDL target/i386/tcg: Reorg push/pop within seg_helper.c target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl qapi/qom.json| 18 +- target/i386/cpu.h| 11 +- hw/i386/pc.c | 2 +- target/i386/cpu.c| 27 +- target/i386/sev.c| 87 - target/i386/tcg/seg_helper.c | 665 +-- target/i386/tcg/decode-new.c.inc | 2 +- target/i386/tcg/emit.c.inc | 1 + 8 files changed, 472 insertions(+), 341 deletions(-) -- 2.45.2
[PULL 06/13] target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl
From: Richard Henderson Disconnect mmu index computation from the current pl as stored in env->hflags. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-2-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/cpu.h | 11 ++- target/i386/cpu.c | 27 --- 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index c43ac01c794..1e121acef54 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2445,15 +2445,8 @@ static inline bool is_mmu_index_32(int mmu_index) return mmu_index & 1; } -static inline int cpu_mmu_index_kernel(CPUX86State *env) -{ -int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1; -int mmu_index_base = -!(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : -((env->hflags & HF_CPL_MASK) < 3 && (env->eflags & AC_MASK)) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX; - -return mmu_index_base + mmu_index_32; -} +int x86_mmu_index_pl(CPUX86State *env, unsigned pl); +int cpu_mmu_index_kernel(CPUX86State *env); #define CC_DST (env->cc_dst) #define CC_SRC (env->cc_src) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index c05765eeafc..4688d140c2d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -8122,18 +8122,39 @@ static bool x86_cpu_has_work(CPUState *cs) return x86_cpu_pending_interrupt(cs, cs->interrupt_request) != 0; } -static int x86_cpu_mmu_index(CPUState *cs, bool ifetch) +int x86_mmu_index_pl(CPUX86State *env, unsigned pl) { -CPUX86State *env = cpu_env(cs); int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 0 : 1; int mmu_index_base = -(env->hflags & HF_CPL_MASK) == 3 ? MMU_USER64_IDX : +pl == 3 ? MMU_USER64_IDX : !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : (env->eflags & AC_MASK) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX; return mmu_index_base + mmu_index_32; } +static int x86_cpu_mmu_index(CPUState *cs, bool ifetch) +{ +CPUX86State *env = cpu_env(cs); +return x86_mmu_index_pl(env, env->hflags & HF_CPL_MASK); +} + +static int x86_mmu_index_kernel_pl(CPUX86State *env, unsigned pl) +{ +int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1; +int mmu_index_base = +!(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : +(pl < 3 && (env->eflags & AC_MASK) + ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX); + +return mmu_index_base + mmu_index_32; +} + +int cpu_mmu_index_kernel(CPUX86State *env) +{ +return x86_mmu_index_kernel_pl(env, env->hflags & HF_CPL_MASK); +} + static void x86_disas_set_info(CPUState *cs, disassemble_info *info) { X86CPU *cpu = X86_CPU(cs); -- 2.45.2
[PULL 05/13] target/i386/tcg: Reorg push/pop within seg_helper.c
From: Richard Henderson Interrupts and call gates should use accesses with the DPL as the privilege level. While computing the applicable MMU index is easy, the harder thing is how to plumb it in the code. One possibility could be to add a single argument to the PUSH* macros for the privilege level, but this is repetitive and risks confusion between the involved privilege levels. Another possibility is to pass both CPL and DPL, and adjusting both PUSH* and POP* to use specific privilege levels (instead of using cpu_{ld,st}*_data). This makes the code more symmetric. However, a more complicated but much nicer approach is to use a structure to contain the stack parameters, env, unwind return address, and rewrite the macros into functions. The struct provides an easy home for the MMU index as well. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-4-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 481 +++ 1 file changed, 259 insertions(+), 222 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index b985382d704..b6902ca3fba 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -28,6 +28,68 @@ #include "helper-tcg.h" #include "seg_helper.h" +#ifdef TARGET_X86_64 +#define SET_ESP(val, sp_mask) \ +do {\ +if ((sp_mask) == 0x) { \ +env->regs[R_ESP] = (env->regs[R_ESP] & ~0x) | \ +((val) & 0x); \ +} else if ((sp_mask) == 0xLL) { \ +env->regs[R_ESP] = (uint32_t)(val); \ +} else {\ +env->regs[R_ESP] = (val); \ +} \ +} while (0) +#else +#define SET_ESP(val, sp_mask) \ +do {\ +env->regs[R_ESP] = (env->regs[R_ESP] & ~(sp_mask)) |\ +((val) & (sp_mask));\ +} while (0) +#endif + +/* XXX: use mmu_index to have proper DPL support */ +typedef struct StackAccess +{ +CPUX86State *env; +uintptr_t ra; +target_ulong ss_base; +target_ulong sp; +target_ulong sp_mask; +} StackAccess; + +static void pushw(StackAccess *sa, uint16_t val) +{ +sa->sp -= 2; +cpu_stw_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->ra); +} + +static void pushl(StackAccess *sa, uint32_t val) +{ +sa->sp -= 4; +cpu_stl_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->ra); +} + +static uint16_t popw(StackAccess *sa) +{ +uint16_t ret = cpu_lduw_data_ra(sa->env, +sa->ss_base + (sa->sp & sa->sp_mask), +sa->ra); +sa->sp += 2; +return ret; +} + +static uint32_t popl(StackAccess *sa) +{ +uint32_t ret = cpu_ldl_data_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->ra); +sa->sp += 4; +return ret; +} + int get_pg_mode(CPUX86State *env) { int pg_mode = 0; @@ -559,68 +621,19 @@ int exception_has_error_code(int intno) return 0; } -#ifdef TARGET_X86_64 -#define SET_ESP(val, sp_mask) \ -do {\ -if ((sp_mask) == 0x) { \ -env->regs[R_ESP] = (env->regs[R_ESP] & ~0x) | \ -((val) & 0x); \ -} else if ((sp_mask) == 0xLL) { \ -env->regs[R_ESP] = (uint32_t)(val); \ -} else {\ -env->regs[R_ESP] = (val); \ -} \ -} while (0) -#else -#define SET_ESP(val, sp_mask) \ -do {\ -env->regs[R_ESP] = (env->regs[R_ESP] & ~(sp_mask)) |\ -((val) & (sp_mask));\ -} while (0) -#endif - -/* XXX: add a is_user flag to have proper security support */ -#define PUSHW_RA(ssp, sp, sp_mask, val, ra) \ -{\ -sp -= 2
[PULL 07/13] target/i386/tcg: Compute MMU index once
Add the MMU index to the StackAccess struct, so that it can be cached or (in the next patch) computed from information that is not in CPUX86State. Co-developed-by: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index b6902ca3fba..8a6d92b3583 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -56,36 +56,37 @@ typedef struct StackAccess target_ulong ss_base; target_ulong sp; target_ulong sp_mask; +int mmu_index; } StackAccess; static void pushw(StackAccess *sa, uint16_t val) { sa->sp -= 2; -cpu_stw_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), - val, sa->ra); +cpu_stw_mmuidx_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->mmu_index, sa->ra); } static void pushl(StackAccess *sa, uint32_t val) { sa->sp -= 4; -cpu_stl_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), - val, sa->ra); +cpu_stl_mmuidx_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->mmu_index, sa->ra); } static uint16_t popw(StackAccess *sa) { -uint16_t ret = cpu_lduw_data_ra(sa->env, -sa->ss_base + (sa->sp & sa->sp_mask), -sa->ra); +uint16_t ret = cpu_lduw_mmuidx_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->mmu_index, sa->ra); sa->sp += 2; return ret; } static uint32_t popl(StackAccess *sa) { -uint32_t ret = cpu_ldl_data_ra(sa->env, - sa->ss_base + (sa->sp & sa->sp_mask), - sa->ra); +uint32_t ret = cpu_ldl_mmuidx_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->mmu_index, sa->ra); sa->sp += 4; return ret; } @@ -677,6 +678,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; +sa.mmu_index = cpu_mmu_index_kernel(env); if (type == 5) { /* task gate */ @@ -858,12 +860,12 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, static void pushq(StackAccess *sa, uint64_t val) { sa->sp -= 8; -cpu_stq_kernel_ra(sa->env, sa->sp, val, sa->ra); +cpu_stq_mmuidx_ra(sa->env, sa->sp, val, sa->mmu_index, sa->ra); } static uint64_t popq(StackAccess *sa) { -uint64_t ret = cpu_ldq_data_ra(sa->env, sa->sp, sa->ra); +uint64_t ret = cpu_ldq_mmuidx_ra(sa->env, sa->sp, sa->mmu_index, sa->ra); sa->sp += 8; return ret; } @@ -982,6 +984,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; +sa.mmu_index = cpu_mmu_index_kernel(env); sa.sp_mask = -1; sa.ss_base = 0; if (dpl < cpl || ist != 0) { @@ -1116,6 +1119,7 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int, sa.sp = env->regs[R_ESP]; sa.sp_mask = 0x; sa.ss_base = env->segs[R_SS].base; +sa.mmu_index = cpu_mmu_index_kernel(env); if (is_int) { old_eip = next_eip; @@ -1579,6 +1583,7 @@ void helper_lcall_real(CPUX86State *env, uint32_t new_cs, uint32_t new_eip, sa.sp = env->regs[R_ESP]; sa.sp_mask = get_sp_mask(env->segs[R_SS].flags); sa.ss_base = env->segs[R_SS].base; +sa.mmu_index = cpu_mmu_index_kernel(env); if (shift) { pushl(, env->segs[R_CS].selector); @@ -1618,6 +1623,7 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, sa.env = env; sa.ra = GETPC(); +sa.mmu_index = cpu_mmu_index_kernel(env); if (e2 & DESC_S_MASK) { if (!(e2 & DESC_CS_MASK)) { @@ -1905,6 +1911,7 @@ void helper_iret_real(CPUX86State *env, int shift) sa.env = env; sa.ra = GETPC(); +sa.mmu_index = x86_mmu_index_pl(env, 0); sa.sp_mask = 0x; /* : use SS segment size? */ sa.sp = env->regs[R_ESP]; sa.ss_base = env->segs[R_SS].base; @@ -1976,8 +1983,11 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, target_ulong new_eip, new_esp; StackAccess sa; +cpl = env->hflags & HF_CPL_MASK; + sa.env = env; sa.ra = retaddr; +sa.mmu_index = x86_mmu_index_pl(env, cpl); #ifdef TARGET_X86_64 if (shift == 2) { @@ -2032,7 +2042,6 @@
[PULL 13/13] Revert "qemu-char: do not operate on sources from finalize callbacks"
From: Sergey Dyasli This reverts commit 2b316774f60291f57ca9ecb6a9f0712c532cae34. After 038b4217884c ("Revert "chardev: use a child source for qio input source"") we've been observing the "iwp->src == NULL" assertion triggering periodically during the initial capabilities querying by libvirtd. One of possible backtraces: Thread 1 (Thread 0x7f16cd4f0700 (LWP 43858)): 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 1 0x7f16c6c21e65 in __GI_abort () at abort.c:79 2 0x7f16c6c21d39 in __assert_fail_base at assert.c:92 3 0x7f16c6c46e86 in __GI___assert_fail (assertion=assertion@entry=0x562e9bcdaadd "iwp->src == NULL", file=file@entry=0x562e9bcdaac8 "../chardev/char-io.c", line=line@entry=99, function=function@entry=0x562e9bcdab10 <__PRETTY_FUNCTION__.20549> "io_watch_poll_finalize") at assert.c:101 4 0x562e9ba20c2c in io_watch_poll_finalize (source=) at ../chardev/char-io.c:99 5 io_watch_poll_finalize (source=) at ../chardev/char-io.c:88 6 0x7f16c904aae0 in g_source_unref_internal () from /lib64/libglib-2.0.so.0 7 0x7f16c904baf9 in g_source_destroy_internal () from /lib64/libglib-2.0.so.0 8 0x562e9ba20db0 in io_remove_watch_poll (source=0x562e9d6720b0) at ../chardev/char-io.c:147 9 remove_fd_in_watch (chr=chr@entry=0x562e9d5f3800) at ../chardev/char-io.c:153 10 0x562e9ba23ffb in update_ioc_handlers (s=0x562e9d5f3800) at ../chardev/char-socket.c:592 11 0x562e9ba2072f in qemu_chr_fe_set_handlers_full at ../chardev/char-fe.c:279 12 0x562e9ba207a9 in qemu_chr_fe_set_handlers at ../chardev/char-fe.c:304 13 0x562e9ba2ca75 in monitor_qmp_setup_handlers_bh (opaque=0x562e9d4c2c60) at ../monitor/qmp.c:509 14 0x562e9bb6222e in aio_bh_poll (ctx=ctx@entry=0x562e9d4c2f20) at ../util/async.c:216 15 0x562e9bb4de0a in aio_poll (ctx=0x562e9d4c2f20, blocking=blocking@entry=true) at ../util/aio-posix.c:722 16 0x562e9b99dfaa in iothread_run (opaque=0x562e9d4c26f0) at ../iothread.c:63 17 0x562e9bb505a4 in qemu_thread_start (args=0x562e9d4c7ea0) at ../util/qemu-thread-posix.c:543 18 0x7f16c70081ca in start_thread (arg=) at pthread_create.c:479 19 0x7f16c6c398d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 io_remove_watch_poll(), which makes sure that iwp->src is NULL, calls g_source_destroy() which finds that iwp->src is not NULL in the finalize callback. This can only happen if another thread has managed to trigger io_watch_poll_prepare() callback in the meantime. Move iwp->src destruction back to the finalize callback to prevent the described race, and also remove the stale comment. The deadlock glib bug was fixed back in 2010 by b35820285668 ("gmain: move finalization of GSource outside of context lock"). Suggested-by: Paolo Bonzini Signed-off-by: Sergey Dyasli Link: https://lore.kernel.org/r/20240712092659.216206-1-sergey.dya...@nutanix.com Signed-off-by: Paolo Bonzini --- chardev/char-io.c | 19 +-- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/chardev/char-io.c b/chardev/char-io.c index dab77b112e3..3be17b51ca5 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -87,16 +87,12 @@ static gboolean io_watch_poll_dispatch(GSource *source, GSourceFunc callback, static void io_watch_poll_finalize(GSource *source) { -/* - * Due to a glib bug, removing the last reference to a source - * inside a finalize callback causes recursive locking (and a - * deadlock). This is not a problem inside other callbacks, - * including dispatch callbacks, so we call io_remove_watch_poll - * to remove this source. At this point, iwp->src must - * be NULL, or we would leak it. - */ IOWatchPoll *iwp = io_watch_poll_from_source(source); -assert(iwp->src == NULL); +if (iwp->src) { +g_source_destroy(iwp->src); +g_source_unref(iwp->src); +iwp->src = NULL; +} } static GSourceFuncs io_watch_poll_funcs = { @@ -139,11 +135,6 @@ static void io_remove_watch_poll(GSource *source) IOWatchPoll *iwp; iwp = io_watch_poll_from_source(source); -if (iwp->src) { -g_source_destroy(iwp->src); -g_source_unref(iwp->src); -iwp->src = NULL; -} g_source_destroy(>parent); } -- 2.45.2
[PULL 08/13] target/i386/tcg: Use DPL-level accesses for interrupts and call gates
This fixes a bug wherein i386/tcg assumed an interrupt return using the CALL or JMP instructions were always going from kernel or user mode to kernel mode, when using a call gate. This assumption is violated if the call gate has a DPL that is greater than 0. In addition, the stack accesses should count as explicit, not implicit ("kernel" in QEMU code), so that SMAP is not applied if DPL=3. Analyzed-by: Robert R. Henry Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249 Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 8a6d92b3583..809ee3d9833 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -678,7 +678,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, dpl); if (type == 5) { /* task gate */ @@ -984,7 +984,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, dpl); sa.sp_mask = -1; sa.ss_base = 0; if (dpl < cpl || ist != 0) { @@ -1119,7 +1119,7 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int, sa.sp = env->regs[R_ESP]; sa.sp_mask = 0x; sa.ss_base = env->segs[R_SS].base; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, 0); if (is_int) { old_eip = next_eip; @@ -1583,7 +1583,7 @@ void helper_lcall_real(CPUX86State *env, uint32_t new_cs, uint32_t new_eip, sa.sp = env->regs[R_ESP]; sa.sp_mask = get_sp_mask(env->segs[R_SS].flags); sa.ss_base = env->segs[R_SS].base; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, 0); if (shift) { pushl(, env->segs[R_CS].selector); @@ -1619,17 +1619,17 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC()); } cpl = env->hflags & HF_CPL_MASK; +dpl = (e2 >> DESC_DPL_SHIFT) & 3; LOG_PCALL("desc=%08x:%08x\n", e1, e2); sa.env = env; sa.ra = GETPC(); -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, dpl); if (e2 & DESC_S_MASK) { if (!(e2 & DESC_CS_MASK)) { raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC()); } -dpl = (e2 >> DESC_DPL_SHIFT) & 3; if (e2 & DESC_C_MASK) { /* conforming code segment */ if (dpl > cpl) { @@ -1691,7 +1691,6 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, } else { /* check gate type */ type = (e2 >> DESC_TYPE_SHIFT) & 0x1f; -dpl = (e2 >> DESC_DPL_SHIFT) & 3; rpl = new_cs & 3; #ifdef TARGET_X86_64 -- 2.45.2
[PULL 09/13] target/i386/tcg: check for correct busy state before switching to a new task
This step is listed in the Intel manual: "Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return)". The AMD manual lists the same operation under the "Preventing recursion" paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor checks the busy bit in the IRET case. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 5 + 1 file changed, 5 insertions(+) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 809ee3d9833..0242f9d8b58 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -369,6 +369,11 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, old_tss_limit_max = 43; } +/* new TSS must be busy iff the source is an IRET instruction */ +if (!!(e2 & DESC_TSS_BUSY_MASK) != (source == SWITCH_TSS_IRET)) { +raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); +} + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ -- 2.45.2
[PULL 01/13] target/i386/tcg: fix POP to memory in long mode
In long mode, POP to memory will write a full 64-bit value. However, the call to gen_writeback() in gen_POP will use MO_32 because the decoding table is incorrect. The bug was latent until commit aea49fbb01a ("target/i386: use gen_writeback() within gen_POP()", 2024-06-08), and then became visible because gen_op_st_v now receives op->ot instead of the "ot" returned by gen_pop_T0. Analyzed-by: Clément Chigot Fixes: 5e9e21bcc4d ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07) Tested-by: Clément Chigot Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 2 +- target/i386/tcg/emit.c.inc | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 0d846c32c22..d2da1d396d5 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -1717,7 +1717,7 @@ static const X86OpEntry opcodes_root[256] = { [0x8C] = X86_OP_ENTRYwr(MOV, E,v, S,w, op0_Mw), [0x8D] = X86_OP_ENTRYwr(LEA, G,v, M,v, nolea), [0x8E] = X86_OP_ENTRYwr(MOV, S,w, E,w), -[0x8F] = X86_OP_GROUPw(group1A, E,v), +[0x8F] = X86_OP_GROUPw(group1A, E,d64), [0x98] = X86_OP_ENTRY1(CBW,0,v), /* rAX */ [0x99] = X86_OP_ENTRYwr(CWD, 2,v, 0,v), /* rDX, rAX */ diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index fc7477833bc..016dce81464 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -2788,6 +2788,7 @@ static void gen_POP(DisasContext *s, X86DecodedInsn *decode) X86DecodedOp *op = >op[0]; MemOp ot = gen_pop_T0(s); +assert(ot >= op->ot); if (op->has_ea || op->unit == X86_OP_SEG) { /* NOTE: order is important for MMU exceptions */ gen_writeback(s, decode, 0, s->T0); -- 2.45.2
[PULL 04/13] target/i386/tcg: use PUSHL/PUSHW for error code
Do not pre-decrement esp, let the macros subtract the appropriate operand size. Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 224e73e9ed0..b985382d704 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -670,22 +670,20 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, } shift = switch_tss(env, intno * 8, e1, e2, SWITCH_TSS_CALL, old_eip); if (has_error_code) { -uint32_t mask; - /* push the error code */ if (env->segs[R_SS].flags & DESC_B_MASK) { -mask = 0x; +sp_mask = 0x; } else { -mask = 0x; +sp_mask = 0x; } -esp = (env->regs[R_ESP] - (2 << shift)) & mask; -ssp = env->segs[R_SS].base + esp; +esp = env->regs[R_ESP]; +ssp = env->segs[R_SS].base; if (shift) { -cpu_stl_kernel(env, ssp, error_code); +PUSHL(ssp, esp, sp_mask, error_code); } else { -cpu_stw_kernel(env, ssp, error_code); +PUSHW(ssp, esp, sp_mask, error_code); } -SET_ESP(esp, mask); +SET_ESP(esp, sp_mask); } return; } -- 2.45.2
Re: [PATCH] chardev: add a mutex to protect IOWatchPoll::src
On 7/11/24 11:51, Sergey Dyasli wrote: After 038b4217884c ("Revert "chardev: use a child source for qio input source"") we've been observing the "iwp->src == NULL" assertion triggering periodically during the initial capabilities querying by libvirtd. One of possible backtraces: Hi Sergey, thanks for the analysis! I noticed however that this comment is really old; it was added from commit 2b316774f60 ("qemu-char: do not operate on sources from finalize callbacks", 2013-04-22): /* Due to a glib bug, removing the last reference to a source * inside a finalize callback causes recursive locking (and a * deadlock). This is not a problem inside other callbacks, * including dispatch callbacks, so we call io_remove_watch_poll * to remove this source. At this point, iwp->src must * be NULL, or we would leak it. * * This would be solved much more elegantly by child sources, * but we support older glib versions that do not have them. */ and the original mailing list message points to a problem on RHEL6 and Wheezy, which were both relatively old in 2013. And in fact the issue with finalize had been fixed in glib in 2010: commit b358202856682e5cdefb0b4b8aaed3a45d9a85fa Author: Dan Winship Date: Sat Nov 6 09:35:25 2010 -0400 gmain: move finalization of GSource outside of context lock This avoids ugly deadlock situations such as in https://bugzilla.gnome.org/show_bug.cgi?id=586432 https://bugzilla.gnome.org/show_bug.cgi?id=626702 https://bugzilla.gnome.org/show_bug.cgi?id=634239 diff --git a/glib/gmain.c b/glib/gmain.c index b182c6607..301adb0a7 100644 --- a/glib/gmain.c +++ b/glib/gmain.c @@ -1520,7 +1520,13 @@ g_source_unref_internal (GSource *source, g_source_list_remove (source, context); if (source->source_funcs->finalize) - source->source_funcs->finalize (source); + { + if (context) + UNLOCK_CONTEXT (context); + source->source_funcs->finalize (source); + if (context) + LOCK_CONTEXT (context); + } g_free (source->name); source->name = NULL; So I think we should just revert commit 2b316774f60, which is not hard to do (if it works) even if the code has since moved from qemu-char.c to chardev/char-io.c. Thanks, Paolo Thread 1 (Thread 0x7f16cd4f0700 (LWP 43858)): 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 1 0x7f16c6c21e65 in __GI_abort () at abort.c:79 2 0x7f16c6c21d39 in __assert_fail_base at assert.c:92 3 0x7f16c6c46e86 in __GI___assert_fail (assertion=assertion@entry=0x562e9bcdaadd "iwp->src == NULL", file=file@entry=0x562e9bcdaac8 "../chardev/char-io.c", line=line@entry=99, function=function@entry=0x562e9bcdab10 <__PRETTY_FUNCTION__.20549> "io_watch_poll_finalize") at assert.c:101 4 0x562e9ba20c2c in io_watch_poll_finalize (source=) at ../chardev/char-io.c:99 5 io_watch_poll_finalize (source=) at ../chardev/char-io.c:88 6 0x7f16c904aae0 in g_source_unref_internal () from /lib64/libglib-2.0.so.0 7 0x7f16c904baf9 in g_source_destroy_internal () from /lib64/libglib-2.0.so.0 8 0x562e9ba20db0 in io_remove_watch_poll (source=0x562e9d6720b0) at ../chardev/char-io.c:147 9 remove_fd_in_watch (chr=chr@entry=0x562e9d5f3800) at ../chardev/char-io.c:153 10 0x562e9ba23ffb in update_ioc_handlers (s=0x562e9d5f3800) at ../chardev/char-socket.c:592 11 0x562e9ba2072f in qemu_chr_fe_set_handlers_full at ../chardev/char-fe.c:279 12 0x562e9ba207a9 in qemu_chr_fe_set_handlers at ../chardev/char-fe.c:304 13 0x562e9ba2ca75 in monitor_qmp_setup_handlers_bh (opaque=0x562e9d4c2c60) at ../monitor/qmp.c:509 14 0x562e9bb6222e in aio_bh_poll (ctx=ctx@entry=0x562e9d4c2f20) at ../util/async.c:216 15 0x562e9bb4de0a in aio_poll (ctx=0x562e9d4c2f20, blocking=blocking@entry=true) at ../util/aio-posix.c:722 16 0x562e9b99dfaa in iothread_run (opaque=0x562e9d4c26f0) at ../iothread.c:63 17 0x562e9bb505a4 in qemu_thread_start (args=0x562e9d4c7ea0) at ../util/qemu-thread-posix.c:543 18 0x7f16c70081ca in start_thread (arg=) at pthread_create.c:479 19 0x7f16c6c398d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 io_remove_watch_poll(), which makes sure that iwp->src is NULL, calls g_source_destroy() which finds that iwp->src is not NULL in the finalize callback. This can only happen if another thread has managed to trigger io_watch_poll_prepare() callback in the meantime. Introduce a mutex and a boolean variable to prevent other threads creating a watch in io_watch_poll_prepare() in case that the IOWatchPoll itself is about to get destroyed. Signed-off-by: Sergey Dyasli --- chardev/char-io.c | 21 + 1 file changed, 21 insertions(+) diff --git a/chardev/char-io.c b/chardev/char-io.c index dab77b112e35..b1edccf0cc85 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -34,6 +34,9 @@
Re: [PATCH 09/10] target/i386/tcg: use X86Access for TSS access
On 7/10/24 20:40, Paolo Bonzini wrote: Il mer 10 lug 2024, 18:47 Richard Henderson mailto:richard.hender...@linaro.org>> ha scritto: On 7/9/24 23:29, Paolo Bonzini wrote: > This takes care of probing the vaddr range in advance, and is also faster > because it avoids repeated TLB lookups. It also matches the Intel manual > better, as it says "Checks that the current (old) TSS, new TSS, and all > segment descriptors used in the task switch are paged into system memory"; > note however that it's not clear how the processor checks for segment > descriptors, and this check is not included in the AMD manual. > > Signed-off-by: Paolo Bonzini mailto:pbonz...@redhat.com>> > --- > target/i386/tcg/seg_helper.c | 101 ++- > 1 file changed, 51 insertions(+), 50 deletions(-) > > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c > index 25af9d4a4ec..77f2c65c3cf 100644 > --- a/target/i386/tcg/seg_helper.c > +++ b/target/i386/tcg/seg_helper.c > @@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, > raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); > } > > + /* X86Access avoids memory exceptions during the task switch */ > + access_prepare_mmu(, env, env->tr.base, old_tss_limit_max, > + MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); > + > + if (source == SWITCH_TSS_CALL) { > + /* Probe for future write of parent task */ > + probe_access(env, tss_base, 2, MMU_DATA_STORE, > + cpu_mmu_index_kernel(env), retaddr); > + } > + access_prepare_mmu(, env, tss_base, tss_limit, > + MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); You're computing cpu_mmu_index_kernel 3 times. Squashing this in (easier to review than the whole thing): diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 4123ff1245e..4edfd26135f 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -321,7 +321,7 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector, uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap; uint32_t old_eflags, eflags_mask; SegmentCache *dt; -int index; +int mmu_index, index; target_ulong ptr; X86Access old, new; @@ -378,16 +378,17 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector, } /* X86Access avoids memory exceptions during the task switch */ +mmu_index = cpu_mmu_index_kernel(env); access_prepare_mmu(, env, env->tr.base, old_tss_limit_max, - MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); + MMU_DATA_STORE, mmu_index, retaddr); if (source == SWITCH_TSS_CALL) { /* Probe for future write of parent task */ probe_access(env, tss_base, 2, MMU_DATA_STORE, -cpu_mmu_index_kernel(env), retaddr); +mmu_index, retaddr); } access_prepare_mmu(, env, tss_base, tss_limit, - MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); + MMU_DATA_LOAD, mmu_index, retaddr); /* read all the registers from the new TSS */ if (type & 8) { @@ -468,7 +469,11 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector, context */ if (source == SWITCH_TSS_CALL) { -cpu_stw_kernel_ra(env, tss_base, env->tr.selector, retaddr); +/* + * Thanks to the probe_access above, we know the first two + * bytes addressed by are writable too. + */ +access_stw(, tss_base, env->tr.selector); new_eflags |= NT_MASK; } Paolo
Re: Disassembler location
On 7/10/24 20:02, Michael Morrell wrote: I'm working on a port to a new architecture and was noticing a discrepancy in where the disassembler code lives. There is a file "target//disas.c" for 4 architectures (avr, loongarch, openrisc, and rx), but a file "disas/.c" for 14 architectures (if I counted right). It seems the 4 architectures using "target//disas.c" are more recently added so I was wondering if that is now the preferred location. I couldn't find information on this, but I wasn't sure where to look. loongarch puts it in target// because it reuses some code between disassembler and translator. The others are not hosts, only targets. By putting the file in target//, they do not need to add it to the "disassemblers" variable in meson.build---but they add it anyway. :) All in all, if you're not in the loongarch situation I'd put it in disas/. Paolo
Re: [PATCH 00/10] target/i386/tcg: fixes for seg_helper.c
Il mer 10 lug 2024, 23:01 Robert Henry ha scritto: > I have only skimmed the diffs. Your knowledge of the deep semantics, > gained by close differential reading of intel and amd docs, is truly > amazing. Many thanks for pushing this through! > Thanks for bringing this to our attention too, apart from the practical bug hopefully it will help future readers to have a more precise implementation. I tried to acknowledge your contribution in the commit messages. I have 2 nits, perhaps stylistic only. > > For code like "sp -= 2" or "sp += 2" followed or preceded by a write to > the stack pointer of a uint16_t variable 'x', would it be better/more > robust to rewrite as: "sp -= sizeof(x)" ? > I think that's intentional because the value subtracted is related to the "stw" or "stl" in the store (likewise for incrementing after a load) more than to the size of x. There are a lot of masks constructed using -1. I think it would be clearer > to use 0x (for 32-bit masks) as that reminds the reader that this > is a bit mask. But it seems that using -1 is how the original code was > written. > -1 is used for 64-bit masks only. They get unwieldy quickly. :) Paolo > On Tue, Jul 9, 2024 at 11:29 PM Paolo Bonzini wrote: > >> This includes bugfixes: >> - allowing IRET from user mode to user mode with SMAP (do not use implicit >> kernel accesses, which break if the stack is in userspace) >> >> - use DPL-level accesses for interrupts and call gates >> >> - various fixes for task switching >> >> And two related cleanups: computing MMU index once for far calls and >> returns >> (including task switches), and using X86Access for TSS access. >> >> Tested with a really ugly patch to kvm-unit-tests, included after >> signature. >> >> Paolo Bonzini (7): >> target/i386/tcg: Allow IRET from user mode to user mode with SMAP >> target/i386/tcg: use PUSHL/PUSHW for error code >> target/i386/tcg: Compute MMU index once >> target/i386/tcg: Use DPL-level accesses for interrupts and call gates >> target/i386/tcg: check for correct busy state before switching to a >> new task >> target/i386/tcg: use X86Access for TSS access >> target/i386/tcg: save current task state before loading new one >> >> Richard Henderson (3): >> target/i386/tcg: Remove SEG_ADDL >> target/i386/tcg: Reorg push/pop within seg_helper.c >> target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl >> >> target/i386/cpu.h| 11 +- >> target/i386/cpu.c| 27 +- >> target/i386/tcg/seg_helper.c | 606 +++ >> 3 files changed, 354 insertions(+), 290 deletions(-) >> >> -- >> 2.45.2 >> >> diff --git a/lib/x86/usermode.c b/lib/x86/usermode.c >> index c3ec0ad7..0bf40c6d 100644 >> --- a/lib/x86/usermode.c >> +++ b/lib/x86/usermode.c >> @@ -5,13 +5,15 @@ >> #include "x86/desc.h" >> #include "x86/isr.h" >> #include "alloc.h" >> +#include "alloc_page.h" >> #include "setjmp.h" >> #include "usermode.h" >> >> #include "libcflat.h" >> #include >> >> -#define USERMODE_STACK_SIZE0x2000 >> +#define USERMODE_STACK_ORDER 1 /* 8k */ >> +#define USERMODE_STACK_SIZE(1 << (12 + USERMODE_STACK_ORDER)) >> #define RET_TO_KERNEL_IRQ 0x20 >> >> static jmp_buf jmpbuf; >> @@ -37,9 +39,14 @@ uint64_t run_in_user(usermode_func func, unsigned int >> fault_vector, >> { >> extern char ret_to_kernel; >> volatile uint64_t rax = 0; >> - static unsigned char user_stack[USERMODE_STACK_SIZE]; >> + static unsigned char *user_stack; >> handler old_ex; >> >> + if (!user_stack) { >> + user_stack = alloc_pages(USERMODE_STACK_ORDER); >> + printf("%p\n", user_stack); >> + } >> + >> *raised_vector = 0; >> set_idt_entry(RET_TO_KERNEL_IRQ, _to_kernel, 3); >> old_ex = handle_exception(fault_vector, >> @@ -51,6 +58,8 @@ uint64_t run_in_user(usermode_func func, unsigned int >> fault_vector, >> return 0; >> } >> >> + memcpy(user_stack + USERMODE_STACK_SIZE - 8, , 8); >> + >> asm volatile ( >> /* Prepare kernel SP for exception handlers */ >> "mov %%rsp, %[rsp0]\n\t" >> @@ -63,12 +72,13 @@ uint64_t run_in_user(usermode_fu
Re: [PATCH 09/10] target/i386/tcg: use X86Access for TSS access
Il mer 10 lug 2024, 18:47 Richard Henderson ha scritto: > On 7/9/24 23:29, Paolo Bonzini wrote: > > This takes care of probing the vaddr range in advance, and is also faster > > because it avoids repeated TLB lookups. It also matches the Intel manual > > better, as it says "Checks that the current (old) TSS, new TSS, and all > > segment descriptors used in the task switch are paged into system > memory"; > > note however that it's not clear how the processor checks for segment > > descriptors, and this check is not included in the AMD manual. > > > > Signed-off-by: Paolo Bonzini > > --- > > target/i386/tcg/seg_helper.c | 101 ++- > > 1 file changed, 51 insertions(+), 50 deletions(-) > > > > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c > > index 25af9d4a4ec..77f2c65c3cf 100644 > > --- a/target/i386/tcg/seg_helper.c > > +++ b/target/i386/tcg/seg_helper.c > > @@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env, int > tss_selector, > > raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, > retaddr); > > } > > > > +/* X86Access avoids memory exceptions during the task switch */ > > +access_prepare_mmu(, env, env->tr.base, old_tss_limit_max, > > +MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); > > + > > +if (source == SWITCH_TSS_CALL) { > > +/* Probe for future write of parent task */ > > +probe_access(env, tss_base, 2, MMU_DATA_STORE, > > + cpu_mmu_index_kernel(env), retaddr); > > +} > > +access_prepare_mmu(, env, tss_base, tss_limit, > > +MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); > > You're computing cpu_mmu_index_kernel 3 times. > Oh, indeed. Better than 30. :) This appears to be conservative in that you're requiring only 2 bytes (a > minimum) of 0x68 > to be writable. Is it legal to place the TSS at offset 0xffe of page 0, > with the balance > on page 1, with page 0 writable and page 1 read-only? Yes, paging is totally optional here. The only field that is written is the link. Otherwise I would think you could > just check the entire TSS for writability. > > Anyway, after the MMU_DATA_STORE probe, you have proved that 'X86Access > new' contains an > address range that may be stored. So you can change the SWITCH_TSS_CALL > store below to > access_stw() too. > Nice. > -/* NOTE: we must avoid memory exceptions during the task switch, > > - so we make dummy accesses before */ > > -/* XXX: it can still fail in some cases, so a bigger hack is > > - necessary to valid the TLB after having done the accesses */ > > - > > -v1 = cpu_ldub_kernel_ra(env, env->tr.base, retaddr); > > -v2 = cpu_ldub_kernel_ra(env, env->tr.base + old_tss_limit_max, > retaddr); > > -cpu_stb_kernel_ra(env, env->tr.base, v1, retaddr); > > -cpu_stb_kernel_ra(env, env->tr.base + old_tss_limit_max, v2, > retaddr); > > OMG. > Haha, yeah X86Access is perfect here. Paolo Looks like a fantastic cleanup overall. > > > r~ > >
Re: [RFC PATCH v4 2/7] rust: add bindgen step as a meson dependency
On Wed, Jul 10, 2024 at 4:48 PM Zhao Liu wrote: > > On Tue, Jul 09, 2024 at 02:28:38PM +0200, Paolo Bonzini wrote: > > > > Here are the stopping points that I found over the last couple weeks: > > > > 1.56.0: 2021 edition > > 1.59.0: const CStr::from_bytes_with_nul_unchecked (needed by cstr > > crate, see below) > > 1.64.0: std::ffi::c_char > > 1.65.0: Generic Associated Types > > 1.74.0: Clippy can be configured in Cargo.toml > > 1.77.0: C string literals, offset_of! > > > > I think 1.59.0 is pretty much the lower bound. Not having offset_of! > > will be a bit painful, but it can be worked around (and pretty much > > has to be, because 1.77.0 is really new). > > > > An additional question: does our minimum rust version requirement > indicate that users with this rust version can compile other > dependencies that satisfy QEMU requirements, such as bindgen? Yes (though in the case of bindgen, like cargo and rustc, we'll use it from the distro; Debian bookworm has 0.60.1 so that's what we'll have to stick with). Paolo
[PATCH] target/i386/tcg: fix POP to memory in long mode
In long mode, POP to memory will write a full 64-bit value. However, the call to gen_writeback() in gen_POP will use MO_32 because the decoding table is incorrect. The bug was latent until commit aea49fbb01a ("target/i386: use gen_writeback() within gen_POP()", 2024-06-08), and then became visible because gen_op_st_v now receives op->ot instead of the "ot" returned by gen_pop_T0. Analyzed-by: Clément Chigot Fixes: 5e9e21bcc4d ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07) Tested-by: Clément Chigot Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 2 +- target/i386/tcg/emit.c.inc | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 0d846c32c22..d2da1d396d5 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -1717,7 +1717,7 @@ static const X86OpEntry opcodes_root[256] = { [0x8C] = X86_OP_ENTRYwr(MOV, E,v, S,w, op0_Mw), [0x8D] = X86_OP_ENTRYwr(LEA, G,v, M,v, nolea), [0x8E] = X86_OP_ENTRYwr(MOV, S,w, E,w), -[0x8F] = X86_OP_GROUPw(group1A, E,v), +[0x8F] = X86_OP_GROUPw(group1A, E,d64), [0x98] = X86_OP_ENTRY1(CBW,0,v), /* rAX */ [0x99] = X86_OP_ENTRYwr(CWD, 2,v, 0,v), /* rDX, rAX */ diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index fc7477833bc..c6c2c7257b9 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -2788,6 +2788,8 @@ static void gen_POP(DisasContext *s, X86DecodedInsn *decode) X86DecodedOp *op = >op[0]; MemOp ot = gen_pop_T0(s); +/* Only 16/32-bit access in 32-bit mode, 16/64-bit access in long mode. */ +assert(ot == op->ot); if (op->has_ea || op->unit == X86_OP_SEG) { /* NOTE: order is important for MMU exceptions */ gen_writeback(s, decode, 0, s->T0); -- 2.45.2
Re: [PULL 13/42] target/i386: use gen_writeback() within gen_POP()
On 7/10/24 11:42, Clément Chigot wrote: Hi Mark, This patch introduces regressions in our x86_64 VxWorks kernels running over qemu. Some page faults are triggered randomly. Earlier to this patch, the MemOp `ot` passed to `gen_op_st_v` was the `gen_pop_T0` created a few lines above. Now, this is `op->ot` which comes from elsewhere. Adding `op->ot = ot` just before calling `gen_writeback` fixes my regressions. But I'm wondering if there could be some unexpected fallbacks, `op->ot` possibly being used afterwards. Mark's patch is correct and the (previously latent) mistake is in the decoding table. The manual correctly says that POP has "default 64-bit" operand; that is, it is 64-bit even without a REX.W prefix. It must be marked as "d64" rather than "v". Can you test this patch? diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 0d846c32c22..d2da1d396d5 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -1717,7 +1717,7 @@ static const X86OpEntry opcodes_root[256] = { [0x8C] = X86_OP_ENTRYwr(MOV, E,v, S,w, op0_Mw), [0x8D] = X86_OP_ENTRYwr(LEA, G,v, M,v, nolea), [0x8E] = X86_OP_ENTRYwr(MOV, S,w, E,w), -[0x8F] = X86_OP_GROUPw(group1A, E,v), +[0x8F] = X86_OP_GROUPw(group1A, E,d64), [0x98] = X86_OP_ENTRY1(CBW,0,v), /* rAX */ [0x99] = X86_OP_ENTRYwr(CWD, 2,v, 0,v), /* rDX, rAX */ diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index fc7477833bc..36bb308e449 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -2788,6 +2788,8 @@ static void gen_POP(DisasContext *s, X86DecodedInsn *decode) X86DecodedOp *op = >op[0]; MemOp ot = gen_pop_T0(s); +/* Only 16/32-bit access in 32-bit mode, 16/64-bit access in long mode. */ +assert(ot == op->ot); if (op->has_ea || op->unit == X86_OP_SEG) { /* NOTE: order is important for MMU exceptions */ gen_writeback(s, decode, 0, s->T0); Thanks (and it's reassuring that everything else has worked fine for you!), Paolo Thanks, Clément On Sat, Jun 8, 2024 at 10:36 AM Paolo Bonzini wrote: From: Mark Cave-Ayland Instead of directly implementing the writeback using gen_op_st_v(), use the existing gen_writeback() function. Suggested-by: Paolo Bonzini Signed-off-by: Mark Cave-Ayland Message-ID: <20240606095319.229650-3-mark.cave-ayl...@ilande.co.uk> Signed-off-by: Paolo Bonzini --- target/i386/tcg/emit.c.inc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index ca78504b6e4..6123235c000 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -2580,9 +2580,9 @@ static void gen_POP(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) if (op->has_ea) { /* NOTE: order is important for MMU exceptions */ -gen_op_st_v(s, ot, s->T0, s->A0); -op->unit = X86_OP_SKIP; +gen_writeback(s, decode, 0, s->T0); } + /* NOTE: writing back registers after update is important for pop %sp */ gen_pop_update(s, ot); } -- 2.45.1
Re: [PATCH] hw/timer/hpet: Fix wrong HPET interrupts
Hello! Thanks for looking after the HPET, which is not a very well maintained device. I am not sure your patch needs to mask the comparator with timer->cmp &= 0xULL; I think that's a bug in the "case HPET_TN_CMP + 4" part of hpet_ram_write. The logic was changed in "hpet: Fix emulation of HPET_TN_SETVAL" but the commit forgot to apply it to the high bits of the comparator. The whole handling of writes to the comparator is very messy. I can see two more bugs: * Idon't think it's correct that HPET_TN_CMP+4 does not clear HPET_TN_SETVAL * the clamping should take into account that new_val is shifted by 32 Can you check that this also fixes the problem: diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c index 01efe4885db..11df272fe87 100644 --- a/hw/timer/hpet.c +++ b/hw/timer/hpet.c @@ -552,6 +552,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr, timer->period = (timer->period & 0xULL) | new_val; } +/* + * FIXME: on a 64-bit write, HPET_TN_SETVAL should apply to the + * high bits part as well. + */ timer->config &= ~HPET_TN_SETVAL; if (hpet_enabled(s)) { hpet_set_timer(timer); @@ -562,7 +566,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr, if (!timer_is_periodic(timer) || (timer->config & HPET_TN_SETVAL)) { timer->cmp = (timer->cmp & 0xULL) | new_val << 32; -} else { +} +if (timer_is_periodic(timer)) { /* * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value @@ -562,20 +566,21 @@ static void hpet_ram_write(void *opaque, hwaddr addr, if (!timer_is_periodic(timer) || (timer->config & HPET_TN_SETVAL)) { timer->cmp = (timer->cmp & 0xULL) | new_val << 32; -} else { +} +if (timer_is_periodic(timer)) { /* * FIXME: Clamp period to reasonable min value? * Clamp period to reasonable max value */ -new_val &= (timer->config & HPET_TN_32BIT ? ~0u : ~0ull) >> 1; +new_val = MIN(new_val, ~0u >> 1); timer->period = (timer->period & 0xULL) | new_val << 32; -} -timer->config &= ~HPET_TN_SETVAL; -if (hpet_enabled(s)) { -hpet_set_timer(timer); -} -break; +} +timer->config &= ~HPET_TN_SETVAL; +if (hpet_enabled(s)) { +hpet_set_timer(timer); +} +break; case HPET_TN_ROUTE: timer->fsb = (timer->fsb & 0xULL) | new_val; break; @@ -605,7 +605,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr, s->hpet_offset = ticks_to_ns(s->hpet_counter) - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); for (i = 0; i < s->num_timers; i++) { -if ((>timer[i])->cmp != ~0ULL) { +if (timer_enabled(timer)) { hpet_set_timer(>timer[i]); } } (the final part is taken from your patch)? Thanks, Paolo On 6/18/24 15:10, TaiseiIto wrote: Before this commit, there are 3 problems about HPET timer interrupts. First, HPET periodic timers cause a too early interrupt before HPET main counter value reaches a value written its comparator value register. Second, disabled HPET timers whose comparator value register is not 0x cause wrong interrupts. Third, enabled HPET timers whose comparator value register is 0x don't cause any interrupts. About the first one, for example, an HPET driver writes 0x to an HPET periodic timer comparator value register. As a result, the register becomes 0x because writing to the higher 32 bits of the register doesn't affect itself in periodic mode. (see "case HPET_TN_CMP + 4" of "hpet_ram_write" function.) And "timer->period" which means interrupt period in periodic mode becomes 0x. Next, the HPET driver sets the HPET_CFG_ENABLE flag to start the main counter. The comparator value register (0x) indicate the next interrupt time. The period (0x) is added to the comparator value register at "hpet_timer" function because "hpet_time_after64" function returns true when the main counter is small. So, the first interrupt is planned when the main counter is 0x5554, but the first interrupt should occur when the main counter is 0x. To solve this problem, I fix the code to clear the higher 32 bits of comparator value registers of periodic mode
Re: [PATCH v2] i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT
Queued, thanks. Paolo
[PATCH 05/10] target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl
From: Richard Henderson Disconnect mmu index computation from the current pl as stored in env->hflags. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-2-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/cpu.h | 11 ++- target/i386/cpu.c | 27 --- 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index c43ac01c794..1e121acef54 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2445,15 +2445,8 @@ static inline bool is_mmu_index_32(int mmu_index) return mmu_index & 1; } -static inline int cpu_mmu_index_kernel(CPUX86State *env) -{ -int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1; -int mmu_index_base = -!(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : -((env->hflags & HF_CPL_MASK) < 3 && (env->eflags & AC_MASK)) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX; - -return mmu_index_base + mmu_index_32; -} +int x86_mmu_index_pl(CPUX86State *env, unsigned pl); +int cpu_mmu_index_kernel(CPUX86State *env); #define CC_DST (env->cc_dst) #define CC_SRC (env->cc_src) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index c05765eeafc..4688d140c2d 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -8122,18 +8122,39 @@ static bool x86_cpu_has_work(CPUState *cs) return x86_cpu_pending_interrupt(cs, cs->interrupt_request) != 0; } -static int x86_cpu_mmu_index(CPUState *cs, bool ifetch) +int x86_mmu_index_pl(CPUX86State *env, unsigned pl) { -CPUX86State *env = cpu_env(cs); int mmu_index_32 = (env->hflags & HF_CS64_MASK) ? 0 : 1; int mmu_index_base = -(env->hflags & HF_CPL_MASK) == 3 ? MMU_USER64_IDX : +pl == 3 ? MMU_USER64_IDX : !(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : (env->eflags & AC_MASK) ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX; return mmu_index_base + mmu_index_32; } +static int x86_cpu_mmu_index(CPUState *cs, bool ifetch) +{ +CPUX86State *env = cpu_env(cs); +return x86_mmu_index_pl(env, env->hflags & HF_CPL_MASK); +} + +static int x86_mmu_index_kernel_pl(CPUX86State *env, unsigned pl) +{ +int mmu_index_32 = (env->hflags & HF_LMA_MASK) ? 0 : 1; +int mmu_index_base = +!(env->hflags & HF_SMAP_MASK) ? MMU_KNOSMAP64_IDX : +(pl < 3 && (env->eflags & AC_MASK) + ? MMU_KNOSMAP64_IDX : MMU_KSMAP64_IDX); + +return mmu_index_base + mmu_index_32; +} + +int cpu_mmu_index_kernel(CPUX86State *env) +{ +return x86_mmu_index_kernel_pl(env, env->hflags & HF_CPL_MASK); +} + static void x86_disas_set_info(CPUState *cs, disassemble_info *info) { X86CPU *cpu = X86_CPU(cs); -- 2.45.2
[PATCH 06/10] target/i386/tcg: Compute MMU index once
Add the MMU index to the StackAccess struct, so that it can be cached or (in the next patch) computed from information that is not in CPUX86State. Co-developed-by: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 6b3de7a2be4..07e3667639a 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -587,36 +587,37 @@ typedef struct StackAccess target_ulong ss_base; target_ulong sp; target_ulong sp_mask; +int mmu_index; } StackAccess; static void pushw(StackAccess *sa, uint16_t val) { sa->sp -= 2; -cpu_stw_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), - val, sa->ra); +cpu_stw_mmuidx_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->mmu_index, sa->ra); } static void pushl(StackAccess *sa, uint32_t val) { sa->sp -= 4; -cpu_stl_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), - val, sa->ra); +cpu_stl_mmuidx_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->mmu_index, sa->ra); } static uint16_t popw(StackAccess *sa) { -uint16_t ret = cpu_lduw_data_ra(sa->env, -sa->ss_base + (sa->sp & sa->sp_mask), -sa->ra); +uint16_t ret = cpu_lduw_mmuidx_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->mmu_index, sa->ra); sa->sp += 2; return ret; } static uint32_t popl(StackAccess *sa) { -uint32_t ret = cpu_ldl_data_ra(sa->env, - sa->ss_base + (sa->sp & sa->sp_mask), - sa->ra); +uint32_t ret = cpu_ldl_mmuidx_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->mmu_index, sa->ra); sa->sp += 4; return ret; } @@ -677,6 +678,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; +sa.mmu_index = cpu_mmu_index_kernel(env); if (type == 5) { /* task gate */ @@ -858,12 +860,12 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, static void pushq(StackAccess *sa, uint64_t val) { sa->sp -= 8; -cpu_stq_kernel_ra(sa->env, sa->sp, val, sa->ra); +cpu_stq_mmuidx_ra(sa->env, sa->sp, val, sa->mmu_index, sa->ra); } static uint64_t popq(StackAccess *sa) { -uint64_t ret = cpu_ldq_data_ra(sa->env, sa->sp, sa->ra); +uint64_t ret = cpu_ldq_mmuidx_ra(sa->env, sa->sp, sa->mmu_index, sa->ra); sa->sp += 8; return ret; } @@ -982,6 +984,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; +sa.mmu_index = cpu_mmu_index_kernel(env); sa.sp_mask = -1; sa.ss_base = 0; if (dpl < cpl || ist != 0) { @@ -1116,6 +1119,7 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int, sa.sp = env->regs[R_ESP]; sa.sp_mask = 0x; sa.ss_base = env->segs[R_SS].base; +sa.mmu_index = cpu_mmu_index_kernel(env); if (is_int) { old_eip = next_eip; @@ -1579,6 +1583,7 @@ void helper_lcall_real(CPUX86State *env, uint32_t new_cs, uint32_t new_eip, sa.sp = env->regs[R_ESP]; sa.sp_mask = get_sp_mask(env->segs[R_SS].flags); sa.ss_base = env->segs[R_SS].base; +sa.mmu_index = cpu_mmu_index_kernel(env); if (shift) { pushl(, env->segs[R_CS].selector); @@ -1618,6 +1623,7 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, sa.env = env; sa.ra = GETPC(); +sa.mmu_index = cpu_mmu_index_kernel(env); if (e2 & DESC_S_MASK) { if (!(e2 & DESC_CS_MASK)) { @@ -1905,6 +1911,7 @@ void helper_iret_real(CPUX86State *env, int shift) sa.env = env; sa.ra = GETPC(); +sa.mmu_index = x86_mmu_index_pl(env, 0); sa.sp_mask = 0x; /* : use SS segment size? */ sa.sp = env->regs[R_ESP]; sa.ss_base = env->segs[R_SS].base; @@ -1976,8 +1983,11 @@ static inline void helper_ret_protected(CPUX86State *env, int shift, target_ulong new_eip, new_esp; StackAccess sa; +cpl = env->hflags & HF_CPL_MASK; + sa.env = env; sa.ra = retaddr; +sa.mmu_index = x86_mmu_index_pl(env, cpl); #ifdef TARGET_X86_64 if (shift == 2) { @@ -2032,7 +2042,6 @@
[PATCH 00/10] target/i386/tcg: fixes for seg_helper.c
This includes bugfixes: - allowing IRET from user mode to user mode with SMAP (do not use implicit kernel accesses, which break if the stack is in userspace) - use DPL-level accesses for interrupts and call gates - various fixes for task switching And two related cleanups: computing MMU index once for far calls and returns (including task switches), and using X86Access for TSS access. Tested with a really ugly patch to kvm-unit-tests, included after signature. Paolo Bonzini (7): target/i386/tcg: Allow IRET from user mode to user mode with SMAP target/i386/tcg: use PUSHL/PUSHW for error code target/i386/tcg: Compute MMU index once target/i386/tcg: Use DPL-level accesses for interrupts and call gates target/i386/tcg: check for correct busy state before switching to a new task target/i386/tcg: use X86Access for TSS access target/i386/tcg: save current task state before loading new one Richard Henderson (3): target/i386/tcg: Remove SEG_ADDL target/i386/tcg: Reorg push/pop within seg_helper.c target/i386/tcg: Introduce x86_mmu_index_{kernel_,}pl target/i386/cpu.h| 11 +- target/i386/cpu.c| 27 +- target/i386/tcg/seg_helper.c | 606 +++ 3 files changed, 354 insertions(+), 290 deletions(-) -- 2.45.2 diff --git a/lib/x86/usermode.c b/lib/x86/usermode.c index c3ec0ad7..0bf40c6d 100644 --- a/lib/x86/usermode.c +++ b/lib/x86/usermode.c @@ -5,13 +5,15 @@ #include "x86/desc.h" #include "x86/isr.h" #include "alloc.h" +#include "alloc_page.h" #include "setjmp.h" #include "usermode.h" #include "libcflat.h" #include -#define USERMODE_STACK_SIZE0x2000 +#define USERMODE_STACK_ORDER 1 /* 8k */ +#define USERMODE_STACK_SIZE(1 << (12 + USERMODE_STACK_ORDER)) #define RET_TO_KERNEL_IRQ 0x20 static jmp_buf jmpbuf; @@ -37,9 +39,14 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector, { extern char ret_to_kernel; volatile uint64_t rax = 0; - static unsigned char user_stack[USERMODE_STACK_SIZE]; + static unsigned char *user_stack; handler old_ex; + if (!user_stack) { + user_stack = alloc_pages(USERMODE_STACK_ORDER); + printf("%p\n", user_stack); + } + *raised_vector = 0; set_idt_entry(RET_TO_KERNEL_IRQ, _to_kernel, 3); old_ex = handle_exception(fault_vector, @@ -51,6 +58,8 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector, return 0; } + memcpy(user_stack + USERMODE_STACK_SIZE - 8, , 8); + asm volatile ( /* Prepare kernel SP for exception handlers */ "mov %%rsp, %[rsp0]\n\t" @@ -63,12 +72,13 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector, "pushq %[user_stack_top]\n\t" "pushfq\n\t" "pushq %[user_cs]\n\t" - "lea user_mode(%%rip), %%rax\n\t" + "lea user_mode+0x80(%%rip), %%rax\n\t" // smap.flat places usermode addresses at 8MB-16MB "pushq %%rax\n\t" "iretq\n" "user_mode:\n\t" /* Back up volatile registers before invoking func */ + "pop %%rax\n\t" "push %%rcx\n\t" "push %%rdx\n\t" "push %%rdi\n\t" @@ -78,11 +88,12 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector, "push %%r10\n\t" "push %%r11\n\t" /* Call user mode function */ + "add $0x80,%%rbp\n\t" "mov %[arg1], %%rdi\n\t" "mov %[arg2], %%rsi\n\t" "mov %[arg3], %%rdx\n\t" "mov %[arg4], %%rcx\n\t" - "call *%[func]\n\t" + "call *%%rax\n\t" /* Restore registers */ "pop %%r11\n\t" "pop %%r10\n\t" @@ -112,12 +123,11 @@ uint64_t run_in_user(usermode_func func, unsigned int fault_vector, [arg2]"m"(arg2), [arg3]"m"(arg3), [arg4]"m"(arg4), - [func]"m"(func), [user_ds]"i"(USER_DS), [user_cs]"i"(USER_CS), [kernel_ds]"rm"(KERNEL_DS),
[PATCH 02/10] target/i386/tcg: Allow IRET from user mode to user mode with SMAP
This fixes a bug wherein i386/tcg assumed an interrupt return using the IRET instruction was always returning from kernel mode to either kernel mode or user mode. This assumption is violated when IRET is used as a clever way to restore thread state, as for example in the dotnet runtime. There, IRET returns from user mode to user mode. This bug is that stack accesses from IRET and RETF, as well as accesses to the parameters in a call gate, are normal data accesses using the current CPL. This manifested itself as a page fault in the guest Linux kernel due to SMAP preventing the access. This bug appears to have been in QEMU since the beginning. Analyzed-by: Robert R. Henry Co-developed-by: Robert R. Henry Signed-off-by: Robert R. Henry Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 19d6b41a589..4977a5f5b3a 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -594,13 +594,13 @@ int exception_has_error_code(int intno) #define POPW_RA(ssp, sp, sp_mask, val, ra) \ {\ -val = cpu_lduw_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ +val = cpu_lduw_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 2; \ } #define POPL_RA(ssp, sp, sp_mask, val, ra) \ { \ -val = (uint32_t)cpu_ldl_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ +val = (uint32_t)cpu_ldl_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 4;\ } @@ -847,7 +847,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, #define POPQ_RA(sp, val, ra)\ { \ -val = cpu_ldq_kernel_ra(env, sp, ra); \ +val = cpu_ldq_data_ra(env, sp, ra); \ sp += 8;\ } @@ -1797,18 +1797,18 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, PUSHL_RA(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC()); PUSHL_RA(ssp, sp, sp_mask, env->regs[R_ESP], GETPC()); for (i = param_count - 1; i >= 0; i--) { -val = cpu_ldl_kernel_ra(env, old_ssp + -((env->regs[R_ESP] + i * 4) & - old_sp_mask), GETPC()); +val = cpu_ldl_data_ra(env, + old_ssp + ((env->regs[R_ESP] + i * 4) & old_sp_mask), + GETPC()); PUSHL_RA(ssp, sp, sp_mask, val, GETPC()); } } else { PUSHW_RA(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC()); PUSHW_RA(ssp, sp, sp_mask, env->regs[R_ESP], GETPC()); for (i = param_count - 1; i >= 0; i--) { -val = cpu_lduw_kernel_ra(env, old_ssp + - ((env->regs[R_ESP] + i * 2) & - old_sp_mask), GETPC()); +val = cpu_lduw_data_ra(env, + old_ssp + ((env->regs[R_ESP] + i * 2) & old_sp_mask), + GETPC()); PUSHW_RA(ssp, sp, sp_mask, val, GETPC()); } } -- 2.45.2
[PATCH 01/10] target/i386/tcg: Remove SEG_ADDL
From: Richard Henderson This truncation is now handled by MMU_*32_IDX. The introduction of MMU_*32_IDX in fact applied correct 32-bit wraparound to 16-bit accesses with a high segment base (e.g. big real mode or vm86 mode), which did not use SEG_ADDL. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-3-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index aee3d19f29b..19d6b41a589 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -579,10 +579,6 @@ int exception_has_error_code(int intno) } while (0) #endif -/* in 64-bit machines, this can overflow. So this segment addition macro - * can be used to trim the value to 32-bit whenever needed */ -#define SEG_ADDL(ssp, sp, sp_mask) ((uint32_t)((ssp) + (sp & (sp_mask - /* XXX: add a is_user flag to have proper security support */ #define PUSHW_RA(ssp, sp, sp_mask, val, ra) \ {\ @@ -593,7 +589,7 @@ int exception_has_error_code(int intno) #define PUSHL_RA(ssp, sp, sp_mask, val, ra) \ { \ sp -= 4;\ -cpu_stl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), (uint32_t)(val), ra); \ +cpu_stl_kernel_ra(env, (ssp) + (sp & (sp_mask)), (val), ra); \ } #define POPW_RA(ssp, sp, sp_mask, val, ra) \ @@ -604,7 +600,7 @@ int exception_has_error_code(int intno) #define POPL_RA(ssp, sp, sp_mask, val, ra) \ { \ -val = (uint32_t)cpu_ldl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), ra); \ +val = (uint32_t)cpu_ldl_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \ sp += 4;\ } -- 2.45.2
[PATCH 10/10] target/i386/tcg: save current task state before loading new one
This is how the steps are ordered in the manual. EFLAGS.NT is overwritten after the fact in the saved image. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 85 +++- 1 file changed, 45 insertions(+), 40 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 77f2c65c3cf..7663e653e84 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -325,6 +325,42 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, access_prepare_mmu(, env, tss_base, tss_limit, MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); +/* save the current state in the old TSS */ +old_eflags = cpu_compute_eflags(env); +if (old_type & 8) { +/* 32 bit */ +access_stl(, env->tr.base + 0x20, next_eip); +access_stl(, env->tr.base + 0x24, old_eflags); +access_stl(, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]); +access_stl(, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]); +access_stl(, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]); +access_stl(, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]); +access_stl(, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]); +access_stl(, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]); +access_stl(, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]); +access_stl(, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]); +for (i = 0; i < 6; i++) { +access_stw(, env->tr.base + (0x48 + i * 4), + env->segs[i].selector); +} +} else { +/* 16 bit */ +access_stw(, env->tr.base + 0x0e, next_eip); +access_stw(, env->tr.base + 0x10, old_eflags); +access_stw(, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]); +access_stw(, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]); +access_stw(, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]); +access_stw(, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]); +access_stw(, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]); +access_stw(, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]); +access_stw(, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]); +access_stw(, env->tr.base + (0x12 + 7 * 2), env->regs[R_EDI]); +for (i = 0; i < 4; i++) { +access_stw(, env->tr.base + (0x22 + i * 2), + env->segs[i].selector); +} +} + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ @@ -364,49 +400,16 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, if (source == SWITCH_TSS_JMP || source == SWITCH_TSS_IRET) { tss_set_busy(env, env->tr.selector, 0, retaddr); } -old_eflags = cpu_compute_eflags(env); + if (source == SWITCH_TSS_IRET) { old_eflags &= ~NT_MASK; +if (old_type & 8) { +access_stl(, env->tr.base + 0x24, old_eflags); +} else { +access_stw(, env->tr.base + 0x10, old_eflags); + } } -/* save the current state in the old TSS */ -if (old_type & 8) { -/* 32 bit */ -access_stl(, env->tr.base + 0x20, next_eip); -access_stl(, env->tr.base + 0x24, old_eflags); -access_stl(, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]); -access_stl(, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]); -access_stl(, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]); -access_stl(, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]); -access_stl(, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]); -access_stl(, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]); -access_stl(, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]); -access_stl(, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]); -for (i = 0; i < 6; i++) { -access_stw(, env->tr.base + (0x48 + i * 4), - env->segs[i].selector); -} -} else { -/* 16 bit */ -access_stw(, env->tr.base + 0x0e, next_eip); -access_stw(, env->tr.base + 0x10, old_eflags); -access_stw(, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]); -access_stw(, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]); -access_stw(, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]); -access_stw(, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]); -access_stw(, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]); -access_stw(, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]); -access_stw(, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]); -access_
[PATCH 04/10] target/i386/tcg: Reorg push/pop within seg_helper.c
From: Richard Henderson Interrupts and call gates should use accesses with the DPL as the privilege level. While computing the applicable MMU index is easy, the harder thing is how to plumb it in the code. One possibility could be to add a single argument to the PUSH* macros for the privilege level, but this is repetitive and risks confusion between the involved privilege levels. Another possibility is to pass both CPL and DPL, and adjusting both PUSH* and POP* to use specific privilege levels (instead of using cpu_{ld,st}*_data). This makes the code more symmetric. However, a more complicated but much nicer approach is to use a structure to contain the stack parameters, env, unwind return address, and rewrite the macros into functions. The struct provides an easy home for the MMU index as well. Signed-off-by: Richard Henderson Link: https://lore.kernel.org/r/20240617161210.4639-4-richard.hender...@linaro.org Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 439 +++ 1 file changed, 238 insertions(+), 201 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 0653bc10936..6b3de7a2be4 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -579,35 +579,47 @@ int exception_has_error_code(int intno) } while (0) #endif -/* XXX: add a is_user flag to have proper security support */ -#define PUSHW_RA(ssp, sp, sp_mask, val, ra) \ -{\ -sp -= 2; \ -cpu_stw_kernel_ra(env, (ssp) + (sp & (sp_mask)), (val), ra); \ -} +/* XXX: use mmu_index to have proper DPL support */ +typedef struct StackAccess +{ +CPUX86State *env; +uintptr_t ra; +target_ulong ss_base; +target_ulong sp; +target_ulong sp_mask; +} StackAccess; -#define PUSHL_RA(ssp, sp, sp_mask, val, ra) \ -{ \ -sp -= 4;\ -cpu_stl_kernel_ra(env, (ssp) + (sp & (sp_mask)), (val), ra); \ -} +static void pushw(StackAccess *sa, uint16_t val) +{ +sa->sp -= 2; +cpu_stw_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->ra); +} -#define POPW_RA(ssp, sp, sp_mask, val, ra) \ -{\ -val = cpu_lduw_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ -sp += 2; \ -} +static void pushl(StackAccess *sa, uint32_t val) +{ +sa->sp -= 4; +cpu_stl_kernel_ra(sa->env, sa->ss_base + (sa->sp & sa->sp_mask), + val, sa->ra); +} -#define POPL_RA(ssp, sp, sp_mask, val, ra) \ -{ \ -val = (uint32_t)cpu_ldl_data_ra(env, (ssp) + (sp & (sp_mask)), ra); \ -sp += 4;\ -} +static uint16_t popw(StackAccess *sa) +{ +uint16_t ret = cpu_lduw_data_ra(sa->env, +sa->ss_base + (sa->sp & sa->sp_mask), +sa->ra); +sa->sp += 2; +return ret; +} -#define PUSHW(ssp, sp, sp_mask, val) PUSHW_RA(ssp, sp, sp_mask, val, 0) -#define PUSHL(ssp, sp, sp_mask, val) PUSHL_RA(ssp, sp, sp_mask, val, 0) -#define POPW(ssp, sp, sp_mask, val) POPW_RA(ssp, sp, sp_mask, val, 0) -#define POPL(ssp, sp, sp_mask, val) POPL_RA(ssp, sp, sp_mask, val, 0) +static uint32_t popl(StackAccess *sa) +{ +uint32_t ret = cpu_ldl_data_ra(sa->env, + sa->ss_base + (sa->sp & sa->sp_mask), + sa->ra); +sa->sp += 4; +return ret; +} /* protected mode interrupt */ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, @@ -615,12 +627,13 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, int is_hw) { SegmentCache *dt; -target_ulong ptr, ssp; +target_ulong ptr; int type, dpl, selector, ss_dpl, cpl; int has_error_code, new_stack, shift; -uint32_t e1, e2, offset, ss = 0, esp, ss_e1 = 0, ss_e2 = 0; -uint32_t old_eip, sp_mask, eflags; +uint32_t e1, e2, offset, ss = 0, ss_e1 = 0, ss_e2 = 0; +uint32_t old_eip, eflags; int vm86 = env->eflags & VM_MASK; +StackAccess sa; bool set_rf; has_error_code = 0; @@ -662,6 +675,9 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2); } +sa.env = env;
[PATCH 09/10] target/i386/tcg: use X86Access for TSS access
This takes care of probing the vaddr range in advance, and is also faster because it avoids repeated TLB lookups. It also matches the Intel manual better, as it says "Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory"; note however that it's not clear how the processor checks for segment descriptors, and this check is not included in the AMD manual. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 101 ++- 1 file changed, 51 insertions(+), 50 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 25af9d4a4ec..77f2c65c3cf 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -27,6 +27,7 @@ #include "exec/log.h" #include "helper-tcg.h" #include "seg_helper.h" +#include "access.h" int get_pg_mode(CPUX86State *env) { @@ -250,7 +251,7 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, uint32_t e1, uint32_t e2, int source, uint32_t next_eip, uintptr_t retaddr) { -int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, v1, v2, i; +int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, i; target_ulong tss_base; uint32_t new_regs[8], new_segs[6]; uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap; @@ -258,6 +259,7 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, SegmentCache *dt; int index; target_ulong ptr; +X86Access old, new; type = (e2 >> DESC_TYPE_SHIFT) & 0xf; LOG_PCALL("switch_tss: sel=0x%04x type=%d src=%d\n", tss_selector, type, @@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); } +/* X86Access avoids memory exceptions during the task switch */ +access_prepare_mmu(, env, env->tr.base, old_tss_limit_max, + MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); + +if (source == SWITCH_TSS_CALL) { +/* Probe for future write of parent task */ +probe_access(env, tss_base, 2, MMU_DATA_STORE, +cpu_mmu_index_kernel(env), retaddr); +} +access_prepare_mmu(, env, tss_base, tss_limit, + MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ -new_cr3 = cpu_ldl_kernel_ra(env, tss_base + 0x1c, retaddr); -new_eip = cpu_ldl_kernel_ra(env, tss_base + 0x20, retaddr); -new_eflags = cpu_ldl_kernel_ra(env, tss_base + 0x24, retaddr); +new_cr3 = access_ldl(, tss_base + 0x1c); +new_eip = access_ldl(, tss_base + 0x20); +new_eflags = access_ldl(, tss_base + 0x24); for (i = 0; i < 8; i++) { -new_regs[i] = cpu_ldl_kernel_ra(env, tss_base + (0x28 + i * 4), -retaddr); +new_regs[i] = access_ldl(, tss_base + (0x28 + i * 4)); } for (i = 0; i < 6; i++) { -new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x48 + i * 4), - retaddr); +new_segs[i] = access_ldw(, tss_base + (0x48 + i * 4)); } -new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x60, retaddr); -new_trap = cpu_ldl_kernel_ra(env, tss_base + 0x64, retaddr); +new_ldt = access_ldw(, tss_base + 0x60); +new_trap = access_ldl(, tss_base + 0x64); } else { /* 16 bit */ new_cr3 = 0; -new_eip = cpu_lduw_kernel_ra(env, tss_base + 0x0e, retaddr); -new_eflags = cpu_lduw_kernel_ra(env, tss_base + 0x10, retaddr); +new_eip = access_ldw(, tss_base + 0x0e); +new_eflags = access_ldw(, tss_base + 0x10); for (i = 0; i < 8; i++) { -new_regs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x12 + i * 2), retaddr); +new_regs[i] = access_ldw(, tss_base + (0x12 + i * 2)); } for (i = 0; i < 4; i++) { -new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x22 + i * 2), - retaddr); +new_segs[i] = access_ldw(, tss_base + (0x22 + i * 2)); } -new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x2a, retaddr); +new_ldt = access_ldw(, tss_base + 0x2a); new_segs[R_FS] = 0; new_segs[R_GS] = 0; new_trap = 0; @@ -349,16 +360,6 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, chapters 12.2.5 and 13.2.4 on how to implement TSS Trap bit */ (void)new_trap; -/* NOTE: we must avoid memory exceptions during the task switch, - so we make dummy accesses before */ -/* XXX: it can still
[PATCH 03/10] target/i386/tcg: use PUSHL/PUSHW for error code
Do not pre-decrement esp, let the macros subtract the appropriate operand size. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 4977a5f5b3a..0653bc10936 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -670,22 +670,20 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, } shift = switch_tss(env, intno * 8, e1, e2, SWITCH_TSS_CALL, old_eip); if (has_error_code) { -uint32_t mask; - /* push the error code */ if (env->segs[R_SS].flags & DESC_B_MASK) { -mask = 0x; +sp_mask = 0x; } else { -mask = 0x; +sp_mask = 0x; } -esp = (env->regs[R_ESP] - (2 << shift)) & mask; -ssp = env->segs[R_SS].base + esp; +esp = env->regs[R_ESP]; +ssp = env->segs[R_SS].base; if (shift) { -cpu_stl_kernel(env, ssp, error_code); +PUSHL(ssp, esp, sp_mask, error_code); } else { -cpu_stw_kernel(env, ssp, error_code); +PUSHW(ssp, esp, sp_mask, error_code); } -SET_ESP(esp, mask); +SET_ESP(esp, sp_mask); } return; } -- 2.45.2
[PATCH 07/10] target/i386/tcg: Use DPL-level accesses for interrupts and call gates
This fixes a bug wherein i386/tcg assumed an interrupt return using the CALL or JMP instructions were always going from kernel or user mode to kernel mode, when using a call gate. This assumption is violated if the call gate has a DPL that is greater than 0. In addition, the stack accesses should count as explicit, not implicit ("kernel" in QEMU code), so that SMAP is not applied if DPL=3. Analyzed-by: Robert R. Henry Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249 Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 07e3667639a..1430f477c43 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -678,7 +678,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, dpl); if (type == 5) { /* task gate */ @@ -984,7 +984,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int, sa.env = env; sa.ra = 0; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, dpl); sa.sp_mask = -1; sa.ss_base = 0; if (dpl < cpl || ist != 0) { @@ -1119,7 +1119,7 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int, sa.sp = env->regs[R_ESP]; sa.sp_mask = 0x; sa.ss_base = env->segs[R_SS].base; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, 0); if (is_int) { old_eip = next_eip; @@ -1583,7 +1583,7 @@ void helper_lcall_real(CPUX86State *env, uint32_t new_cs, uint32_t new_eip, sa.sp = env->regs[R_ESP]; sa.sp_mask = get_sp_mask(env->segs[R_SS].flags); sa.ss_base = env->segs[R_SS].base; -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, 0); if (shift) { pushl(, env->segs[R_CS].selector); @@ -1619,17 +1619,17 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC()); } cpl = env->hflags & HF_CPL_MASK; +dpl = (e2 >> DESC_DPL_SHIFT) & 3; LOG_PCALL("desc=%08x:%08x\n", e1, e2); sa.env = env; sa.ra = GETPC(); -sa.mmu_index = cpu_mmu_index_kernel(env); +sa.mmu_index = x86_mmu_index_pl(env, dpl); if (e2 & DESC_S_MASK) { if (!(e2 & DESC_CS_MASK)) { raise_exception_err_ra(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC()); } -dpl = (e2 >> DESC_DPL_SHIFT) & 3; if (e2 & DESC_C_MASK) { /* conforming code segment */ if (dpl > cpl) { @@ -1691,7 +1691,6 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip, } else { /* check gate type */ type = (e2 >> DESC_TYPE_SHIFT) & 0x1f; -dpl = (e2 >> DESC_DPL_SHIFT) & 3; rpl = new_cs & 3; #ifdef TARGET_X86_64 -- 2.45.2
[PATCH 08/10] target/i386/tcg: check for correct busy state before switching to a new task
This step is listed in the Intel manual: "Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return)". The AMD manual lists the same operation under the "Preventing recursion" paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor checks the busy bit in the IRET case. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 5 + 1 file changed, 5 insertions(+) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 1430f477c43..25af9d4a4ec 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -306,6 +306,11 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, old_tss_limit_max = 43; } +/* new TSS must be busy iff the source is an IRET instruction */ +if (!!(e2 & DESC_TSS_BUSY_MASK) != (source == SWITCH_TSS_IRET)) { +raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); +} + /* read all the registers from the new TSS */ if (type & 8) { /* 32 bit */ -- 2.45.2
Re: [RFC PATCH v4 0/7] Add Rust support, implement ARM PL011
On Tue, Jul 9, 2024 at 2:18 PM Daniel P. Berrangé wrote: > My thought is that the initial merge focuses only on the build system > integration. So that's basically patches 1 + 2 in this series. > > Patch 3, the high level APIs is where I see most of the work and > collaboration being needed, but that doesn't need to be rushed into > the first merge. We would have a "rust" subsystem + maintainer who > would presumably have a staging tree, etc in the normal way we work > and collaborate It's complicated. A "perfect" build system integration would include integration with Kconfig, but it's not simple work and it may not be Manos's preference for what to work on (or maybe it is), and it's also not a blocker for further work on patches 3-4. On the other hand, patches 3 and 4 are _almost_ ready except for requiring a very new Rust - we know how to tackle that, but again it may take some time and it's completely separate work from better build system integration. In other words, improving build system integration is harder until merge, but merge is blocked by independent work on lowering the minimum supported Rust version. This is why I liked the idea of having either a development tree to allow a merge into early 9.2. On the other hand, given the exceptional scope (completely new code that can be disabled at will) and circumstances, even a very early merge into 9.1 (disabled by default) might be better to provide developers with the easiest base for experimenting. The requirements for merging, here, would basically amount to a good roadmap and some established good habits. An merge into early 9.2 would be a bit harder for experimenting, while merging it now would sacrifice CI integration in the initial stages of the work but make cooperation easier. However, it's difficult to say what's best without knowing Manos's rationale for preferring not to have a development tree yet. Paolo
Re: [RFC PATCH v4 2/7] rust: add bindgen step as a meson dependency
On Tue, Jul 9, 2024 at 2:09 PM Peter Maydell wrote: > * what is the actual baseline requirement? We definitely want >to support "using rustup on an older system" (should be no >problem) and "current distro building QEMU using the distro's >rust", I assume. It would certainly be nice to have "building >QEMU on the older-but-still-in-our-support-list distro releases >with that distro's rust", but this probably implies not just >a minimum rust version but also a limited set of crates. I don't think limiting ourselves to the set of crates in the distro is feasible. It's not the way the language works, for example I tried checking if the "cstr" crate exists and I didn't find it in Debian. We can define a known-good set of versions: * either via Cargo.lock (i.e. recording the versions and downloading them) or by including sources in the tree * at any time in qemu.git, at branching time (three times a year), or on every release (That's six possibilities overall). >I think (but forget the details) that some of what we've done >with Python where we accept that the older-but-still-supported >distro will end up taking the "download at build time" path >in the build system might apply also for Rust. Yes, and --without-download may be changed to the "--frozen" flag of cargo. > * what, on the Rust side, is the version requirement? Presumably >there's some features that are pretty much "we really need >this and can't do without it" and some which are "this would >be pretty awkward not to have but if we have to we can implement >some kind of fallback/alternative with a TODO note to come >back and clean up when our baseline moves forwards". Here are the stopping points that I found over the last couple weeks: 1.56.0: 2021 edition 1.59.0: const CStr::from_bytes_with_nul_unchecked (needed by cstr crate, see below) 1.64.0: std::ffi::c_char 1.65.0: Generic Associated Types 1.74.0: Clippy can be configured in Cargo.toml 1.77.0: C string literals, offset_of! I think 1.59.0 is pretty much the lower bound. Not having offset_of! will be a bit painful, but it can be worked around (and pretty much has to be, because 1.77.0 is really new). As far as I understand, Debian bullseye has 1.63.0 these days and it's the oldest platform we have. Paolo > At that point we have more information to figure out what > if any tradeoff we want to make. > > thanks > -- PMM >
Re: [RFC PATCH v4 0/7] Add Rust support, implement ARM PL011
On Tue, Jul 9, 2024 at 9:38 AM Manos Pitsidianakis wrote: > Ah, alright. That wasn't obvious because that e-mail was not directed > to me nor did it mention my name :) Oh, ok. Sorry about that. Generally when I say "we" I include as large a part of the community as applicable. > I do not want to do that, in any case. I do not think it's the right approach. No problem with that (and in fact I agree, as I'd prefer a speedy merge and doing the work on the QEMU master branch); however, we need to reach an agreement on that and everybody (including Daniel) needs to explain the reason for their position. Daniel's proposed criteria for merging include: - CI integration - CI passing for all supported targets (thus lowering the MSRV to 1.63.0) - plus any the code changes that were or will be requested during review That seems to be a pretty high amount of work, and until it's done everyone else is unable to contribute, not even in directions orthogonal to the above (cross compilation support, less unsafe code, porting more devices). So something has to give: either we decide for an early merge, where the code is marked as experimental and disabled by default. Personally I think it's fine, the contingency plan is simply to "git rm -rf rust/". Or we can keep the above stringent requirements for merging, but then I don't see it as a one-person job. If I can say so, developing on a branch would also be a useful warm-up for you in the maintainer role, if we expect that there will be significant community contributions to Rust. Paolo
Re: [RFC PATCH v4 0/7] Add Rust support, implement ARM PL011
Il lun 8 lug 2024, 20:39 Manos Pitsidianakis ha scritto: > > > On Mon, 8 Jul 2024, 21:34 Paolo Bonzini, wrote: > >> >> >> Il lun 8 lug 2024, 19:12 Daniel P. Berrangé ha >> scritto: >> >>> That's exactly why I suggest its a pre-requisite for merging >>> this. Unless we're able to demonstrate that we can enable >>> Rust on all our CI platforms, the benefits of Rust will >>> not be realized in QEMU, and we'll have never ending debates >>> about whether each given feature needs to be in C or Rust. >>> >> >> In that case we should develop it on a branch, so that more than one >> person can contribute (unlike if we keep iterating on this RFC). >> > > If you do, I'd really appreciate it if you did not use any part of my > patches. > "We" means that you would accept patches, review them and apply them until any agreed-upon conditions for merging are satisfied, and then send either a final series or a pull request for inclusion in QEMU. Paolo
Re: [RFC PATCH v4 0/7] Add Rust support, implement ARM PL011
Il lun 8 lug 2024, 19:12 Daniel P. Berrangé ha scritto: > That's exactly why I suggest its a pre-requisite for merging > this. Unless we're able to demonstrate that we can enable > Rust on all our CI platforms, the benefits of Rust will > not be realized in QEMU, and we'll have never ending debates > about whether each given feature needs to be in C or Rust. > In that case we should develop it on a branch, so that more than one person can contribute (unlike if we keep iterating on this RFC). Paolo > > > I also believe we should default to enabling rust toolchain by > > > default in configure, and require and explicit --without-rust > > > to disable it, *despite* it not technically being a mandatory > > > featureyet. > > > > > > > I guess the detection could be done, but actually enabling the build part > > needs to wait until the minimum supported version is low enough. > > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org-o- > https://www.instagram.com/dberrange :| > >
Re: [RFC PATCH v4 0/7] Add Rust support, implement ARM PL011
Il lun 8 lug 2024, 18:33 Daniel P. Berrangé ha scritto: > This series is still missing changes to enable build on all targets > during CI, including cross-compiles, to prove that we're doing the > correct thing on all our targetted platforms. That's a must have > before considering it suitable for merge. > But we're not—in particular it's still using several features not in all supported distros. I also believe we should default to enabling rust toolchain by > default in configure, and require and explicit --without-rust > to disable it, *despite* it not technically being a mandatory > featureyet. > I guess the detection could be done, but actually enabling the build part needs to wait until the minimum supported version is low enough. Paolo > This is to give users a clear message that Rust is likely to > become a fundamental part of QEMU, so they need to give feedback > if they hit any problems / have use cases we've not anticipated > that are problematic wrt Rust. > > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org-o- > https://www.instagram.com/dberrange :| > >
Re: [RFC PATCH v4 0/7] Add Rust support, implement ARM PL011
On 7/4/24 14:15, Manos Pitsidianakis wrote: Changes from v3->v4: - Add rust-specific files to .gitattributes - Added help text to scripts/cargo_wrapper.py arguments (thanks Stephan) - Split bindings separate crate - Add declarative macros for symbols exported to QEMU to said crate - Lowered MSRV to 1.77.2 - Removed auto-download and install of bindgen-cli - Fixed re-compilation of Rust objects in case they are missing from filesystem - Fixed optimized builds by adding #[used] (thanks Pierrick for the help debugging this) I think the largest issue is that I'd rather have a single cargo build using a virtual manifest, because my hunch is that it'd be the easiest path towards Kconfig integration. But it's better to do this after merge, as the changes are pretty large. It's also independent from any other changes targeted at removing unsafe code, so no need to hold back on merging. Other comments I made that should however be addressed before merging, from most to least important: - TODO comments when the code is doing potential undefined behavior - module structure should IMO resemble the C part of the tree - only generate bindings.rs.inc once - a couple abstractions that I'd like to have now: a trait to store the CStr corresponding to the structs, and one to generate all-zero structs without having to type "unsafe { MaybeUninit::zeroed().assume_init() }" - I pointed out a couple lints that are too broad and should be enabled per-file, even if right now it's basically all files that include them. - add support for --cargo and CARGO environment variables (if my patch works without too much hassle) - I'd like to use ctor instead of non-portable linker magic, and the cstr crate instead of CStr statics or c"" - please check if -Wl,--whole-archive can be replaced with link_whole: - probably, until Rust is enabled by default we should treat dependencies as a moving target and not commit Cargo.lock files. In the meanwhile we can discuss how to handle them. And a few aesthetic changes on top of this. With respect to lints, marking entire groups as "deny" is problematic. Before merge, I'd rather have the groups as just "warn", and add a long list of denied lints[1]. After merge we can also add a non-fatal CI job that runs clippy with nightly rust and with groups marked as "deny". This matches the check-python-tox job in python/. [1] https://github.com/bonzini/rust-qemu/commit/95b25f7c5f4e Thanks, Paolo
Re: [RFC PATCH v4 4/7] rust: add PL011 device model
On Thu, Jul 4, 2024 at 2:16 PM Manos Pitsidianakis wrote: > +ARM PL011 Rust device > +M: Manos Pitsidianakis > +S: Maintained > +F: rust/pl011/ No need for this, since it's covered by rust/. If (when) it replaces the main one, the PL011-specific stanza will be assigned to ARM maintainers (while you keep covering it via rust/). > +if with_rust > + subdir('rust') > +endif Should be in patch 3. > +subdir('pl011') As I said before it should be handled via Kconfig, but let's do that after the initial merge. However... > +correctness = { level = "deny", priority = -1 } > +suspicious = { level = "deny", priority = -1 } > +complexity = { level = "deny", priority = -1 } > +perf = { level = "deny", priority = -1 } > +cargo = { level = "deny", priority = -1 } > +nursery = { level = "deny", priority = -1 } > +style = { level = "deny", priority = -1 } > +# restriction group > +dbg_macro = "deny" > +rc_buffer = "deny" > +as_underscore = "deny" ... repeated lints really suggest that you should use a workspace and a single cargo invocation to build both the rust-qapi and pl011 crates, which I think is doable if you can run bindgen just once. > +use core::{mem::MaybeUninit, ptr::NonNull}; Let's remove at least this unsafety. > +#[used] > +pub static VMSTATE_PL011: VMStateDescription = VMStateDescription { > +name: PL011_ARM_INFO.name, > +unmigratable: true, > +..unsafe { MaybeUninitzeroed().assume_init() } > +}; > + > +#[no_mangle] > +pub unsafe extern "C" fn pl011_init(obj: *mut Object) { > +assert!(!obj.is_null()); > +let mut state = NonNull::new_unchecked(obj.cast::()); > +state.as_mut().init(); This is fine for now, but please add a // TODO: this assumes that "all zeroes" is a valid state for all fields of // PL011State. This is not necessarily true of any #[repr(Rust)] structs, // including bilge-generated types. It should instead use MaybeUninit. > +} > + > +qemu_api::module_init! { > +qom: register_type => { > +type_register_static(_ARM_INFO); > +} Can you make the macro look like MODULE_INIT_QOM: fn register_type() { ... } so that it's clear what "register_type" is, and so that it's easier to extend it to more values? > +#[doc(alias = "clk")] > +pub clock: NonNull, It's null when init() runs, so please use *mut Clock. > +#[doc(alias = "migrate_clk")] > +pub migrate_clock: bool, Please put all properties together in the struct for readability. > +} > + > +#[used] > +pub static CLK_NAME: = c"clk"; > + > +impl PL011State { > +pub fn init( self) { > +unsafe { > +memory_region_init_io( > +addr_of_mut!(self.iomem), > +addr_of_mut!(*self).cast::(), > +_OPS, > +addr_of_mut!(*self).cast::(), > +PL011_ARM_INFO.name, > +0x1000, > +); > +let sbd = addr_of_mut!(*self).cast::(); > +let dev = addr_of_mut!(*self).cast::(); > +sysbus_init_mmio(sbd, addr_of_mut!(self.iomem)); > +for irq in self.interrupts.iter_mut() { > +sysbus_init_irq(sbd, irq); > +} > +self.clock = NonNull::new(qdev_init_clock_in( > +dev, > +CLK_NAME.as_ptr(), > +None, /* pl011_clock_update */ > +addr_of_mut!(*self).cast::(), > +ClockEvent_ClockUpdate, > +)) > +.unwrap(); > +} > +} > + > +pub fn read( self, offset: hwaddr, _size: core::ffi::c_uint) -> u64 { > +use RegisterOffset::*; > + > +match RegisterOffset::try_from(offset) { > +Err(v) if (0x3f8..0x400).contains() => { > +u64::from(PL011_ID_ARM[((offset - 0xfe0) >> 2) as usize]) > +} > +Err(_) => { > +// qemu_log_mask(LOG_GUEST_ERROR, "pl011_read: Bad offset > 0x%x\n", (int)offset); > +0 > +} > +Ok(DR) => { > +// s->flags &= ~PL011_FLAG_RXFF; > +self.flags.set_receive_fifo_full(false); > +let c = self.read_fifo[self.read_pos]; > +if self.read_count > 0 { > +self.read_count -= 1; > +self.read_pos = (self.read_pos + 1) & (self.fifo_depth() > - 1); > +} > +if self.read_count == 0 { > +// self.flags |= PL011_FLAG_RXFE; > +self.flags.set_receive_fifo_empty(true); > +} > +if self.read_count + 1 == self.read_trigger { > +//self.int_level &= ~ INT_RX; > +self.int_level &= !registers::INT_RX; > +} > +// Update error bits. > +self.receive_status_error_clear = c.to_be_bytes()[3].into(); > +self.update(); > +unsafe { qemu_chr_fe_accept_input(
Re: [RFC PATCH v4 3/7] rust: add crate to expose bindings and interfaces
On 7/4/24 14:15, Manos Pitsidianakis wrote: Add rust/qemu-api, which exposes rust-bindgen generated FFI bindings and provides some declaration macros for symbols visible to the rest of QEMU. Signed-off-by: Manos Pitsidianakis --- MAINTAINERS | 7 ++ rust/.cargo/config.toml | 2 + rust/qemu-api/.gitignore | 2 + rust/qemu-api/Cargo.lock | 7 ++ rust/qemu-api/Cargo.toml | 59 ++ rust/qemu-api/README.md | 17 rust/qemu-api/build.rs| 48 +++ rust/qemu-api/deny.toml | 57 + rust/qemu-api/meson.build | 0 rust/qemu-api/rustfmt.toml| 1 + rust/qemu-api/src/bindings.rs | 8 ++ rust/qemu-api/src/definitions.rs | 112 + rust/qemu-api/src/device_class.rs | 131 ++ I'd rather put the various definitions in directories that roughly correspond to the C files, so rust/qemu-api/src/{util,qom,hw/core/}. +correctness = { level = "deny", priority = -1 } +suspicious = { level = "deny", priority = -1 } +complexity = { level = "deny", priority = -1 } +perf = { level = "deny", priority = -1 } +cargo = { level = "deny", priority = -1 } +nursery = { level = "deny", priority = -1 } +style = { level = "deny", priority = -1 } Groups are problematic because they can (and will) cause breakage when new failures are added. I think we should have a non-fatal job to test with groups, but by default we can add a large number of lints and "allow" the groups. +cast_ptr_alignment = "allow" Why? +missing_safety_doc = "allow" Please add it to individual files at the top. diff --git a/rust/qemu-api/README.md b/rust/qemu-api/README.md new file mode 100644 index 00..f16a1a929d --- /dev/null +++ b/rust/qemu-api/README.md @@ -0,0 +1,17 @@ +# QEMU bindings and API wrappers + +This library exports helper Rust types, Rust macros and C FFI bindings for internal QEMU APIs. + +The C bindings can be generated with `bindgen`, using this build target: + +```console +$ ninja bindings-aarch64-softmmu.rs +``` + +## Generate Rust documentation + +To generate docs for this crate, including private items: + +```sh +cargo doc --no-deps --document-private-items +``` diff --git a/rust/qemu-api/build.rs b/rust/qemu-api/build.rs new file mode 100644 index 00..13164f8371 --- /dev/null +++ b/rust/qemu-api/build.rs @@ -0,0 +1,48 @@ +// Copyright 2024 Manos Pitsidianakis +// SPDX-License-Identifier: GPL-2.0 OR GPL-3.0-or-later + +use std::{env, path::Path}; + +fn main() { +println!("cargo:rerun-if-env-changed=MESON_BUILD_ROOT"); +println!("cargo:rerun-if-changed=src/bindings.rs.inc"); + +let out_dir = env::var_os("OUT_DIR").unwrap(); + +if let Some(build_dir) = std::env::var_os("MESON_BUILD_ROOT") { +let mut build_dir = Path::new(_dir).to_path_buf(); +let mut out_dir = Path::new(_dir).to_path_buf(); +assert!( +build_dir.exists(), +"MESON_BUILD_ROOT value does not exist on filesystem: {}", +build_dir.display() +); +assert!( +build_dir.is_dir(), +"MESON_BUILD_ROOT value is not actually a directory: {}", +build_dir.display() +); +// TODO: add logic for other guest target architectures. +build_dir.push("bindings-aarch64-softmmu.rs"); +let bindings_rs = build_dir; +assert!( +bindings_rs.exists(), +"MESON_BUILD_ROOT/bindings-aarch64-softmmu.rs does not exist on filesystem: {}", +bindings_rs.display() +); +assert!( +bindings_rs.is_file(), +"MESON_BUILD_ROOT/bindings-aarch64-softmmu.rs is not a file: {}", +bindings_rs.display() +); +out_dir.push("bindings.rs"); +std::fs::copy(bindings_rs, out_dir).unwrap(); +println!("cargo:rustc-cfg=MESON_BINDINGS_RS"); +} else if !Path::new("src/bindings.rs.inc").exists() { +panic!( +"No generated C bindings found! Either build them manually with bindgen or with meson \ + (`ninja bindings-aarch64-softmmu.rs`) and copy them to src/bindings.rs.inc, or build \ + through meson." +); +} +} diff --git a/rust/qemu-api/deny.toml b/rust/qemu-api/deny.toml new file mode 100644 index 00..3992380509 --- /dev/null +++ b/rust/qemu-api/deny.toml @@ -0,0 +1,57 @@ +# cargo-deny configuration file + +[graph] +targets = [ +"aarch64-unknown-linux-gnu", +"x86_64-unknown-linux-gnu", +"x86_64-apple-darwin", +"aarch64-apple-darwin", +"x86_64-pc-windows-gnu", +"aarch64-pc-windows-gnullvm", Do we actually support LLVM Windows targets? +unsafe impl Sync for TypeInfo {} +unsafe impl Sync for VMStateDescription {} + +#[macro_export] +macro_rules! module_init { +($func:expr, $type:expr) => { +#[used] +
Re: [RFC PATCH v4 2/7] rust: add bindgen step as a meson dependency
On Thu, Jul 4, 2024 at 2:16 PM Manos Pitsidianakis wrote: > > Add mechanism to generate rust hw targets that depend on a custom > bindgen target for rust bindings to C. > > This way bindings will be created before the rust crate is compiled. > > The bindings will end up in BUILDDIR/{target}-generated.rs and have the same > name > as a target: > > ninja aarch64-softmmu-generated.rs Is there anything target-dependent in these files? You can generate it just once I think. > + if with_rust and target_type == 'system' > + # FIXME: meson outputs the following warnings, which should be > resolved > + # before merging: > + # > WARNING: Project specifies a minimum meson_version '>=0.63.0' but > + # > uses features which were added in newer versions: > + # > * 0.64.0: {'fs.copyfile'} you can use configure_file() instead. > + # > * 1.0.0: {'dependencies arg in rust.bindgen', 'module rust as > stable module'} You can use if meson.version().version_compare('>=1.0.0') ... else error('Rust support requires Meson version 1.0.0') endif > + if target in rust_targets > +rust_hw = ss.source_set() > +foreach t: rust_targets[target] > + rust_device_cargo_build = custom_target(t['name'], > + output: t['output'], > + depends: [bindings_rs], > + build_always_stale: true, > + command: t['command']) Also "console: true". > + rust_dep = declare_dependency(link_args: [ > + '-Wl,--whole-archive', > + t['output'], > + '-Wl,--no-whole-archive' > + ], This is not portable, but I think you can use just "link_whole: rust_device_cargo_build". > +msrv = { > + 'rustc': '1.77.2', > + 'cargo': '1.77.2', I think rustc and cargo are always matching, so no need to check both of them? > +foreach rust_hw_target, rust_hws: rust_hw_target_list > + foreach rust_hw_dev: rust_hws Finding the available devices should be done using Kconfig symbols, probably by passing the path to the config-devices.mak file to build.rs via an environment variable. > +#include "qemu/osdep.h" > +#include "qemu/module.h" > +#include "qemu-io.h" > +#include "sysemu/sysemu.h" > +#include "hw/sysbus.h" > +#include "exec/memory.h" > +#include "chardev/char-fe.h" > +#include "hw/clock.h" > +#include "hw/qdev-clock.h" > +#include "hw/qdev-properties.h" > +#include "hw/qdev-properties-system.h" > +#include "hw/irq.h" > +#include "qapi/error.h" > +#include "migration/vmstate.h" > +#include "chardev/char-serial.h" I think all of these should be target-independent? > diff --git a/scripts/cargo_wrapper.py b/scripts/cargo_wrapper.py > index d2c7265461..e7d9238c16 100644 > --- a/scripts/cargo_wrapper.py > +++ b/scripts/cargo_wrapper.py > @@ -111,6 +111,8 @@ def get_cargo_rustc(args: argparse.Namespace) -> > tuple[Dict[str, Any], List[str] > > env = os.environ > env["CARGO_ENCODED_RUSTFLAGS"] = cfg > +env["MESON_BUILD_DIR"] = str(target_dir) > +env["MESON_BUILD_ROOT"] = str(args.meson_build_root) > > return (env, cargo_cmd) > > @@ -231,19 +233,11 @@ def main() -> None: > default=[], > ) > parser.add_argument( > -"--meson-build-dir", > -metavar="BUILD_DIR", > -help="meson.current_build_dir()", > +"--meson-build-root", > +metavar="BUILD_ROOT", > +help="meson.project_build_root(): the root build directory. Example: > '/path/to/qemu/build'", > type=Path, > -dest="meson_build_dir", > -required=True, > -) > -parser.add_argument( > -"--meson-source-dir", > -metavar="SOURCE_DIR", > -help="meson.current_source_dir()", > -type=Path, > -dest="meson_build_dir", > +dest="meson_build_root", > required=True, > ) > parser.add_argument( Please squash in the previous patch. Paolo
Re: [RFC PATCH v4 1/7] build-sys: Add rust feature option
On Thu, Jul 4, 2024 at 2:16 PM Manos Pitsidianakis wrote: > > Add options for Rust in meson_options.txt, meson.build, configure to > prepare for adding Rust code in the followup commits. > > `rust` is a reserved meson name, so we have to use an alternative. > `with_rust` was chosen. Did you find any problem with the other approach that I sent, to support --cargo and the CARGO environment variable in configure? Paolo
Re: [PATCH v3 4/6] target/i386: add support for VMX FRED controls
Il sab 6 lug 2024, 17:57 Li, Xin3 ha scritto: > >> The bits in the secondary vmexit controls are not supported, and in > general the same > >> is true for the secondary vmexit case. I think it's better to not > include the vmx-entry- > >> load-fred bit either, and only do the vmxcap changes. > > > Right, we don't need it at all. > > Hi Paolo, > > We actually do need the following change for nested FRED guests to boot: > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index 227ee1c759..dcf914a7ec 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -1285,7 +1285,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = { > NULL, "vmx-entry-ia32e-mode", NULL, NULL, > NULL, "vmx-entry-load-perf-global-ctrl", > "vmx-entry-load-pat", "vmx-entry-load-efer", > "vmx-entry-load-bndcfgs", NULL, "vmx-entry-load-rtit-ctl", > NULL, > -NULL, NULL, "vmx-entry-load-pkrs", NULL, > +NULL, NULL, "vmx-entry-load-pkrs", "vmx-entry-load-fred", > NULL, NULL, NULL, NULL, > NULL, NULL, NULL, NULL, > }, > > Or do you think it's not the root cause? > The patch is correct but is FRED supported in nested VMX? Or is it with not yet merged patches? Paolo > Thanks! > Xin > >
Re: [PATCH 00/14] rust: example of bindings code for Rust in QEMU
Hi, first of all I want to clarify the raison d'etre for this posting, which I have also explained to Manos. Nothing you see here is code that will be certainly included in QEMU; it's (mostly) throwaway by design. I don't have much attachment to any of the code except perhaps the casting and reference counting stuff (which is in, like, the third rewrite... I got nerd-sniped there :)). But at the same time I think it's plausibly not too different (in look and complexity) from what the actual code will look like. It's also not an attempt to bypass/leapfrog other people in the design; it's more a "let's be ready for this to happen" than a design document. The first step and the more immediate focus remains the build system integration. But you have good questions and observations so I'll put something below about the design as well. On Fri, Jul 5, 2024 at 8:52 PM Pierrick Bouvier wrote: > To give an example of questions we could have: > > Why should we have distinct structs for a device, the object > represented, and its state? If you refer to the conf and state, a bit because of the different traits required (ConstDefault is needed for properties but not the rest) and especially because we can enforce immutability of configuration vs. interior mutability of state. In particular, from Manos's prototype I noticed that you need to access the Chardev while the state is *not* borrowed; methods on the Chardev call back into the device and the callback needs mutable access to the state. So some kind of separation seems to be necessary, for better or worse, unless interior mutability is achieved with a dozen Cells (not great). > Having to implement ObjectImpl and DeviceImpl looks like an > implementation detail, that could be derived automatically from a > device. What's wrong with calling a realize that does nothing in the end? The problem is not calling a realize that does nothing, it's that you are forced to override the superclass implementation (which might be in C) if you don't go through Option<>. class_init methods update a few superclass methods but not all of them. The vtable in ObjectImpl/DeviceImpl could be generated via macros; for now I did it by hand to avoid getting carried away too much with syntactic sugar. > Could device state be serialized with something like serde? Probably not, there's extra complications for versioning. A long time ago there was an attempt to have some kind of IDL embedded in C code (not unlike Rust attributes) to automatically generate properties and vmstate, but it was too ambitious. (https://www.linux-kvm.org/images/b/b5/2012-forum-qidl-talk.pdf, slides 1-9). Generating property and vmstate declarations could be done with "normal" macros just like in C, or with attribute macros (needs a lot more code and experience, wouldn't do it right away). However, serde for QAPI and/or visitors may be a possibility, after all JSON is serde's bread and butter. > This is the kind of things that could be discussed, on a reduced > example, without specially looking at how to implement that concretely, > in a first time. There are some things that Rust really hates that you do, and they aren't always obvious. Therefore in this exercise I tried to let intuition guide me, and see how much the type system fought that intuition (not much actually!). I started from the more technical and less artistic part to see if I was able to get somewhere. But yes, it's a great exercise to do this experiment from the opposite end. Then the actual glue code will "meet in the middle", applying lessons learnt from both experiments, with individual pieces of the real interface implemented and applied to the PL011 sample at the same time. The main missing thing in PL011 is DMA, otherwise it's a nice playground. > usage of magic macros as syntactic sugar should indeed be thought twice. > Return a collection of QDevProperty could be better than having a magic > @properties syntactic sugar. No hard opinions there, sure. I went for the magic syntax because the properties cannot be a const and I wanted to hide the ugly "static mut", but certainly there could be better ways to do it. > Another thing that could be discussed is: do we want to have the whole > inheritance mechanism for Rust devices? Is it really needed? Eh, this time unfortunately I think it is. Stuff like children and properties, which are methods of Object, need to be available to subclasses in instance_init and/or realize. Reset is in Device and we have the whole Device/SysBusDevice/PCIDevice set of classes, so you need to access superclass methods from subclasses. It's not a lot of code though. > Between having a clean and simple Device definition, with a bit more of > magic glue (hidden internally), and requires devices to do some magic > initialization to satisfy existing architecture, what would be the best? > Even though it takes 1000 more lines, I would be in favor to have > Devices that are as clean and simple as possible.
Re: [PATCH 00/14] rust: example of bindings code for Rust in QEMU
On Thu, Jul 4, 2024 at 9:26 PM Pierrick Bouvier wrote: > > Patches 9-10 deal with how to define new subclasses in Rust. They are > > a lot less polished and less ready. There is probably a lot of polish > > that could be applied to make the code look nicer, but I guess there is > > always time to make the code look nicer until the details are sorted out. > > > > The things that I considered here are: > > > > - splitting configuration from runtime state > > - automatic generation of instance_init and instance_finalize > > - generation of C vtables from safe code that is written in Rust > > - generation of qdev property tables > > Shouldn't we focus more on how we want to write a QOM device in Rust > instead, by making abstraction of existing C implementation first? > Once we have satisfying idea, we could evaluate how it maps to existing > implementation, and which compromise should be made for efficiency. > It seems that the current approach is to stick to existing model, and > derive Rust bindings from this. I think I tried to do a little bit of that. For example, the split of configuration from runtime state is done to isolate the interior mutability, and the automatic generation of instance_init and instance_finalize relies on Rust traits like Default and Drop; the realize implementation returns a qemu::Result<()>; and so on. But there are many steps towards converting the PL011 device to Rust: - declarative definitions of bitfields and registers (done) - using RefCell for mutable state - using wrappers for class and property definition - defining and using wrappers for Chardev/CharBackend - defining and using wrappers for MemoryRegion callbacks - defining and using wrappers for SysBusDevice functionality - defining and using wrappers for VMState functionality At each step we need to design "how we want to do that in Rust" and all the things you mention above. This prototype covers the second and third steps, and it's already big enough! :) I expect that code like this one would be merged together with the corresponding changes to PL011: the glue code is added to the qemu/ crate and used in the hw/core/pl011/ directory. This way, both authors and reviewers will have a clue of what becomes more readable/idiomatic in the resulting code. > I agree this glue can be a source of technical debt, but on the other > side, it should be easy to refactor it, if we decided first what the > "clean and idiomatic" Rust API should look like. That's true, especially if we want to add more macro magic on top. We can decide later where to draw the line between complexity of glue code and cleanliness of Rust code, and also move the line as we see fit. On the other hand, if we believe that _this_ code is already too much, that's also a data point. Again: I don't think it is, but I want us to make an informed decision. > A compromise where you have to manually translate some structs, or copy > memory to traverse languages borders at some point, could be worth the > safety and expressiveness. Yep, Error is an example of this. It's not the common case, so it's totally fine to convert to and from C Error* (which also includes copying the inner error string, from a String to malloc-ed char*) to keep the APIs nicer. > > We should have an idea of what this glue code looks like, in order to make > > an informed choice. If we think we're not comfortable with reviewing it, > > well, we should be ready to say so and stick with C until we are. > > While it is important that this glue code is maintainable and looks > great, I don't think it should be the main reason to accept usage of a > new language. Not the main reason, but an important factor in judging if we are *able* to accept usage of a new language. Maybe it's just a formal step, but it feels safer to have _some_ code to show and to read, even if it's just a prototype that barely compiles. > We could have a specific trait with functions to handle various kind of > events. And the glue code could be responsible to translate this to > callbacks for the C side (and calling Rust code accordingly, eventually > serializing this on a single thread to avoid any race issues). That's a possibility, yes. Something like (totally random): impl CharBackend { pub fn start( self, obj: ) { qemu_chr_fe_set_handlers(self.0, Some(obj::can_receive), Some(obj::receive, obj::event), Some(obj::be_change), obj as *const _ as *const c_void, ptr::null(), true); } pub fn stop( self) { qemu_chr_fe_set_handlers(self.0, None, None, None, ptr::null(), ptr::null(), true); } } but I left for later because I focused just on the lowest levels code where you could have "one" design. For example, for memory regions some devices are going to have more than one, so there could be reasons to do callbacks in different ways. By the way, all of this
[PULL 15/16] target/i386: add support for masking CPUID features in confidential guests
Some CPUID features may be provided by KVM for some guests, independent of processor support, for example TSC deadline or TSC adjust. If these are not supported by the confidential computing firmware, however, the guest will fail to start. Add support for removing unsupported features from "-cpu host". Signed-off-by: Paolo Bonzini --- target/i386/confidential-guest.h | 24 target/i386/kvm/kvm.c| 5 + 2 files changed, 29 insertions(+) diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h index 532e172a60b..7342d2843aa 100644 --- a/target/i386/confidential-guest.h +++ b/target/i386/confidential-guest.h @@ -39,6 +39,8 @@ struct X86ConfidentialGuestClass { /* */ int (*kvm_type)(X86ConfidentialGuest *cg); +uint32_t (*mask_cpuid_features)(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index, +int reg, uint32_t value); }; /** @@ -56,4 +58,26 @@ static inline int x86_confidential_guest_kvm_type(X86ConfidentialGuest *cg) return 0; } } + +/** + * x86_confidential_guest_mask_cpuid_features: + * + * Removes unsupported features from a confidential guest's CPUID values, returns + * the value with the bits removed. The bits removed should be those that KVM + * provides independent of host-supported CPUID features, but are not supported by + * the confidential computing firmware. + */ +static inline int x86_confidential_guest_mask_cpuid_features(X86ConfidentialGuest *cg, + uint32_t feature, uint32_t index, + int reg, uint32_t value) +{ +X86ConfidentialGuestClass *klass = X86_CONFIDENTIAL_GUEST_GET_CLASS(cg); + +if (klass->mask_cpuid_features) { +return klass->mask_cpuid_features(cg, feature, index, reg, value); +} else { +return value; +} +} + #endif diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index dd8b0f33136..056d117cd11 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -548,6 +548,11 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function, ret |= 1U << KVM_HINTS_REALTIME; } +if (current_machine->cgs) { +ret = x86_confidential_guest_mask_cpuid_features( +X86_CONFIDENTIAL_GUEST(current_machine->cgs), +function, index, reg, ret); +} return ret; } -- 2.45.2
[PULL 00/16] meson, i386 changes for 2024-07-04
The following changes since commit 1a2d52c7fcaeaaf4f2fe8d4d5183dccaeab67768: Merge tag 'pull-request-2024-07-02' of https://gitlab.com/thuth/qemu into staging (2024-07-02 15:49:08 -0700) are available in the Git repository at: https://gitlab.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 188569c10d5dc6996bde90ce25645083e9661ecb: target/i386/SEV: implement mask_cpuid_features (2024-07-04 11:56:20 +0200) * meson: Pass objects and dependencies to declare_dependency(), not static_library() * meson: Drop the .fa library suffix * target/i386: drop AMD machine check bits from Intel CPUID * target/i386: add avx-vnni-int16 feature * target/i386: SEV bugfixes * target/i386: SEV-SNP -cpu host support * char: fix exit issues Akihiko Odaki (2): meson: Pass objects and dependencies to declare_dependency() Revert "meson: Propagate gnutls dependency" Maxim Mikityanskiy (1): char-stdio: Restore blocking mode of stdout on exit Michal Privoznik (2): i386/sev: Fix error message in sev_get_capabilities() i386/sev: Fallback to the default SEV device if none provided in sev_get_capabilities() Paolo Bonzini (11): meson: move shared_module() calls where modules are already walked meson: move block.syms dependency out of libblock meson: merge plugin_ldflags into emulator_link_args meson: Drop the .fa library suffix target/i386: pass X86CPU to x86_cpu_get_supported_feature_word target/i386: drop AMD machine check bits from Intel CPUID target/i386: SEV: fix formatting of CPUID mismatch message target/i386: do not include undefined bits in the AMD topoext leaf target/i386: add avx-vnni-int16 feature target/i386: add support for masking CPUID features in confidential guests target/i386/SEV: implement mask_cpuid_features docs/devel/build-system.rst | 8 +-- meson.build | 100 ++-- target/i386/confidential-guest.h| 24 + target/i386/cpu.h | 10 +++- chardev/char-stdio.c| 4 ++ hw/i386/pc.c| 1 + stubs/blk-exp-close-all.c | 2 +- target/i386/cpu.c | 40 +++ target/i386/kvm/kvm-cpu.c | 2 +- target/i386/kvm/kvm.c | 5 ++ target/i386/sev.c | 51 ++ .gitlab-ci.d/buildtest-template.yml | 2 - .gitlab-ci.d/buildtest.yml | 2 - block/meson.build | 2 +- gdbstub/meson.build | 6 +-- io/meson.build | 2 +- plugins/meson.build | 7 ++- pythondeps.toml | 2 +- storage-daemon/meson.build | 3 +- tcg/meson.build | 8 +-- tests/Makefile.include | 2 +- tests/qtest/libqos/meson.build | 3 +- ui/meson.build | 2 +- 23 files changed, 183 insertions(+), 105 deletions(-) -- 2.45.2
[PULL 09/16] target/i386: SEV: fix formatting of CPUID mismatch message
Fixes: 70943ad8e4d ("i386/sev: Add support for SNP CPUID validation", 2024-06-05) Signed-off-by: Paolo Bonzini --- target/i386/sev.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target/i386/sev.c b/target/i386/sev.c index 3ab8b3c28b7..2a0f94d390d 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -841,7 +841,7 @@ sev_snp_cpuid_report_mismatches(SnpCpuidInfo *old, size_t i; if (old->count != new->count) { -error_report("SEV-SNP: CPUID validation failed due to count mismatch," +error_report("SEV-SNP: CPUID validation failed due to count mismatch, " "provided: %d, expected: %d", old->count, new->count); return; } @@ -853,8 +853,8 @@ sev_snp_cpuid_report_mismatches(SnpCpuidInfo *old, new_func = >entries[i]; if (memcmp(old_func, new_func, sizeof(SnpCpuidFunc))) { -error_report("SEV-SNP: CPUID validation failed for function 0x%x, index: 0x%x" - "provided: eax:0x%08x, ebx: 0x%08x, ecx: 0x%08x, edx: 0x%08x" +error_report("SEV-SNP: CPUID validation failed for function 0x%x, index: 0x%x, " + "provided: eax:0x%08x, ebx: 0x%08x, ecx: 0x%08x, edx: 0x%08x, " "expected: eax:0x%08x, ebx: 0x%08x, ecx: 0x%08x, edx: 0x%08x", old_func->eax_in, old_func->ecx_in, old_func->eax, old_func->ebx, old_func->ecx, old_func->edx, -- 2.45.2
[PULL 07/16] target/i386: pass X86CPU to x86_cpu_get_supported_feature_word
This allows modifying the bits in "-cpu max"/"-cpu host" depending on the guest CPU vendor (which, at least by default, is the host vendor in the case of KVM). For example, machine check architecture differs between Intel and AMD, and bits from AMD should be dropped when configuring the guest for an Intel model. Cc: Xiaoyao Li Cc: John Allen Signed-off-by: Paolo Bonzini --- target/i386/cpu.h | 3 +-- target/i386/cpu.c | 11 +-- target/i386/kvm/kvm-cpu.c | 2 +- 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 29daf370485..9bea7142bf4 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -666,8 +666,7 @@ typedef enum FeatureWord { } FeatureWord; typedef uint64_t FeatureWordArray[FEATURE_WORDS]; -uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, -bool migratable_only); +uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w); /* cpuid_features bits */ #define CPUID_FP87 (1U << 0) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 4c2e6f3a71e..4364cb0f8e3 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6035,8 +6035,7 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp) #endif /* !CONFIG_USER_ONLY */ -uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, -bool migratable_only) +uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w) { FeatureWordInfo *wi = _word_info[w]; uint64_t r = 0; @@ -6078,7 +6077,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, r &= ~unavail; } #endif -if (migratable_only) { +if (cpu && cpu->migratable) { r &= x86_cpu_get_migratable_flags(w); } return r; @@ -7371,7 +7370,7 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp) * by the user. */ env->features[w] |= -x86_cpu_get_supported_feature_word(w, cpu->migratable) & +x86_cpu_get_supported_feature_word(cpu, w) & ~env->user_features[w] & ~feature_word_info[w].no_autoenable_flags; } @@ -7497,7 +7496,7 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool verbose) for (w = 0; w < FEATURE_WORDS; w++) { uint64_t host_feat = -x86_cpu_get_supported_feature_word(w, false); +x86_cpu_get_supported_feature_word(NULL, w); uint64_t requested_features = env->features[w]; uint64_t unavailable_features = requested_features & ~host_feat; mark_unavailable_features(cpu, w, unavailable_features, prefix); @@ -7617,7 +7616,7 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_LBR_FMT; if (requested_lbr_fmt && kvm_enabled()) { uint64_t host_perf_cap = -x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES, false); +x86_cpu_get_supported_feature_word(NULL, FEAT_PERF_CAPABILITIES); unsigned host_lbr_fmt = host_perf_cap & PERF_CAP_LBR_FMT; if (!cpu->enable_pmu) { diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c index d57a68a301e..6bf8dcfc607 100644 --- a/target/i386/kvm/kvm-cpu.c +++ b/target/i386/kvm/kvm-cpu.c @@ -143,7 +143,7 @@ static void kvm_cpu_xsave_init(void) if (!esa->size) { continue; } -if ((x86_cpu_get_supported_feature_word(esa->feature, false) & esa->bits) +if ((x86_cpu_get_supported_feature_word(NULL, esa->feature) & esa->bits) != esa->bits) { continue; } -- 2.45.2
[PULL 13/16] target/i386: add avx-vnni-int16 feature
AVX-VNNI-INT16 (CPUID[EAX=7,ECX=1).EDX[10]) is supported by Clearwater Forest processor, add it to QEMU as it does not need any specific enablement. Signed-off-by: Paolo Bonzini --- target/i386/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index c40551d9bfb..c05765eeafc 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -1131,7 +1131,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .feat_names = { NULL, NULL, NULL, NULL, "avx-vnni-int8", "avx-ne-convert", NULL, NULL, -"amx-complex", NULL, NULL, NULL, +"amx-complex", NULL, "avx-vnni-int16", NULL, NULL, NULL, "prefetchiti", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, -- 2.45.2
[PULL 14/16] char-stdio: Restore blocking mode of stdout on exit
From: Maxim Mikityanskiy qemu_chr_open_fd() sets stdout into non-blocking mode. Restore the old fd flags on exit to avoid breaking unsuspecting applications that run on the same terminal after qemu and don't expect to get EAGAIN. While at at, also ensure term_exit is called once (at the moment it's called both from char_stdio_finalize() and as the atexit() hook. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2423 Signed-off-by: Maxim Mikityanskiy Link: https://lore.kernel.org/r/20240703190812.3459514-1-maxtra...@gmail.com Signed-off-by: Paolo Bonzini --- chardev/char-stdio.c | 4 1 file changed, 4 insertions(+) diff --git a/chardev/char-stdio.c b/chardev/char-stdio.c index 3c648678ab1..b960ddd4e4c 100644 --- a/chardev/char-stdio.c +++ b/chardev/char-stdio.c @@ -41,6 +41,7 @@ /* init terminal so that we can grab keys */ static struct termios oldtty; static int old_fd0_flags; +static int old_fd1_flags; static bool stdio_in_use; static bool stdio_allow_signal; static bool stdio_echo_state; @@ -50,6 +51,8 @@ static void term_exit(void) if (stdio_in_use) { tcsetattr(0, TCSANOW, ); fcntl(0, F_SETFL, old_fd0_flags); +fcntl(1, F_SETFL, old_fd1_flags); +stdio_in_use = false; } } @@ -102,6 +105,7 @@ static void qemu_chr_open_stdio(Chardev *chr, stdio_in_use = true; old_fd0_flags = fcntl(0, F_GETFL); +old_fd1_flags = fcntl(1, F_GETFL); tcgetattr(0, ); if (!g_unix_set_fd_nonblocking(0, true, NULL)) { error_setg_errno(errp, errno, "Failed to set FD nonblocking"); -- 2.45.2
[PULL 08/16] target/i386: drop AMD machine check bits from Intel CPUID
The recent addition of the SUCCOR bit to kvm_arch_get_supported_cpuid() causes the bit to be visible when "-cpu host" VMs are started on Intel processors. While this should in principle be harmless, it's not tidy and we don't even know for sure that it doesn't cause any guest OS to take unexpected paths. Since x86_cpu_get_supported_feature_word() can return different different values depending on the guest, adjust it to hide the SUCCOR bit if the guest has non-AMD vendor. Suggested-by: Xiaoyao Li Cc: John Allen Signed-off-by: Paolo Bonzini --- target/i386/cpu.c | 23 +++ 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 4364cb0f8e3..5e5bf71702c 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6039,6 +6039,7 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w) { FeatureWordInfo *wi = _word_info[w]; uint64_t r = 0; +uint32_t unavail = 0; if (kvm_enabled()) { switch (wi->type) { @@ -6064,19 +6065,33 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w) } else { return ~0; } + +switch (w) { #ifndef TARGET_X86_64 -if (w == FEAT_8000_0001_EDX) { +case FEAT_8000_0001_EDX: /* * 32-bit TCG can emulate 64-bit compatibility mode. If there is no * way for userspace to get out of its 32-bit jail, we can leave * the LM bit set. */ -uint32_t unavail = tcg_enabled() +unavail = tcg_enabled() ? CPUID_EXT2_LM & ~CPUID_EXT2_KERNEL_FEATURES : CPUID_EXT2_LM; -r &= ~unavail; -} +break; #endif + +case FEAT_8000_0007_EBX: +if (cpu && !IS_AMD_CPU(>env)) { +/* Disable AMD machine check architecture for Intel CPU. */ +unavail = ~0; +} +break; + +default: +break; +} + +r &= ~unavail; if (cpu && cpu->migratable) { r &= x86_cpu_get_migratable_flags(w); } -- 2.45.2
[PULL 10/16] target/i386: do not include undefined bits in the AMD topoext leaf
Commit d7c72735f61 ("target/i386: Add new EPYC CPU versions with updated cache_info", 2023-05-08) ensured that AMD-defined CPU models did not have the 'complex_indexing' bit set, but left it set in "-cpu host" which uses the default ("legacy") cache information. Reimplement that commit using a CPU feature, so that it can be applied to all guests using a new machine type, independent of the CPU model. Signed-off-by: Paolo Bonzini --- target/i386/cpu.h | 3 +++ hw/i386/pc.c | 1 + target/i386/cpu.c | 4 3 files changed, 8 insertions(+) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 9bea7142bf4..0d5624355e4 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2104,6 +2104,9 @@ struct ArchCPU { /* Only advertise CPUID leaves defined by the vendor */ bool vendor_cpuid_only; +/* Only advertise TOPOEXT features that AMD defines */ +bool amd_topoext_features_only; + /* Enable auto level-increase for Intel Processor Trace leave */ bool intel_pt_auto_level; diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 77415064c62..5dff91422ff 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -80,6 +80,7 @@ { "athlon-" TYPE_X86_CPU, "model-id", "QEMU Virtual CPU version " v, }, GlobalProperty pc_compat_9_0[] = { +{ TYPE_X86_CPU, "x-amd-topoext-features-only", "false" }, { TYPE_X86_CPU, "x-l1-cache-per-thread", "false" }, { TYPE_X86_CPU, "guest-phys-bits", "0" }, { "sev-guest", "legacy-vm-type", "true" }, diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 5e5bf71702c..c40551d9bfb 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6982,6 +6982,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, *eax = *ebx = *ecx = *edx = 0; break; } +if (cpu->amd_topoext_features_only) { +*edx &= CACHE_NO_INVD_SHARING | CACHE_INCLUSIVE; +} break; case 0x801E: if (cpu->core_id <= 255) { @@ -8293,6 +8296,7 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor), DEFINE_PROP_BOOL("cpuid-0xb", X86CPU, enable_cpuid_0xb, true), DEFINE_PROP_BOOL("x-vendor-cpuid-only", X86CPU, vendor_cpuid_only, true), +DEFINE_PROP_BOOL("x-amd-topoext-features-only", X86CPU, amd_topoext_features_only, true), DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false), DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true), DEFINE_PROP_BOOL("kvm-pv-enforce-cpuid", X86CPU, kvm_pv_enforce_cpuid, -- 2.45.2
[PULL 16/16] target/i386/SEV: implement mask_cpuid_features
Drop features that are listed as "BitMask" in the PPR and currently not supported by AMD processors. The only ones that may become useful in the future are TSC deadline timer and x2APIC, everything else is not needed for SEV-SNP guests (e.g. VIRT_SSBD) or would require processor support (e.g. TSC_ADJUST). This allows running SEV-SNP guests with "-cpu host". Signed-off-by: Paolo Bonzini --- target/i386/cpu.h | 4 target/i386/sev.c | 33 + 2 files changed, 37 insertions(+) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 0d5624355e4..c43ac01c794 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -812,6 +812,8 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w); /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */ #define CPUID_7_0_EBX_FSGSBASE (1U << 0) +/* Support TSC adjust MSR */ +#define CPUID_7_0_EBX_TSC_ADJUST(1U << 1) /* Support SGX */ #define CPUID_7_0_EBX_SGX (1U << 2) /* 1st Group of Advanced Bit Manipulation Extensions */ @@ -1002,6 +1004,8 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w); #define CPUID_8000_0008_EBX_STIBP_ALWAYS_ON(1U << 17) /* Speculative Store Bypass Disable */ #define CPUID_8000_0008_EBX_AMD_SSBD(1U << 24) +/* Paravirtualized Speculative Store Bypass Disable MSR */ +#define CPUID_8000_0008_EBX_VIRT_SSBD (1U << 25) /* Predictive Store Forwarding Disable */ #define CPUID_8000_0008_EBX_AMD_PSFD(1U << 28) diff --git a/target/i386/sev.c b/target/i386/sev.c index 2f3dbe289f4..2ba5f517228 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -945,6 +945,38 @@ out: return ret; } +static uint32_t +sev_snp_mask_cpuid_features(X86ConfidentialGuest *cg, uint32_t feature, uint32_t index, +int reg, uint32_t value) +{ +switch (feature) { +case 1: +if (reg == R_ECX) { +return value & ~CPUID_EXT_TSC_DEADLINE_TIMER; +} +break; +case 7: +if (index == 0 && reg == R_EBX) { +return value & ~CPUID_7_0_EBX_TSC_ADJUST; +} +if (index == 0 && reg == R_EDX) { +return value & ~(CPUID_7_0_EDX_SPEC_CTRL | + CPUID_7_0_EDX_STIBP | + CPUID_7_0_EDX_FLUSH_L1D | + CPUID_7_0_EDX_ARCH_CAPABILITIES | + CPUID_7_0_EDX_CORE_CAPABILITY | + CPUID_7_0_EDX_SPEC_CTRL_SSBD); +} +break; +case 0x8008: +if (reg == R_EBX) { +return value & ~CPUID_8000_0008_EBX_VIRT_SSBD; +} +break; +} +return value; +} + static int sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa, uint8_t *addr, size_t len) @@ -2315,6 +2347,7 @@ sev_snp_guest_class_init(ObjectClass *oc, void *data) klass->launch_finish = sev_snp_launch_finish; klass->launch_update_data = sev_snp_launch_update_data; klass->kvm_init = sev_snp_kvm_init; +x86_klass->mask_cpuid_features = sev_snp_mask_cpuid_features; x86_klass->kvm_type = sev_snp_kvm_type; object_class_property_add(oc, "policy", "uint64", -- 2.45.2
[PULL 04/16] meson: Pass objects and dependencies to declare_dependency()
From: Akihiko Odaki We used to request declare_dependency() to link_whole static libraries. If a static library is a thin archive, GNU ld keeps all object files referenced by the archive open, and sometimes exceeds the open file limit. Another problem with link_whole is that suboptimal handling of nested dependencies. link_whole by itself does not propagate dependencies. In particular, gnutls, a dependency of crypto, is not propagated to its users, and we currently workaround the issue by declaring gnutls as a dependency for each crypto user. On the other hand, if you write something like libfoo = static_library('foo', 'foo.c', dependencies: gnutls) foo = declare_dependency(link_whole: libfoo) libbar = static_library('bar', 'bar.c', dependencies: foo) bar = declare_dependency(link_whole: libbar, dependencies: foo) executable('prog', sources: files('prog.c'), dependencies: [foo, bar]) hoping to propagate the gnutls dependency into bar.c, you'll see a linking failure for "prog", because the foo.c.o object file is included in libbar.a and therefore it is linked twice into "prog": once from libfoo.a and once from libbar.a. Here Meson does not see the duplication, it just asks the linker to link all of libfoo.a and libbar.a into "prog". Instead of using link_whole, extract objects included in static libraries and pass them to declare_dependency(); and then the dependencies can be added as well so that they are propagated, because object files on the linker command line are always deduplicated. This requires Meson 1.1.0 or later. Signed-off-by: Akihiko Odaki Message-ID: <20240524-objects-v1-1-07cbbe961...@daynix.com> Signed-off-by: Paolo Bonzini --- docs/devel/build-system.rst| 3 ++- meson.build| 44 +++--- gdbstub/meson.build| 4 ++-- pythondeps.toml| 2 +- tcg/meson.build| 6 +++-- tests/qtest/libqos/meson.build | 2 +- 6 files changed, 35 insertions(+), 26 deletions(-) diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst index f4fd76117df..39a1934c63f 100644 --- a/docs/devel/build-system.rst +++ b/docs/devel/build-system.rst @@ -238,7 +238,8 @@ Subsystem sourcesets: name_suffix: 'fa', build_by_default: false) -chardev = declare_dependency(link_whole: libchardev) +chardev = declare_dependency(objects: libchardev.extract_all_objects(recursive: false), + dependencies: chardev_ss.dependencies()) As of Meson 0.55.1, the special ``.fa`` suffix should be used for everything that is used with ``link_whole``, to ensure that the link flags are placed diff --git a/meson.build b/meson.build index df9a64302f0..0c314ae5701 100644 --- a/meson.build +++ b/meson.build @@ -1,4 +1,4 @@ -project('qemu', ['c'], meson_version: '>=0.63.0', +project('qemu', ['c'], meson_version: '>=1.1.0', default_options: ['warning_level=1', 'c_std=gnu11', 'cpp_std=gnu++11', 'b_colorout=auto', 'b_staticpic=false', 'stdsplit=false', 'optimization=2', 'b_pie=true'], version: files('VERSION')) @@ -3461,7 +3461,7 @@ endif if enable_modules libmodulecommon = static_library('module-common', files('module-common.c') + genh, pic: true, c_args: '-DBUILD_DSO') - modulecommon = declare_dependency(link_whole: libmodulecommon, compile_args: '-DBUILD_DSO') + modulecommon = declare_dependency(objects: libmodulecommon.extract_all_objects(recursive: false), compile_args: '-DBUILD_DSO') endif qom_ss = qom_ss.apply({}) @@ -3469,14 +3469,15 @@ libqom = static_library('qom', qom_ss.sources() + genh, dependencies: [qom_ss.dependencies()], name_suffix: 'fa', build_by_default: false) -qom = declare_dependency(link_whole: libqom) +qom = declare_dependency(objects: libqom.extract_all_objects(recursive: false), + dependencies: qom_ss.dependencies()) event_loop_base = files('event-loop-base.c') event_loop_base = static_library('event-loop-base', sources: event_loop_base + genh, name_suffix: 'fa', build_by_default: false) -event_loop_base = declare_dependency(link_whole: event_loop_base, +event_loop_base = declare_dependency(objects: event_loop_base.extract_all_objects(recursive: false), dependencies: [qom]) stub_ss = stub_ss.apply({}) @@ -3621,7 +3622,8 @@ foreach d, list : modules endif emulator_modules += shared_module(sl.name(), name_prefix: '', -link_whole: sl, +objects: sl.extract_all_objects(recursive: false), +dependencies: module_ss.dependen
[PULL 02/16] meson: move block.syms dependency out of libblock
In order to define libqemuutil symbols that are requested by block modules, QEMU currently uses a combination of the "link_depends" argument of libraries (which is propagated into dependencies, but not available in dependencies) and the "link_args" argument of declare_dependency() (which _is_ available in static_library, but probably not used for historical reasons only). Unfortunately the link_depends will not be propagated into the "block" dependency if it is defined using declare_dependency(objects: ...); and it is not possible to add it directly to the dependency because the keyword argument simply is not available. The only solution, in order to switch to defining the dependency without using "link_whole" (which has problems of its own, see https://github.com/mesonbuild/meson/pull/8151#issuecomment-754796420), is unfortunately to add the link_args and link_depends to the executables directly; fortunately there is just four of them. It is possible (and I will look into it) to add "link_depends" to declare_dependency(), but it probably will be a while before QEMU can use it. Signed-off-by: Paolo Bonzini --- meson.build| 5 +++-- storage-daemon/meson.build | 1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/meson.build b/meson.build index 8909f8c87d9..df9a64302f0 100644 --- a/meson.build +++ b/meson.build @@ -3759,12 +3759,10 @@ system_ss.add(migration) block_ss = block_ss.apply({}) libblock = static_library('block', block_ss.sources() + genh, dependencies: block_ss.dependencies(), - link_depends: block_syms, name_suffix: 'fa', build_by_default: false) block = declare_dependency(link_whole: [libblock], - link_args: '@block.syms', dependencies: [crypto, io]) blockdev_ss = blockdev_ss.apply({}) @@ -4033,10 +4031,13 @@ endif if have_tools qemu_img = executable('qemu-img', [files('qemu-img.c'), hxdep], + link_args: '@block.syms', link_depends: block_syms, dependencies: [authz, block, crypto, io, qom, qemuutil], install: true) qemu_io = executable('qemu-io', files('qemu-io.c'), + link_args: '@block.syms', link_depends: block_syms, dependencies: [block, qemuutil], install: true) qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'), + link_args: '@block.syms', link_depends: block_syms, dependencies: [blockdev, qemuutil, gnutls, selinux], install: true) diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build index 46267b63e72..fd5e32f4b28 100644 --- a/storage-daemon/meson.build +++ b/storage-daemon/meson.build @@ -8,6 +8,7 @@ if have_tools qsd_ss = qsd_ss.apply({}) qsd = executable('qemu-storage-daemon', qsd_ss.sources(), + link_args: '@block.syms', link_depends: block_syms, dependencies: qsd_ss.dependencies(), install: true) endif -- 2.45.2