Re: [PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
On 09/04/2024 04.49, Shaoqin Huang wrote:

The KVM_ARM_VCPU_PMU_V3_FILTER attribute provides the ability to let the
VMM decide which PMU events are provided to the guest. Add a new option
`kvm-pmu-filter` as a -cpu sub-option to set the PMU event filtering.
Without the filter, all PMU events are exposed from host to guest by
default. The usage of the new sub-option can be found in the updated
documentation (docs/system/arm/cpu-features.rst).

Here is an example which shows how to use the PMU event filtering. When
launching a guest with KVM, add the following command line:

  # qemu-system-aarch64 \
      -accel kvm \
      -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This
filters out the cycle counter (event 0x11 being CPU_CYCLES).

Then, in the guest, use perf to count the cycles:

  # perf stat sleep 1

   Performance counter stats for 'sleep 1':

              1.22 msec task-clock        #    0.001 CPUs utilized
                 1      context-switches  #  820.695 /sec
                 0      cpu-migrations    #    0.000 /sec
                55      page-faults       #   45.138 K/sec
                        cycles
           1128954      instructions
            227031      branches          #  186.323 M/sec
              8686      branch-misses     #    3.83% of all branches

       1.002492480 seconds time elapsed
       0.001752000 seconds user
       0.0         seconds sys

As we can see, the cycle counter has been disabled in the guest, but
other PMU events still work.

Signed-off-by: Shaoqin Huang
---
v8->v9:
  - Replace warn_report() with error_setg() in some places.
  - Merge the check conditions to make the code cleaner.
  - Tried to use the QAPI format for the PMU filter property, but failed
    since the -cpu option doesn't support the JSON format yet.
v7->v8:
  - Add a qtest for kvm-pmu-filter.
  - Do the kvm-pmu-filter syntax checking up-front in kvm_pmu_filter_set()
    and store the filter information there; kvm_pmu_filter_get() then
    reconstitutes it.
v6->v7:
  - Check the return value of sscanf.
  - Improve the check condition.
v5->v6:
  - Commit message improvements.
  - Remove some unused code.
  - Collect Reviewed-by, thanks Sebastian.
  - Use g_auto(GStrv) to replace the gchar **. [Eric]
v4->v5:
  - Change kvm-pmu-filter into a -cpu sub-option. [Eric]
  - Comment tweak. [Gavin]
  - Rebase to the latest branch.
v3->v4:
  - Fix the wrong check for pmu_filter_init. [Sebastian]
  - Fix multiple alignment issues. [Gavin]
  - Report errors via warn_report() instead of error_report(), and don't
    use abort() since the PMU event filter is an add-on, best-effort
    feature. [Gavin]
  - Add several missing { } around single lines of code. [Gavin]
  - Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
  - Improve the commit message: use kernel doc wording, add more
    explanation of the filter example, fix some typos. [Eric]
  - Add g_free() in kvm_arch_set_pmu_filter() to prevent a memory leak.
    [Eric]
  - Add more precise error message reporting. [Eric]
  - In the options doc, note that kvm-pmu-filter relies on
    KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
  - Add more description of the allow and deny meaning in the commit
    message. [Sebastian]
  - Small improvements. [Sebastian]
---
 docs/system/arm/cpu-features.rst | 23 +++
 target/arm/arm-qmp-cmds.c        |  2 +-
 target/arm/cpu.h                 |  3 +
 target/arm/kvm.c                 | 112 +++
 tests/qtest/arm-cpu-features.c   | 51 ++
 5 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..f3930f34b3 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest
   userspace.
 
+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all PMU
+  events will be exposed to the guest.
+
+  KVM implements PMU event filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute being supported in KVM.
+
+  It has the following format:
+
+    kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny"; start is the first event of the
+  range and end is the last one. The first registered range defines
+  the global policy (global ALLOW if the
Re: [PATCH v4] hw/virtio: Fix packed virtqueue flush used_idx
On 4/9/24 1:32 Eugenio Perez Martin wrote:
>
> External Mail: This email originated from OUTSIDE of the organization!
> Do not click links, open attachments or provide ANY information unless
> you recognize the sender and know the content is safe.
>
> On Sun, Apr 7, 2024 at 3:56 AM Wafer wrote:
> >
> Let me suggest a more generic description for the patch:
>
> In the event of writing many chains of descriptors, the device must
> write just the id of the last buffer in the descriptor chain, skip
> forward the number of descriptors in the chain, and then repeat the
> operations for the rest of the chains.
>
> Current QEMU code writes all the buffer ids consecutively, and then
> skips all the buffers altogether. This is a bug, and can be reproduced
> with a VirtIONet device with _F_MRG_RXBUF and without _F_INDIRECT_DESC...
> ---
>
> And then your description, particularly for VirtIONet, is totally fine.
> Feel free to make changes to the description or suggest a better wording.
>
> Thanks!

Thank you for your suggestion. I will add your description and
Suggested-by to the commit log.

Thanks!

> > If a virtio-net device has the VIRTIO_NET_F_MRG_RXBUF feature but not
> > the VIRTIO_RING_F_INDIRECT_DESC feature, 'VirtIONetQueue->rx_vq' will
> > use the merge feature to store data in multiple 'elems'.
> > The 'num_buffers' in the virtio header indicates how many elements
> > are merged.
> > If the value of 'num_buffers' is greater than 1, all the merged
> > elements will be filled into the descriptor ring.
> > The 'idx' of the elements should be the value of 'vq->used_idx' plus
> > 'ndescs'.
> >
> > Fixes: 86044b24e8 ("virtio: basic packed virtqueue support")
> > Acked-by: Eugenio Pérez
> > Signed-off-by: Wafer
> >
> > ---
> > Changes in v4:
> >   - Add Acked-by.
> >
> > Changes in v3:
> >   - Add the commit ID of the introduced problem to the commit message.
> >
> > Changes in v2:
> >   - Clarify more in commit message.
> > ---
> >  hw/virtio/virtio.c | 12 ++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index fb6b4ccd83..cab5832cac 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -957,12 +957,20 @@ static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
> >          return;
> >      }
> >
> > +    /*
> > +     * For an indirect element, 'ndescs' is 1.
> > +     * For all other elements, 'ndescs' is the number of descriptors
> > +     * chained by NEXT (as set in virtqueue_packed_pop).
> > +     * So when the 'elem' is filled into the descriptor ring,
> > +     * the 'idx' of this 'elem' shall be
> > +     * the value of 'vq->used_idx' plus the 'ndescs'.
> > +     */
> > +    ndescs += vq->used_elems[0].ndescs;
> >      for (i = 1; i < count; i++) {
> > -        virtqueue_packed_fill_desc(vq, &vq->used_elems[i], i, false);
> > +        virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
> >          ndescs += vq->used_elems[i].ndescs;
> >      }
> >      virtqueue_packed_fill_desc(vq, &vq->used_elems[0], 0, true);
> > -    ndescs += vq->used_elems[0].ndescs;
> >
> >      vq->inuse -= ndescs;
> >      vq->used_idx += ndescs;
> > --
> > 2.27.0
> >
[PATCH v2 01/28] target/i386: Add tcg/access.[ch]
Provide a method to amortize page lookup across large blocks.

Signed-off-by: Richard Henderson
---
 target/i386/tcg/access.h    | 40 +
 target/i386/tcg/access.c    | 160
 target/i386/tcg/meson.build | 1 +
 3 files changed, 201 insertions(+)
 create mode 100644 target/i386/tcg/access.h
 create mode 100644 target/i386/tcg/access.c

diff --git a/target/i386/tcg/access.h b/target/i386/tcg/access.h
new file mode 100644
index 00..d70808a3a3
--- /dev/null
+++ b/target/i386/tcg/access.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Access guest memory in blocks. */
+
+#ifndef X86_TCG_ACCESS_H
+#define X86_TCG_ACCESS_H
+
+/* An access covers at most sizeof(X86XSaveArea), at most 2 pages. */
+typedef struct X86Access {
+    target_ulong vaddr;
+    void *haddr1;
+    void *haddr2;
+    uint16_t size;
+    uint16_t size1;
+    /*
+     * If we can't access the host page directly, we'll have to do I/O access
+     * via ld/st helpers. These are internal details, so we store the rest
+     * to do the access here instead of passing it around in the helpers.
+     */
+    int mmu_idx;
+    CPUX86State *env;
+    uintptr_t ra;
+} X86Access;
+
+void access_prepare_mmu(X86Access *ret, CPUX86State *env,
+                        vaddr vaddr, unsigned size,
+                        MMUAccessType type, int mmu_idx, uintptr_t ra);
+void access_prepare(X86Access *ret, CPUX86State *env, vaddr vaddr,
+                    unsigned size, MMUAccessType type, uintptr_t ra);
+
+uint8_t access_ldb(X86Access *ac, vaddr addr);
+uint16_t access_ldw(X86Access *ac, vaddr addr);
+uint32_t access_ldl(X86Access *ac, vaddr addr);
+uint64_t access_ldq(X86Access *ac, vaddr addr);
+
+void access_stb(X86Access *ac, vaddr addr, uint8_t val);
+void access_stw(X86Access *ac, vaddr addr, uint16_t val);
+void access_stl(X86Access *ac, vaddr addr, uint32_t val);
+void access_stq(X86Access *ac, vaddr addr, uint64_t val);
+
+#endif
diff --git a/target/i386/tcg/access.c b/target/i386/tcg/access.c
new file mode 100644
index 00..8b70f3244b
--- /dev/null
+++ b/target/i386/tcg/access.c
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Access guest memory in blocks. */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/cpu_ldst.h"
+#include "exec/exec-all.h"
+#include "access.h"
+
+
+void access_prepare_mmu(X86Access *ret, CPUX86State *env,
+                        vaddr vaddr, unsigned size,
+                        MMUAccessType type, int mmu_idx, uintptr_t ra)
+{
+    int size1, size2;
+    void *haddr1, *haddr2;
+
+    assert(size > 0 && size <= TARGET_PAGE_SIZE);
+
+    size1 = MIN(size, -(vaddr | TARGET_PAGE_MASK));
+    size2 = size - size1;
+
+    memset(ret, 0, sizeof(*ret));
+    ret->vaddr = vaddr;
+    ret->size = size;
+    ret->size1 = size1;
+    ret->mmu_idx = mmu_idx;
+    ret->env = env;
+    ret->ra = ra;
+
+    haddr1 = probe_access(env, vaddr, size1, type, mmu_idx, ra);
+    ret->haddr1 = haddr1;
+
+    if (unlikely(size2)) {
+        haddr2 = probe_access(env, vaddr + size1, size2, type, mmu_idx, ra);
+        if (haddr2 == haddr1 + size1) {
+            ret->size1 = size;
+        } else {
+            ret->haddr2 = haddr2;
+        }
+    }
+}
+
+void access_prepare(X86Access *ret, CPUX86State *env, vaddr vaddr,
+                    unsigned size, MMUAccessType type, uintptr_t ra)
+{
+    int mmu_idx = cpu_mmu_index(env_cpu(env), false);
+    access_prepare_mmu(ret, env, vaddr, size, type, mmu_idx, ra);
+}
+
+static void *access_ptr(X86Access *ac, vaddr addr, unsigned len)
+{
+    vaddr offset = addr - ac->vaddr;
+
+    assert(addr >= ac->vaddr);
+
+#ifdef CONFIG_USER_ONLY
+    assert(offset <= ac->size1 - len);
+    return ac->haddr1 + offset;
+#else
+    if (likely(offset <= ac->size1 - len)) {
+        return ac->haddr1 + offset;
+    }
+    assert(offset <= ac->size - len);
+    if (likely(offset >= ac->size1)) {
+        return ac->haddr2 + (offset - ac->size1);
+    }
+    return NULL;
+#endif
+}
+
+#ifdef CONFIG_USER_ONLY
+# define test_ptr(p)  true
+#else
+# define test_ptr(p)  likely(p)
+#endif
+
+uint8_t access_ldb(X86Access *ac, vaddr addr)
+{
+    void *p = access_ptr(ac, addr, sizeof(uint8_t));
+
+    if (test_ptr(p)) {
+        return ldub_p(p);
+    }
+    return cpu_ldub_mmuidx_ra(ac->env, addr, ac->mmu_idx, ac->ra);
+}
+
+uint16_t access_ldw(X86Access *ac, vaddr addr)
+{
+    void *p = access_ptr(ac, addr, sizeof(uint16_t));
+
+    if (test_ptr(p)) {
+        return lduw_le_p(p);
+    }
+    return cpu_lduw_le_mmuidx_ra(ac->env, addr, ac->mmu_idx, ac->ra);
+}
+
+uint32_t access_ldl(X86Access *ac, vaddr addr)
+{
+    void *p = access_ptr(ac, addr, sizeof(uint32_t));
+
+    if (test_ptr(p)) {
+        return ldl_le_p(p);
+    }
+    return cpu_ldl_le_mmuidx_ra(ac->env, addr, ac->mmu_idx, ac->ra);
+}
+
+uint64_t access_ldq(X86Access *ac, vaddr addr)
+{
+    void *p = access_ptr(ac, addr, sizeof(uint64_t));
+
+    if (test_ptr(p)) {
[PATCH v2 17/28] linux-user/i386: Replace target_fpstate_fxsave with X86LegacyXSaveArea
Use the structure definition from target/i386/cpu.h. The only minor
quirk is re-casting the sw_reserved area to the OS-specific struct
target_fpx_sw_bytes.

Signed-off-by: Richard Henderson
---
 linux-user/i386/signal.c | 71 +++-
 1 file changed, 26 insertions(+), 45 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index a4748b743d..ed98b4d073 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -33,16 +33,6 @@ struct target_fpreg {
     uint16_t exponent;
 };
 
-struct target_fpxreg {
-    uint16_t significand[4];
-    uint16_t exponent;
-    uint16_t padding[3];
-};
-
-struct target_xmmreg {
-    uint32_t element[4];
-};
-
 struct target_fpx_sw_bytes {
     uint32_t magic1;
     uint32_t extended_size;
@@ -52,25 +42,6 @@ struct target_fpx_sw_bytes {
 };
 QEMU_BUILD_BUG_ON(sizeof(struct target_fpx_sw_bytes) != 12*4);
 
-struct target_fpstate_fxsave {
-    /* FXSAVE format */
-    uint16_t cw;
-    uint16_t sw;
-    uint16_t twd;
-    uint16_t fop;
-    uint64_t rip;
-    uint64_t rdp;
-    uint32_t mxcsr;
-    uint32_t mxcsr_mask;
-    uint32_t st_space[32];
-    uint32_t xmm_space[64];
-    uint32_t hw_reserved[12];
-    struct target_fpx_sw_bytes sw_reserved;
-};
-#define TARGET_FXSAVE_SIZE sizeof(struct target_fpstate_fxsave)
-QEMU_BUILD_BUG_ON(TARGET_FXSAVE_SIZE != 512);
-QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_fxsave, sw_reserved) != 464);
-
 struct target_fpstate_32 {
     /* Regular FPU environment */
     uint32_t cw;
@@ -83,7 +54,7 @@ struct target_fpstate_32 {
     struct target_fpreg st[8];
     uint16_t status;
     uint16_t magic;   /* 0xffff = regular FPU data only */
-    struct target_fpstate_fxsave fxsave;
+    X86LegacyXSaveArea fxsave;
 };
 
 /*
@@ -96,7 +67,7 @@ QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_32, fxsave) & 15);
 # define target_fpstate target_fpstate_32
 # define TARGET_FPSTATE_FXSAVE_OFFSET offsetof(struct target_fpstate_32, fxsave)
 #else
-# define target_fpstate target_fpstate_fxsave
+# define target_fpstate X86LegacyXSaveArea
 # define TARGET_FPSTATE_FXSAVE_OFFSET 0
 #endif
 
@@ -240,15 +211,17 @@ struct rt_sigframe {
  * Set up a signal frame.
  */
 
-static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
+static void xsave_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
                              abi_ulong fxsave_addr)
 {
+    struct target_fpx_sw_bytes *sw = (void *)&fxsave->sw_reserved;
+
     if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
         /* fxsave_addr must be 16 byte aligned for fxsave */
         assert(!(fxsave_addr & 0xf));
 
         cpu_x86_fxsave(env, fxsave_addr);
-        __put_user(0, &fxsave->sw_reserved.magic1);
+        __put_user(0, &sw->magic1);
     } else {
         uint32_t xstate_size = xsave_area_size(env->xcr0, false);
@@ -266,10 +239,10 @@ static void xsave_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
         /* Zero the header, XSAVE *adds* features to an existing save state. */
         memset(fxsave + 1, 0, sizeof(X86XSaveHeader));
         cpu_x86_xsave(env, fxsave_addr, -1);
-        __put_user(TARGET_FP_XSTATE_MAGIC1, &fxsave->sw_reserved.magic1);
-        __put_user(extended_size, &fxsave->sw_reserved.extended_size);
-        __put_user(env->xcr0, &fxsave->sw_reserved.xfeatures);
-        __put_user(xstate_size, &fxsave->sw_reserved.xstate_size);
+        __put_user(TARGET_FP_XSTATE_MAGIC1, &sw->magic1);
+        __put_user(extended_size, &sw->extended_size);
+        __put_user(env->xcr0, &sw->xfeatures);
+        __put_user(xstate_size, &sw->xstate_size);
         __put_user(TARGET_FP_XSTATE_MAGIC2,
                    (uint32_t *)((void *)fxsave + xstate_size));
     }
@@ -383,9 +356,9 @@ get_sigframe(struct target_sigaction *ka, CPUX86State *env, size_t fxsave_offset)
     }
 
     if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
-        return (esp - (fxsave_offset + TARGET_FXSAVE_SIZE)) & -8ul;
+        return (esp - (fxsave_offset + sizeof(X86LegacyXSaveArea))) & -8ul;
     } else if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
-        return ((esp - TARGET_FXSAVE_SIZE) & -16ul) - fxsave_offset;
+        return ((esp - sizeof(X86LegacyXSaveArea)) & -16ul) - fxsave_offset;
     } else {
         size_t xstate_size =
             xsave_area_size(env->xcr0, false) + TARGET_FP_XSTATE_MAGIC2_SIZE;
@@ -551,21 +524,29 @@ give_sigsegv:
     force_sigsegv(sig);
 }
 
-static int xrstor_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
+static int xrstor_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
                              abi_ulong fxsave_addr)
 {
+    struct target_fpx_sw_bytes *sw = (void *)&fxsave->sw_reserved;
+
     if (env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE) {
-        uint32_t extended_size = tswapl(fxsave->sw_reserved.extended_size);
-        uint32_t xstate_size = tswapl(fxsave->sw_reserved.xstate_size);
+        uint32_t magic1 = tswapl(sw->magic1);
+
[PATCH v2 18/28] linux-user/i386: Split out struct target_fregs_state
Signed-off-by: Richard Henderson
---
 linux-user/i386/signal.c | 43 +++-
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index ed98b4d073..559b63c25b 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -33,6 +33,23 @@ struct target_fpreg {
     uint16_t exponent;
 };
 
+/* Legacy x87 fpu state format for FSAVE/FRSTOR. */
+struct target_fregs_state {
+    uint32_t cwd;
+    uint32_t swd;
+    uint32_t twd;
+    uint32_t fip;
+    uint32_t fcs;
+    uint32_t foo;
+    uint32_t fos;
+    struct target_fpreg st[8];
+
+    /* Software status information [not touched by FSAVE]. */
+    uint16_t status;
+    uint16_t magic;   /* 0xffff: FPU data only, 0x0000: FXSR FPU data */
+};
+QEMU_BUILD_BUG_ON(sizeof(struct target_fregs_state) != 32 + 80);
+
 struct target_fpx_sw_bytes {
     uint32_t magic1;
     uint32_t extended_size;
@@ -43,29 +60,19 @@ struct target_fpx_sw_bytes {
 };
 QEMU_BUILD_BUG_ON(sizeof(struct target_fpx_sw_bytes) != 12*4);
 
 struct target_fpstate_32 {
-    /* Regular FPU environment */
-    uint32_t cw;
-    uint32_t sw;
-    uint32_t tag;
-    uint32_t ipoff;
-    uint32_t cssel;
-    uint32_t dataoff;
-    uint32_t datasel;
-    struct target_fpreg st[8];
-    uint16_t status;
-    uint16_t magic;   /* 0xffff = regular FPU data only */
-    X86LegacyXSaveArea fxsave;
+    struct target_fregs_state fpstate;
+    X86LegacyXSaveArea fxstate;
 };
 
 /*
  * For simplicity, setup_frame aligns struct target_fpstate_32 to
  * 16 bytes, so ensure that the FXSAVE area is also aligned.
  */
-QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_32, fxsave) & 15);
+QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_32, fxstate) & 15);
 
 #ifndef TARGET_X86_64
 # define target_fpstate target_fpstate_32
-# define TARGET_FPSTATE_FXSAVE_OFFSET offsetof(struct target_fpstate_32, fxsave)
+# define TARGET_FPSTATE_FXSAVE_OFFSET offsetof(struct target_fpstate_32, fxstate)
 #else
 # define target_fpstate X86LegacyXSaveArea
 # define TARGET_FPSTATE_FXSAVE_OFFSET 0
 #endif
@@ -278,15 +285,15 @@ static void setup_sigcontext(struct target_sigcontext *sc,
     __put_user(env->segs[R_SS].selector, (unsigned int *)&sc->ss);
 
     cpu_x86_fsave(env, fpstate_addr, 1);
-    fpstate->status = fpstate->sw;
+    fpstate->fpstate.status = fpstate->fpstate.swd;
     if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
         magic = 0xffff;
     } else {
-        xsave_sigcontext(env, &fpstate->fxsave,
+        xsave_sigcontext(env, &fpstate->fxstate,
                          fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
         magic = 0;
     }
-    __put_user(magic, &fpstate->magic);
+    __put_user(magic, &fpstate->fpstate.magic);
 #else
     __put_user(env->regs[R_EDI], &sc->rdi);
     __put_user(env->regs[R_ESI], &sc->rsi);
@@ -622,7 +629,7 @@ restore_sigcontext(CPUX86State *env, struct target_sigcontext *sc)
         cpu_x86_frstor(env, fpstate_addr, 1);
         err = 0;
     } else {
-        err = xrstor_sigcontext(env, &fpstate->fxsave,
+        err = xrstor_sigcontext(env, &fpstate->fxstate,
                                 fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
     }
 #else
-- 
2.34.1
[PATCH v2 20/28] linux-user/i386: Return boolean success from restore_sigcontext
Invert the sense of the return value and use bool.

Signed-off-by: Richard Henderson
---
 linux-user/i386/signal.c | 51
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index f8cc0cff07..1571ff8553 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -563,12 +563,12 @@ static int xrstor_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
     return 0;
 }
 
-static int
-restore_sigcontext(CPUX86State *env, struct target_sigcontext *sc)
+static bool restore_sigcontext(CPUX86State *env, struct target_sigcontext *sc)
 {
-    int err = 1;
     abi_ulong fpstate_addr;
     unsigned int tmpflags;
+    struct target_fpstate *fpstate;
+    bool ok;
 
 #ifndef TARGET_X86_64
     cpu_x86_load_seg(env, R_GS, tswap16(sc->gs));
@@ -616,29 +616,27 @@ restore_sigcontext(CPUX86State *env, struct target_sigcontext *sc)
     // regs->orig_eax = -1;            /* disable syscall checks */
 
     fpstate_addr = tswapl(sc->fpstate);
-    if (fpstate_addr != 0) {
-        struct target_fpstate *fpstate;
-        if (!lock_user_struct(VERIFY_READ, fpstate, fpstate_addr,
-                              sizeof(struct target_fpstate))) {
-            return err;
-        }
-#ifndef TARGET_X86_64
-        if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
-            cpu_x86_frstor(env, fpstate_addr, 1);
-            err = 0;
-        } else {
-            err = xrstor_sigcontext(env, &fpstate->fxstate,
-                                    fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
-        }
-#else
-        err = xrstor_sigcontext(env, fpstate, fpstate_addr);
-#endif
-        unlock_user_struct(fpstate, fpstate_addr, 0);
-    } else {
-        err = 0;
+    if (fpstate_addr == 0) {
+        return true;
     }
+    if (!lock_user_struct(VERIFY_READ, fpstate, fpstate_addr,
+                          sizeof(struct target_fpstate))) {
+        return false;
+    }
+#ifndef TARGET_X86_64
+    if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
+        cpu_x86_frstor(env, fpstate_addr, 1);
+        ok = true;
+    } else {
+        ok = !xrstor_sigcontext(env, &fpstate->fxstate,
+                                fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
+    }
+#else
+    ok = !xrstor_sigcontext(env, fpstate, fpstate_addr);
+#endif
+    unlock_user_struct(fpstate, fpstate_addr, 0);
 
-    return err;
+    return ok;
 }
 
 /* Note: there is no sigreturn on x86_64, there is only rt_sigreturn */
@@ -664,8 +662,9 @@ long do_sigreturn(CPUX86State *env)
     set_sigmask(&set);
 
     /* restore registers */
-    if (restore_sigcontext(env, &frame->sc))
+    if (!restore_sigcontext(env, &frame->sc)) {
         goto badframe;
+    }
 
     unlock_user_struct(frame, frame_addr, 0);
     return -QEMU_ESIGRETURN;
@@ -689,7 +688,7 @@ long do_rt_sigreturn(CPUX86State *env)
     target_to_host_sigset(&set, &frame->uc.tuc_sigmask);
     set_sigmask(&set);
 
-    if (restore_sigcontext(env, &frame->uc.tuc_mcontext)) {
+    if (!restore_sigcontext(env, &frame->uc.tuc_mcontext)) {
         goto badframe;
     }
-- 
2.34.1
[PATCH v2 21/28] linux-user/i386: Return boolean success from xrstor_sigcontext
Invert the sense of the return value and use bool.

Signed-off-by: Richard Henderson
---
 linux-user/i386/signal.c | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 1571ff8553..d600a4355b 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -529,8 +529,8 @@ give_sigsegv:
     force_sigsegv(sig);
 }
 
-static int xrstor_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
-                             abi_ulong fxsave_addr)
+static bool xrstor_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
+                              abi_ulong fxsave_addr)
 {
     struct target_fpx_sw_bytes *sw = (void *)&fxsave->sw_reserved;
 
@@ -548,19 +548,19 @@ static bool xrstor_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxsave,
             && extended_size >= minimum_size) {
             if (!access_ok(env_cpu(env), VERIFY_READ, fxsave_addr,
                            extended_size - TARGET_FPSTATE_FXSAVE_OFFSET)) {
-                return 1;
+                return false;
             }
 
             magic2 = tswapl(*(uint32_t *)((void *)fxsave + xstate_size));
             if (magic2 == TARGET_FP_XSTATE_MAGIC2) {
                 cpu_x86_xrstor(env, fxsave_addr, -1);
-                return 0;
+                return true;
             }
         }
         /* fall through to fxrstor */
     }
 
     cpu_x86_fxrstor(env, fxsave_addr);
-    return 0;
+    return true;
 }
 
 static bool restore_sigcontext(CPUX86State *env, struct target_sigcontext *sc)
@@ -628,11 +628,11 @@ static bool restore_sigcontext(CPUX86State *env, struct target_sigcontext *sc)
         cpu_x86_frstor(env, fpstate_addr, 1);
         ok = true;
     } else {
-        ok = !xrstor_sigcontext(env, &fpstate->fxstate,
-                                fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
+        ok = xrstor_sigcontext(env, &fpstate->fxstate,
+                               fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
     }
 #else
-    ok = !xrstor_sigcontext(env, fpstate, fpstate_addr);
+    ok = xrstor_sigcontext(env, fpstate, fpstate_addr);
 #endif
     unlock_user_struct(fpstate, fpstate_addr, 0);
-- 
2.34.1
[PATCH v2 27/28] target/i386: Pass host pointer and size to cpu_x86_{fxsave, fxrstor}
We have already validated the memory region in the course of
validating the signal frame. No need to do it again within
the helper function.

Signed-off-by: Richard Henderson
---
 target/i386/cpu.h            |  4 ++--
 linux-user/i386/signal.c     | 13 +
 target/i386/tcg/fpu_helper.c | 26 --
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8eb97fdd7a..35a8bf831f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2234,8 +2234,8 @@ int cpu_x86_get_descr_debug(CPUX86State *env, unsigned int selector,
 void cpu_x86_load_seg(CPUX86State *s, X86Seg seg_reg, int selector);
 void cpu_x86_fsave(CPUX86State *s, void *host, size_t len);
 void cpu_x86_frstor(CPUX86State *s, void *host, size_t len);
-void cpu_x86_fxsave(CPUX86State *s, target_ulong ptr);
-void cpu_x86_fxrstor(CPUX86State *s, target_ulong ptr);
+void cpu_x86_fxsave(CPUX86State *s, void *host, size_t len);
+void cpu_x86_fxrstor(CPUX86State *s, void *host, size_t len);
 void cpu_x86_xsave(CPUX86State *s, target_ulong ptr, uint64_t rbfm);
 void cpu_x86_xrstor(CPUX86State *s, target_ulong ptr, uint64_t rbfm);
 
diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 7178440d67..b823dee17f 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -293,14 +293,11 @@ static abi_ptr get_sigframe(struct target_sigaction *ka, CPUX86State *env,
  * Set up a signal frame.
  */
 
-static void fxsave_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxstate,
-                              abi_ptr fxstate_addr)
+static void fxsave_sigcontext(CPUX86State *env, X86LegacyXSaveArea *fxstate)
 {
     struct target_fpx_sw_bytes *sw = (void *)&fxstate->sw_reserved;
 
-    /* fxstate_addr must be 16 byte aligned for fxsave */
-    assert(!(fxstate_addr & 0xf));
-
-    cpu_x86_fxsave(env, fxstate_addr);
+    cpu_x86_fxsave(env, fxstate, sizeof(*fxstate));
     __put_user(0, &sw->magic1);
 }
 
@@ -411,7 +408,7 @@ static void setup_sigcontext(CPUX86State *env,
         xsave_sigcontext(env, fxstate, fpstate_addr, fxstate_addr, fpend_addr);
         break;
     case FPSTATE_FXSAVE:
-        fxsave_sigcontext(env, fxstate, fxstate_addr);
+        fxsave_sigcontext(env, fxstate);
         break;
     default:
         break;
@@ -668,7 +665,7 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind,
         break;
     }
 
-    cpu_x86_fxrstor(env, fxstate_addr);
+    cpu_x86_fxrstor(env, fxstate, sizeof(*fxstate));
     return true;
 }
 
@@ -686,7 +683,7 @@ static bool frstor_sigcontext(CPUX86State *env, FPStateKind fpkind,
         }
         break;
     case FPSTATE_FXSAVE:
-        cpu_x86_fxrstor(env, fxstate_addr);
+        cpu_x86_fxrstor(env, fxstate, sizeof(*fxstate));
         break;
     case FPSTATE_FSAVE:
         break;
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 0a91757690..1c2121c559 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -3040,22 +3040,28 @@ void cpu_x86_frstor(CPUX86State *env, void *host, size_t len)
     do_frstor(&ac, 0, true);
 }
 
-void cpu_x86_fxsave(CPUX86State *env, target_ulong ptr)
+void cpu_x86_fxsave(CPUX86State *env, void *host, size_t len)
 {
-    X86Access ac;
+    X86Access ac = {
+        .haddr1 = host,
+        .size = sizeof(X86LegacyXSaveArea),
+        .env = env,
+    };
 
-    access_prepare(&ac, env, ptr, sizeof(X86LegacyXSaveArea),
-                   MMU_DATA_STORE, 0);
-    do_fxsave(&ac, ptr);
+    assert(ac.size <= len);
+    do_fxsave(&ac, 0);
 }
 
-void cpu_x86_fxrstor(CPUX86State *env, target_ulong ptr)
+void cpu_x86_fxrstor(CPUX86State *env, void *host, size_t len)
 {
-    X86Access ac;
+    X86Access ac = {
+        .haddr1 = host,
+        .size = sizeof(X86LegacyXSaveArea),
+        .env = env,
+    };
 
-    access_prepare(&ac, env, ptr, sizeof(X86LegacyXSaveArea),
-                   MMU_DATA_LOAD, 0);
-    do_fxrstor(&ac, ptr);
+    assert(ac.size <= len);
+    do_fxrstor(&ac, 0);
 }
 
 void cpu_x86_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm)
-- 
2.34.1
[PATCH v2 23/28] target/i386: Honor xfeatures in xrstor_sigcontext
Signed-off-by: Richard Henderson
---
 linux-user/i386/signal.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index d015fe520a..fd09c973d4 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -612,6 +612,7 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind,
     struct target_fpx_sw_bytes *sw = (void *)&fxstate->sw_reserved;
     uint32_t magic1, magic2;
     uint32_t extended_size, xstate_size, min_size, max_size;
+    uint64_t xfeatures;
 
     switch (fpkind) {
     case FPSTATE_XSAVE:
@@ -628,10 +629,25 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind,
             xstate_size > extended_size) {
             break;
         }
+
+        /*
+         * Restore the features indicated in the frame, masked by
+         * those currently enabled.  Re-check the frame size.
+         * ??? It is not clear where the kernel does this, but it
+         * is not in check_xstate_in_sigframe, and so (probably)
+         * does not fall back to fxrstor.
+         */
+        xfeatures = tswap64(sw->xfeatures) & env->xcr0;
+        min_size = xsave_area_size(xfeatures, false);
+        if (xstate_size < min_size) {
+            return false;
+        }
+
         if (!access_ok(env_cpu(env), VERIFY_READ, fxstate_addr,
                        xstate_size + TARGET_FP_XSTATE_MAGIC2_SIZE)) {
             return false;
         }
+
         /*
          * Check for the presence of second magic word at the end of memory
          * layout. This detects the case where the user just copied the legacy
@@ -644,7 +660,8 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind,
         if (magic2 != TARGET_FP_XSTATE_MAGIC2) {
             break;
         }
-        cpu_x86_xrstor(env, fxstate_addr, -1);
+
+        cpu_x86_xrstor(env, fxstate_addr, xfeatures);
         return true;
 
     default:
-- 
2.34.1
[PATCH v2 16/28] linux-user/i386: Remove xfeatures from target_fpstate_fxsave
This is easily computed by advancing past the structure.
At the same time, replace the magic number "64".

Signed-off-by: Richard Henderson
---
 linux-user/i386/signal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 547c7cc685..a4748b743d 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -66,7 +66,6 @@ struct target_fpstate_fxsave {
     uint32_t xmm_space[64];
     uint32_t hw_reserved[12];
     struct target_fpx_sw_bytes sw_reserved;
-    uint8_t xfeatures[];
 };
 #define TARGET_FXSAVE_SIZE sizeof(struct target_fpstate_fxsave)
 QEMU_BUILD_BUG_ON(TARGET_FXSAVE_SIZE != 512);
@@ -265,7 +264,7 @@ static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
     assert(!(fxsave_addr & 0x3f));
 
     /* Zero the header, XSAVE *adds* features to an existing save state. */
-    memset(fxsave->xfeatures, 0, 64);
+    memset(fxsave + 1, 0, sizeof(X86XSaveHeader));
     cpu_x86_xsave(env, fxsave_addr, -1);
     __put_user(TARGET_FP_XSTATE_MAGIC1, &fxsave->sw_reserved.magic1);
     __put_user(extended_size, &fxsave->sw_reserved.extended_size);
-- 
2.34.1
[PATCH v2 12/28] target/i386: Split out do_xsave_chk
This path is not required by user-only, and can in fact
be shared between xsave and xrstor.

Signed-off-by: Richard Henderson
---
 target/i386/tcg/fpu_helper.c | 51 +++-
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 883002dc22..11c60152de 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -2674,16 +2674,6 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm,
     X86Access ac;
     unsigned size;
 
-    /* The OS must have enabled XSAVE. */
-    if (!(env->cr[4] & CR4_OSXSAVE_MASK)) {
-        raise_exception_ra(env, EXCP06_ILLOP, ra);
-    }
-
-    /* The operand must be 64 byte aligned. */
-    if (ptr & 63) {
-        raise_exception_ra(env, EXCP0D_GPF, ra);
-    }
-
     /* Never save anything not enabled by XCR0. */
     rfbm &= env->xcr0;
     opt &= rfbm;
@@ -2720,15 +2710,35 @@
     access_stq(&ac, ptr + XO(header.xstate_bv), new_bv);
 }
 
+static void do_xsave_chk(CPUX86State *env, target_ulong ptr, uintptr_t ra)
+{
+    /* The OS must have enabled XSAVE. */
+    if (!(env->cr[4] & CR4_OSXSAVE_MASK)) {
+        raise_exception_ra(env, EXCP06_ILLOP, ra);
+    }
+
+    /* The operand must be 64 byte aligned. */
+    if (ptr & 63) {
+        raise_exception_ra(env, EXCP0D_GPF, ra);
+    }
+}
+
 void helper_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm)
 {
-    do_xsave(env, ptr, rfbm, get_xinuse(env), -1, GETPC());
+    uintptr_t ra = GETPC();
+
+    do_xsave_chk(env, ptr, ra);
+    do_xsave(env, ptr, rfbm, get_xinuse(env), -1, ra);
 }
 
 void helper_xsaveopt(CPUX86State *env, target_ulong ptr, uint64_t rfbm)
 {
-    uint64_t inuse = get_xinuse(env);
-    do_xsave(env, ptr, rfbm, inuse, inuse, GETPC());
+    uintptr_t ra = GETPC();
+    uint64_t inuse;
+
+    do_xsave_chk(env, ptr, ra);
+    inuse = get_xinuse(env);
+    do_xsave(env, ptr, rfbm, inuse, inuse, ra);
 }
 
 static void do_xrstor_fpu(X86Access *ac, target_ulong ptr)
@@ -2899,16 +2909,6 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr_t ra)
 
     rfbm &= env->xcr0;
 
-    /* The OS must have enabled XSAVE. */
-    if (!(env->cr[4] & CR4_OSXSAVE_MASK)) {
-        raise_exception_ra(env, EXCP06_ILLOP, ra);
-    }
-
-    /* The operand must be 64 byte aligned. */
-    if (ptr & 63) {
-        raise_exception_ra(env, EXCP0D_GPF, ra);
-    }
-
     size = sizeof(X86LegacyXSaveArea) + sizeof(X86XSaveHeader);
     access_prepare(&ac, env, ptr, size, MMU_DATA_LOAD, ra);
 
@@ -3003,7 +3003,10 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr_t ra)
 
 void helper_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm)
 {
-    do_xrstor(env, ptr, rfbm, GETPC());
+    uintptr_t ra = GETPC();
+
+    do_xsave_chk(env, ptr, ra);
+    do_xrstor(env, ptr, rfbm, ra);
 }
 
 #if defined(CONFIG_USER_ONLY)
-- 
2.34.1
[PATCH v2 03/28] target/i386: Convert helper_{fbld,fbst}_ST0 to X86Access
Signed-off-by: Richard Henderson
---
 target/i386/tcg/fpu_helper.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 878fad9795..ad8b536cb5 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -772,18 +772,21 @@ void helper_fninit(CPUX86State *env)
 
 void helper_fbld_ST0(CPUX86State *env, target_ulong ptr)
 {
+    X86Access ac;
     floatx80 tmp;
     uint64_t val;
     unsigned int v;
     int i;
 
+    access_prepare(&ac, env, ptr, 10, MMU_DATA_LOAD, GETPC());
+
     val = 0;
     for (i = 8; i >= 0; i--) {
-        v = cpu_ldub_data_ra(env, ptr + i, GETPC());
+        v = access_ldb(&ac, ptr + i);
         val = (val * 100) + ((v >> 4) * 10) + (v & 0xf);
     }
     tmp = int64_to_floatx80(val, &env->fp_status);
-    if (cpu_ldub_data_ra(env, ptr + 9, GETPC()) & 0x80) {
+    if (access_ldb(&ac, ptr + 9) & 0x80) {
         tmp = floatx80_chs(tmp);
     }
     fpush(env);
@@ -797,7 +800,9 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
     target_ulong mem_ref, mem_end;
     int64_t val;
     CPU_LDoubleU temp;
+    X86Access ac;
 
+    access_prepare(&ac, env, ptr, 10, MMU_DATA_STORE, GETPC());
     temp.d = ST0;
     val = floatx80_to_int64(ST0, &env->fp_status);
 
@@ -805,20 +810,20 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
     if (val >= 1000000000000000000LL || val <= -1000000000000000000LL) {
         set_float_exception_flags(float_flag_invalid, &env->fp_status);
         while (mem_ref < ptr + 7) {
-            cpu_stb_data_ra(env, mem_ref++, 0, GETPC());
+            access_stb(&ac, mem_ref++, 0);
         }
-        cpu_stb_data_ra(env, mem_ref++, 0xc0, GETPC());
-        cpu_stb_data_ra(env, mem_ref++, 0xff, GETPC());
-        cpu_stb_data_ra(env, mem_ref++, 0xff, GETPC());
+        access_stb(&ac, mem_ref++, 0xc0);
+        access_stb(&ac, mem_ref++, 0xff);
+        access_stb(&ac, mem_ref++, 0xff);
         merge_exception_flags(env, old_flags);
         return;
     }
     mem_end = mem_ref + 9;
     if (SIGND(temp)) {
-        cpu_stb_data_ra(env, mem_end, 0x80, GETPC());
+        access_stb(&ac, mem_end, 0x80);
         val = -val;
     } else {
-        cpu_stb_data_ra(env, mem_end, 0x00, GETPC());
+        access_stb(&ac, mem_end, 0x00);
     }
     while (mem_ref < mem_end) {
         if (val == 0) {
@@ -827,10 +832,10 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
         v = val % 100;
         val = val / 100;
         v = ((v / 10) << 4) | (v % 10);
-        cpu_stb_data_ra(env, mem_ref++, v, GETPC());
+        access_stb(&ac, mem_ref++, v);
     }
     while (mem_ref < mem_end) {
-        cpu_stb_data_ra(env, mem_ref++, 0, GETPC());
+        access_stb(&ac, mem_ref++, 0);
    }
     merge_exception_flags(env, old_flags);
 }
-- 
2.34.1
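For reference, the 10-byte packed-BCD operand these helpers move can be sketched outside QEMU (function names here are hypothetical; the digit arithmetic is taken directly from the loops above):

```c
#include <assert.h>
#include <stdint.h>

/* Decode the 10-byte packed-BCD operand the way helper_fbld_ST0 reads it:
 * bytes 0..8 hold 18 BCD digits (two per byte, low byte first),
 * bit 7 of byte 9 is the sign. */
static int64_t bcd_to_int(const uint8_t mem[10])
{
    uint64_t val = 0;
    for (int i = 8; i >= 0; i--) {
        unsigned v = mem[i];
        val = val * 100 + (v >> 4) * 10 + (v & 0xf);
    }
    return (mem[9] & 0x80) ? -(int64_t)val : (int64_t)val;
}

/* Encode, mirroring the digit-store loop in helper_fbst_ST0. */
static void int_to_bcd(int64_t val, uint8_t mem[10])
{
    uint64_t u = val < 0 ? -(uint64_t)val : (uint64_t)val;
    mem[9] = val < 0 ? 0x80 : 0x00;
    for (int i = 0; i < 9; i++) {
        unsigned v = u % 100;
        u /= 100;
        mem[i] = ((v / 10) << 4) | (v % 10);
    }
}
```

The single access_prepare for all ten bytes is what lets the helper drop the per-byte GETPC() retaddr plumbing.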
[PATCH v2 08/28] target/i386: Convert do_xrstor_{fpu,mxcsr,sse} to X86Access
Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 46 ++-- 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 643e017bef..59f73ad075 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2724,39 +2724,41 @@ void helper_xsaveopt(CPUX86State *env, target_ulong ptr, uint64_t rfbm) do_xsave(env, ptr, rfbm, inuse, inuse, GETPC()); } -static void do_xrstor_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_fpu(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; int i, fpuc, fpus, fptag; target_ulong addr; -X86Access ac; -fpuc = cpu_lduw_data_ra(env, ptr + XO(legacy.fcw), ra); -fpus = cpu_lduw_data_ra(env, ptr + XO(legacy.fsw), ra); -fptag = cpu_lduw_data_ra(env, ptr + XO(legacy.ftw), ra); +fpuc = access_ldw(ac, ptr + XO(legacy.fcw)); +fpus = access_ldw(ac, ptr + XO(legacy.fsw)); +fptag = access_ldw(ac, ptr + XO(legacy.ftw)); cpu_set_fpuc(env, fpuc); cpu_set_fpus(env, fpus); + fptag ^= 0xff; for (i = 0; i < 8; i++) { env->fptags[i] = ((fptag >> i) & 1); } addr = ptr + XO(legacy.fpregs); -access_prepare(, env, addr, 8 * 16, MMU_DATA_LOAD, GETPC()); for (i = 0; i < 8; i++) { -floatx80 tmp = do_fldt(, addr); +floatx80 tmp = do_fldt(ac, addr); ST(i) = tmp; addr += 16; } } -static void do_xrstor_mxcsr(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_mxcsr(X86Access *ac, target_ulong ptr) { -cpu_set_mxcsr(env, cpu_ldl_data_ra(env, ptr + XO(legacy.mxcsr), ra)); +CPUX86State *env = ac->env; +cpu_set_mxcsr(env, access_ldl(ac, ptr + XO(legacy.mxcsr))); } -static void do_xrstor_sse(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_sse(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; int i, nb_xmm_regs; target_ulong addr; @@ -2768,8 +2770,8 @@ static void do_xrstor_sse(CPUX86State *env, target_ulong ptr, uintptr_t ra) addr = ptr + XO(legacy.xmm_regs); for (i = 0; i 
< nb_xmm_regs; i++) { -env->xmm_regs[i].ZMM_Q(0) = cpu_ldq_data_ra(env, addr, ra); -env->xmm_regs[i].ZMM_Q(1) = cpu_ldq_data_ra(env, addr + 8, ra); +env->xmm_regs[i].ZMM_Q(0) = access_ldq(ac, addr); +env->xmm_regs[i].ZMM_Q(1) = access_ldq(ac, addr + 8); addr += 16; } } @@ -2849,20 +2851,24 @@ static void do_xrstor_pkru(CPUX86State *env, target_ulong ptr, uintptr_t ra) static void do_fxrstor(CPUX86State *env, target_ulong ptr, uintptr_t ra) { +X86Access ac; + /* The operand must be 16 byte aligned */ if (ptr & 0xf) { raise_exception_ra(env, EXCP0D_GPF, ra); } -do_xrstor_fpu(env, ptr, ra); +access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), + MMU_DATA_LOAD, ra); +do_xrstor_fpu(, ptr); if (env->cr[4] & CR4_OSFXSR_MASK) { -do_xrstor_mxcsr(env, ptr, ra); +do_xrstor_mxcsr(, ptr); /* Fast FXRSTOR leaves out the XMM registers */ if (!(env->efer & MSR_EFER_FFXSR) || (env->hflags & HF_CPL_MASK) || !(env->hflags & HF_LMA_MASK)) { -do_xrstor_sse(env, ptr, ra); +do_xrstor_sse(, ptr); } } } @@ -2875,6 +2881,7 @@ void helper_fxrstor(CPUX86State *env, target_ulong ptr) static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr_t ra) { uint64_t xstate_bv, xcomp_bv, reserve0; +X86Access ac; rfbm &= env->xcr0; @@ -2913,9 +2920,12 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr raise_exception_ra(env, EXCP0D_GPF, ra); } +access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), + MMU_DATA_LOAD, ra); + if (rfbm & XSTATE_FP_MASK) { if (xstate_bv & XSTATE_FP_MASK) { -do_xrstor_fpu(env, ptr, ra); +do_xrstor_fpu(, ptr); } else { do_fninit(env); memset(env->fpregs, 0, sizeof(env->fpregs)); @@ -2924,9 +2934,9 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr if (rfbm & XSTATE_SSE_MASK) { /* Note that the standard form of XRSTOR loads MXCSR from memory whether or not the XSTATE_BV bit is set. 
*/ -do_xrstor_mxcsr(env, ptr, ra); +do_xrstor_mxcsr(, ptr); if (xstate_bv & XSTATE_SSE_MASK) { -do_xrstor_sse(env, ptr, ra); +do_xrstor_sse(, ptr); } else { do_clear_sse(env); } -- 2.34.1
[PATCH v2 13/28] target/i386: Add rbfm argument to cpu_x86_{xsave,xrstor}
For now, continue to pass all 1's from signal.c. Signed-off-by: Richard Henderson --- target/i386/cpu.h| 4 ++-- linux-user/i386/signal.c | 4 ++-- target/i386/tcg/fpu_helper.c | 8 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 6b05738079..5860acb0c3 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2223,8 +2223,8 @@ void cpu_x86_fsave(CPUX86State *s, target_ulong ptr, int data32); void cpu_x86_frstor(CPUX86State *s, target_ulong ptr, int data32); void cpu_x86_fxsave(CPUX86State *s, target_ulong ptr); void cpu_x86_fxrstor(CPUX86State *s, target_ulong ptr); -void cpu_x86_xsave(CPUX86State *s, target_ulong ptr); -void cpu_x86_xrstor(CPUX86State *s, target_ulong ptr); +void cpu_x86_xsave(CPUX86State *s, target_ulong ptr, uint64_t rbfm); +void cpu_x86_xrstor(CPUX86State *s, target_ulong ptr, uint64_t rbfm); /* cpu.c */ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index cfe70fc5cf..68659fa1db 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -267,7 +267,7 @@ static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxs /* Zero the header, XSAVE *adds* features to an existing save state. 
*/ memset(fxsave->xfeatures, 0, 64); -cpu_x86_xsave(env, fxsave_addr); +cpu_x86_xsave(env, fxsave_addr, -1); __put_user(TARGET_FP_XSTATE_MAGIC1, >sw_reserved.magic1); __put_user(extended_size, >sw_reserved.extended_size); __put_user(env->xcr0, >sw_reserved.xfeatures); @@ -568,7 +568,7 @@ static int xrstor_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxs return 1; } if (tswapl(*(uint32_t *) >xfeatures[xfeatures_size]) == TARGET_FP_XSTATE_MAGIC2) { -cpu_x86_xrstor(env, fxsave_addr); +cpu_x86_xrstor(env, fxsave_addr, -1); return 0; } } diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 11c60152de..dbc1e5d8dd 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -3046,14 +3046,14 @@ void cpu_x86_fxrstor(CPUX86State *env, target_ulong ptr) do_fxrstor(, ptr); } -void cpu_x86_xsave(CPUX86State *env, target_ulong ptr) +void cpu_x86_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm) { -do_xsave(env, ptr, -1, get_xinuse(env), -1, 0); +do_xsave(env, ptr, rfbm, get_xinuse(env), -1, 0); } -void cpu_x86_xrstor(CPUX86State *env, target_ulong ptr) +void cpu_x86_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm) { -do_xrstor(env, ptr, -1, 0); +do_xrstor(env, ptr, rfbm, 0); } #endif -- 2.34.1
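Passing -1 from signal.c is safe because do_xsave masks the request against XCR0 before saving anything; all-ones simply degenerates to "everything XCR0 enables". A one-line sketch of that masking (the function name and mask constants are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define XSTATE_FP_MASK  (1ull << 0)
#define XSTATE_SSE_MASK (1ull << 1)
#define XSTATE_YMM_MASK (1ull << 2)

/* do_xsave never saves components not enabled in XCR0, so
 * rfbm = -1 reduces to xcr0 itself. */
static uint64_t effective_rfbm(uint64_t rfbm, uint64_t xcr0)
{
    return rfbm & xcr0;
}
```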
[PATCH v2 25/28] target/i386: Convert do_xrstor to X86Access
Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 106 +-- 1 file changed, 64 insertions(+), 42 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index d4dd09dc95..909da05f91 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2902,51 +2902,38 @@ void helper_fxrstor(CPUX86State *env, target_ulong ptr) do_fxrstor(, ptr); } -static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr_t ra) +static bool valid_xrstor_header(X86Access *ac, uint64_t *pxsbv, +target_ulong ptr) { uint64_t xstate_bv, xcomp_bv, reserve0; -X86Access ac; -unsigned size, size_ext; -rfbm &= env->xcr0; +xstate_bv = access_ldq(ac, ptr + XO(header.xstate_bv)); +xcomp_bv = access_ldq(ac, ptr + XO(header.xcomp_bv)); +reserve0 = access_ldq(ac, ptr + XO(header.reserve0)); +*pxsbv = xstate_bv; -size = sizeof(X86LegacyXSaveArea) + sizeof(X86XSaveHeader); -access_prepare(, env, ptr, size, MMU_DATA_LOAD, ra); - -xstate_bv = access_ldq(, ptr + XO(header.xstate_bv)); - -if ((int64_t)xstate_bv < 0) { -/* FIXME: Compact form. */ -raise_exception_ra(env, EXCP0D_GPF, ra); +/* + * XCOMP_BV bit 63 indicates compact form, which we do not support, + * and thus must raise #GP. That leaves us in standard form. + * In standard form, bytes 23:8 must be zero -- which is both + * XCOMP_BV and the following 64-bit field. + */ +if (xcomp_bv || reserve0) { +return false; } -/* Standard form. */ - /* The XSTATE_BV field must not set bits not present in XCR0. */ -if (xstate_bv & ~env->xcr0) { -raise_exception_ra(env, EXCP0D_GPF, ra); -} +return (xstate_bv & ~ac->env->xcr0) == 0; +} -/* The XCOMP_BV field must be zero. Note that, as of the April 2016 - revision, the description of the XSAVE Header (Vol 1, Sec 13.4.2) - describes only XCOMP_BV, but the description of the standard form - of XRSTOR (Vol 1, Sec 13.8.1) checks bytes 23:8 for zero, which - includes the next 64-bit field. 
*/ -xcomp_bv = access_ldq(, ptr + XO(header.xcomp_bv)); -reserve0 = access_ldq(, ptr + XO(header.reserve0)); -if (xcomp_bv || reserve0) { -raise_exception_ra(env, EXCP0D_GPF, ra); -} - -size_ext = xsave_area_size(rfbm & xstate_bv, false); -if (size < size_ext) { -/* TODO: See if existing page probe has covered extra size. */ -access_prepare(, env, ptr, size_ext, MMU_DATA_LOAD, ra); -} +static void do_xrstor(X86Access *ac, target_ulong ptr, + uint64_t rfbm, uint64_t xstate_bv) +{ +CPUX86State *env = ac->env; if (rfbm & XSTATE_FP_MASK) { if (xstate_bv & XSTATE_FP_MASK) { -do_xrstor_fpu(, ptr); +do_xrstor_fpu(ac, ptr); } else { do_fninit(env); memset(env->fpregs, 0, sizeof(env->fpregs)); @@ -2955,23 +2942,23 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr if (rfbm & XSTATE_SSE_MASK) { /* Note that the standard form of XRSTOR loads MXCSR from memory whether or not the XSTATE_BV bit is set. */ -do_xrstor_mxcsr(, ptr); +do_xrstor_mxcsr(ac, ptr); if (xstate_bv & XSTATE_SSE_MASK) { -do_xrstor_sse(, ptr); +do_xrstor_sse(ac, ptr); } else { do_clear_sse(env); } } if (rfbm & XSTATE_YMM_MASK) { if (xstate_bv & XSTATE_YMM_MASK) { -do_xrstor_ymmh(, ptr + XO(avx_state)); +do_xrstor_ymmh(ac, ptr + XO(avx_state)); } else { do_clear_ymmh(env); } } if (rfbm & XSTATE_BNDREGS_MASK) { if (xstate_bv & XSTATE_BNDREGS_MASK) { -do_xrstor_bndregs(, ptr + XO(bndreg_state)); +do_xrstor_bndregs(ac, ptr + XO(bndreg_state)); env->hflags |= HF_MPX_IU_MASK; } else { memset(env->bnd_regs, 0, sizeof(env->bnd_regs)); @@ -2980,7 +2967,7 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr } if (rfbm & XSTATE_BNDCSR_MASK) { if (xstate_bv & XSTATE_BNDCSR_MASK) { -do_xrstor_bndcsr(, ptr + XO(bndcsr_state)); +do_xrstor_bndcsr(ac, ptr + XO(bndcsr_state)); } else { memset(>bndcs_regs, 0, sizeof(env->bndcs_regs)); } @@ -2989,7 +2976,7 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr if (rfbm & 
XSTATE_PKRU_MASK) { uint64_t old_pkru = env->pkru; if (xstate_bv & XSTATE_PKRU_MASK) { -do_xrstor_pkru(, ptr + XO(pkru_state)); +do_xrstor_pkru(ac, ptr + XO(pkru_state)); } else { env->pkru = 0; } @@ -3005,9 +2992,27 @@ static void
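The header validation that this patch splits into valid_xrstor_header can be restated as a pure predicate (hypothetical stand-alone name; the conditions are the ones in the patch): standard-form XRSTOR requires header bytes 23:8, i.e. XCOMP_BV and the following reserved field, to be zero, and XSTATE_BV to be a subset of XCR0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-alone version of the valid_xrstor_header checks. */
static bool xrstor_header_ok(uint64_t xstate_bv, uint64_t xcomp_bv,
                             uint64_t reserve0, uint64_t xcr0)
{
    if (xcomp_bv || reserve0) {
        return false;           /* compact form / nonzero reserved: #GP */
    }
    return (xstate_bv & ~xcr0) == 0;
}
```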
[PATCH v2 22/28] linux-user/i386: Fix allocation and alignment of fp state
For modern cpus, the kernel uses xsave to store all extra cpu state across the signal handler. For xsave/xrstor to work, the pointer must be 64 byte aligned. Moreover, the regular part of the signal frame must be 16 byte aligned. Attempt to mirror the kernel code as much as possible. Use enum FPStateKind instead of use_xsave() and use_fxsr(). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1648 Signed-off-by: Richard Henderson --- linux-user/i386/signal.c | 558 +++ tests/tcg/x86_64/test-1648.c | 33 ++ tests/tcg/x86_64/Makefile.target | 1 + 3 files changed, 377 insertions(+), 215 deletions(-) create mode 100644 tests/tcg/x86_64/test-1648.c diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index d600a4355b..d015fe520a 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -64,20 +64,6 @@ struct target_fpstate_32 { X86LegacyXSaveArea fxstate; }; -/* - * For simplicity, setup_frame aligns struct target_fpstate_32 to - * 16 bytes, so ensure that the FXSAVE area is also aligned. - */ -QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_32, fxstate) & 15); - -#ifndef TARGET_X86_64 -# define target_fpstate target_fpstate_32 -# define TARGET_FPSTATE_FXSAVE_OFFSET offsetof(struct target_fpstate_32, fxstate) -#else -# define target_fpstate X86LegacyXSaveArea -# define TARGET_FPSTATE_FXSAVE_OFFSET 0 -#endif - struct target_sigcontext_32 { uint16_t gs, __gsh; uint16_t fs, __fsh; @@ -160,24 +146,16 @@ struct sigframe { int sig; struct target_sigcontext sc; /* - * The actual fpstate is placed after retcode[] below, to make - * room for the variable-sized xsave data. The older unused fpstate - * has to be kept to avoid changing the offset of extramask[], which + * The actual fpstate is placed after retcode[] below, to make room + * for the variable-sized xsave data. The older unused fpstate has + * to be kept to avoid changing the offset of extramask[], which * is part of the ABI. 
*/ -struct target_fpstate fpstate_unused; +struct target_fpstate_32 fpstate_unused; abi_ulong extramask[TARGET_NSIG_WORDS-1]; char retcode[8]; - -/* - * This field will be 16-byte aligned in memory. Applying QEMU_ALIGNED - * to it ensures that the base of the frame has an appropriate alignment - * too. - */ -struct target_fpstate fpstate QEMU_ALIGNED(8); +/* fp state follows here */ }; -#define TARGET_SIGFRAME_FXSAVE_OFFSET (\ -offsetof(struct sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET) struct rt_sigframe { abi_ulong pretcode; @@ -187,10 +165,8 @@ struct rt_sigframe { struct target_siginfo info; struct target_ucontext uc; char retcode[8]; -struct target_fpstate fpstate QEMU_ALIGNED(8); +/* fp state follows here */ }; -#define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \ -offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET) /* * Verify that vdso-asmoffset.h constants match. @@ -208,66 +184,178 @@ struct rt_sigframe { abi_ulong pretcode; struct target_ucontext uc; struct target_siginfo info; -struct target_fpstate fpstate QEMU_ALIGNED(16); +/* fp state follows here */ }; -#define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \ -offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET) #endif +typedef enum { +#ifndef TARGET_X86_64 +FPSTATE_FSAVE, +#endif +FPSTATE_FXSAVE, +FPSTATE_XSAVE +} FPStateKind; + +static FPStateKind get_fpstate_kind(CPUX86State *env) +{ +if (env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE) { +return FPSTATE_XSAVE; +} +#ifdef TARGET_X86_64 +return FPSTATE_FXSAVE; +#else +if (env->features[FEAT_1_EDX] & CPUID_FXSR) { +return FPSTATE_FXSAVE; +} +return FPSTATE_FSAVE; +#endif +} + +static unsigned get_fpstate_size(CPUX86State *env, FPStateKind fpkind) +{ +/* + * Kernel: + * fpu__alloc_mathframe + * xstate_sigframe_size(current->thread.fpu.fpstate); + * size = fpstate->user_size + * use_xsave() ? 
size + FP_XSTATE_MAGIC2_SIZE : size + * where fpstate->user_size is computed at init in + * fpu__init_system_xstate_size_legacy and + * fpu__init_system_xstate. + * + * Here we have no place to pre-compute, so inline it all. + */ +switch (fpkind) { +case FPSTATE_XSAVE: +return (xsave_area_size(env->xcr0, false) ++ TARGET_FP_XSTATE_MAGIC2_SIZE); +case FPSTATE_FXSAVE: +return sizeof(X86LegacyXSaveArea); +#ifndef TARGET_X86_64 +case FPSTATE_FSAVE: +return sizeof(struct target_fregs_state); +#endif +} +g_assert_not_reached(); +} + +static abi_ptr get_sigframe(struct target_sigaction *ka, CPUX86State *env, +unsigned frame_size,
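The two alignment constraints the commit message describes can be sketched as plain stack arithmetic (these helpers are hypothetical and only illustrate the rounding, not the actual get_sigframe implementation): the fp state area must land on a 64-byte boundary for XSAVE, which is achieved by subtracting the size and then rounding the address down.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t align_down(uint64_t addr, uint64_t align)
{
    return addr & ~(align - 1);
}

/* Carve the variable-sized fp state out of the stack, 64-byte aligned
 * for XSAVE; the signal frame proper then goes below it, 16-byte aligned. */
static uint64_t place_fpstate(uint64_t sp, unsigned fpstate_size)
{
    return align_down(sp - fpstate_size, 64);
}
```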
[PATCH v2 05/28] target/i386: Convert do_fstenv to X86Access
Signed-off-by: Richard Henderson
---
 target/i386/tcg/fpu_helper.c | 45 ++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 28ae8100f6..25074af0ce 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -2372,9 +2372,9 @@ void helper_fxam_ST0(CPUX86State *env)
     }
 }
 
-static void do_fstenv(CPUX86State *env, target_ulong ptr, int data32,
-                      uintptr_t retaddr)
+static void do_fstenv(X86Access *ac, target_ulong ptr, int data32)
 {
+    CPUX86State *env = ac->env;
     int fpus, fptag, exp, i;
     uint64_t mant;
     CPU_LDoubleU tmp;
@@ -2401,28 +2401,31 @@ static void do_fstenv(CPUX86State *env, target_ulong ptr, int data32,
     }
     if (data32) {
         /* 32 bit */
-        cpu_stl_data_ra(env, ptr, env->fpuc, retaddr);
-        cpu_stl_data_ra(env, ptr + 4, fpus, retaddr);
-        cpu_stl_data_ra(env, ptr + 8, fptag, retaddr);
-        cpu_stl_data_ra(env, ptr + 12, env->fpip, retaddr); /* fpip */
-        cpu_stl_data_ra(env, ptr + 16, env->fpcs, retaddr); /* fpcs */
-        cpu_stl_data_ra(env, ptr + 20, env->fpdp, retaddr); /* fpoo */
-        cpu_stl_data_ra(env, ptr + 24, env->fpds, retaddr); /* fpos */
+        access_stl(ac, ptr, env->fpuc);
+        access_stl(ac, ptr + 4, fpus);
+        access_stl(ac, ptr + 8, fptag);
+        access_stl(ac, ptr + 12, env->fpip); /* fpip */
+        access_stl(ac, ptr + 16, env->fpcs); /* fpcs */
+        access_stl(ac, ptr + 20, env->fpdp); /* fpoo */
+        access_stl(ac, ptr + 24, env->fpds); /* fpos */
     } else {
         /* 16 bit */
-        cpu_stw_data_ra(env, ptr, env->fpuc, retaddr);
-        cpu_stw_data_ra(env, ptr + 2, fpus, retaddr);
-        cpu_stw_data_ra(env, ptr + 4, fptag, retaddr);
-        cpu_stw_data_ra(env, ptr + 6, env->fpip, retaddr);
-        cpu_stw_data_ra(env, ptr + 8, env->fpcs, retaddr);
-        cpu_stw_data_ra(env, ptr + 10, env->fpdp, retaddr);
-        cpu_stw_data_ra(env, ptr + 12, env->fpds, retaddr);
+        access_stw(ac, ptr, env->fpuc);
+        access_stw(ac, ptr + 2, fpus);
+        access_stw(ac, ptr + 4, fptag);
+        access_stw(ac, ptr + 6, env->fpip);
+        access_stw(ac, ptr + 8, env->fpcs);
+        access_stw(ac, ptr + 10, env->fpdp);
+        access_stw(ac, ptr + 12, env->fpds);
     }
 }
 
 void helper_fstenv(CPUX86State *env, target_ulong ptr, int data32)
 {
-    do_fstenv(env, ptr, data32, GETPC());
+    X86Access ac;
+
+    access_prepare(&ac, env, ptr, 14 << data32, MMU_DATA_STORE, GETPC());
+    do_fstenv(&ac, ptr, data32);
 }
 
 static void cpu_set_fpus(CPUX86State *env, uint16_t fpus)
@@ -2470,12 +2473,12 @@ static void do_fsave(CPUX86State *env, target_ulong ptr, int data32,
 {
     X86Access ac;
     floatx80 tmp;
-    int i;
+    int i, envsize = 14 << data32;
 
-    do_fstenv(env, ptr, data32, retaddr);
+    access_prepare(&ac, env, ptr, envsize + 80, MMU_DATA_STORE, GETPC());
 
-    ptr += (target_ulong)14 << data32;
-    access_prepare(&ac, env, ptr, 80, MMU_DATA_STORE, GETPC());
+    do_fstenv(&ac, ptr, data32);
+    ptr += envsize;
 
     for (i = 0; i < 8; i++) {
         tmp = ST(i);
-- 
2.34.1
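The size expression this patch leans on is worth spelling out: the x87 environment image is 14 bytes in 16-bit mode and 28 bytes in 32-bit mode, which the code encodes as 14 << data32, and fsave/frstor append 80 more bytes for the eight 10-byte registers. A trivial sketch (function names hypothetical):

```c
#include <assert.h>

/* Environment image size for f[n]stenv/fldenv. */
static unsigned fstenv_size(int data32) { return 14u << data32; }

/* fsave/frstor add eight 10-byte x87 registers after the environment. */
static unsigned fsave_size(int data32)  { return fstenv_size(data32) + 80; }
```

Computing envsize + 80 up front is what allows do_fsave to issue one access_prepare covering both the environment and the register file.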
[PATCH v2 02/28] target/i386: Convert do_fldt, do_fstt to X86Access
Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 44 +--- 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 4b965a5d6c..878fad9795 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -26,6 +26,7 @@ #include "fpu/softfloat.h" #include "fpu/softfloat-macros.h" #include "helper-tcg.h" +#include "access.h" /* float macros */ #define FT0(env->ft0) @@ -83,23 +84,22 @@ static inline void fpop(CPUX86State *env) env->fpstt = (env->fpstt + 1) & 7; } -static floatx80 do_fldt(CPUX86State *env, target_ulong ptr, uintptr_t retaddr) +static floatx80 do_fldt(X86Access *ac, target_ulong ptr) { CPU_LDoubleU temp; -temp.l.lower = cpu_ldq_data_ra(env, ptr, retaddr); -temp.l.upper = cpu_lduw_data_ra(env, ptr + 8, retaddr); +temp.l.lower = access_ldq(ac, ptr); +temp.l.upper = access_ldw(ac, ptr + 8); return temp.d; } -static void do_fstt(CPUX86State *env, floatx80 f, target_ulong ptr, -uintptr_t retaddr) +static void do_fstt(X86Access *ac, target_ulong ptr, floatx80 f) { CPU_LDoubleU temp; temp.d = f; -cpu_stq_data_ra(env, ptr, temp.l.lower, retaddr); -cpu_stw_data_ra(env, ptr + 8, temp.l.upper, retaddr); +access_stq(ac, ptr, temp.l.lower); +access_stw(ac, ptr + 8, temp.l.upper); } /* x87 FPU helpers */ @@ -381,16 +381,22 @@ int64_t helper_fisttll_ST0(CPUX86State *env) void helper_fldt_ST0(CPUX86State *env, target_ulong ptr) { int new_fpstt; +X86Access ac; + +access_prepare(, env, ptr, 10, MMU_DATA_LOAD, GETPC()); new_fpstt = (env->fpstt - 1) & 7; -env->fpregs[new_fpstt].d = do_fldt(env, ptr, GETPC()); +env->fpregs[new_fpstt].d = do_fldt(, ptr); env->fpstt = new_fpstt; env->fptags[new_fpstt] = 0; /* validate stack entry */ } void helper_fstt_ST0(CPUX86State *env, target_ulong ptr) { -do_fstt(env, ST0, ptr, GETPC()); +X86Access ac; + +access_prepare(, env, ptr, 10, MMU_DATA_STORE, GETPC()); +do_fstt(, ptr, ST0); } void helper_fpush(CPUX86State *env) @@ -2459,15 
+2465,18 @@ void helper_fldenv(CPUX86State *env, target_ulong ptr, int data32) static void do_fsave(CPUX86State *env, target_ulong ptr, int data32, uintptr_t retaddr) { +X86Access ac; floatx80 tmp; int i; do_fstenv(env, ptr, data32, retaddr); ptr += (target_ulong)14 << data32; +access_prepare(, env, ptr, 80, MMU_DATA_STORE, GETPC()); + for (i = 0; i < 8; i++) { tmp = ST(i); -do_fstt(env, tmp, ptr, retaddr); +do_fstt(, ptr, tmp); ptr += 10; } @@ -2482,14 +2491,17 @@ void helper_fsave(CPUX86State *env, target_ulong ptr, int data32) static void do_frstor(CPUX86State *env, target_ulong ptr, int data32, uintptr_t retaddr) { +X86Access ac; floatx80 tmp; int i; do_fldenv(env, ptr, data32, retaddr); ptr += (target_ulong)14 << data32; +access_prepare(, env, ptr, 80, MMU_DATA_LOAD, retaddr); + for (i = 0; i < 8; i++) { -tmp = do_fldt(env, ptr, retaddr); +tmp = do_fldt(, ptr); ST(i) = tmp; ptr += 10; } @@ -2506,6 +2518,7 @@ static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) { int fpus, fptag, i; target_ulong addr; +X86Access ac; fpus = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11; fptag = 0; @@ -2524,9 +2537,11 @@ static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) cpu_stq_data_ra(env, ptr + XO(legacy.fpdp), 0, ra); /* edp+sel; rdp */ addr = ptr + XO(legacy.fpregs); +access_prepare(, env, addr, 8 * 16, MMU_DATA_STORE, GETPC()); + for (i = 0; i < 8; i++) { floatx80 tmp = ST(i); -do_fstt(env, tmp, addr, ra); +do_fstt(, addr, tmp); addr += 16; } } @@ -2699,6 +2714,7 @@ static void do_xrstor_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) { int i, fpuc, fpus, fptag; target_ulong addr; +X86Access ac; fpuc = cpu_lduw_data_ra(env, ptr + XO(legacy.fcw), ra); fpus = cpu_lduw_data_ra(env, ptr + XO(legacy.fsw), ra); @@ -2711,8 +2727,10 @@ static void do_xrstor_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) } addr = ptr + XO(legacy.fpregs); +access_prepare(, env, addr, 8 * 16, MMU_DATA_LOAD, GETPC()); + for (i = 0; i < 8; 
i++) { -floatx80 tmp = do_fldt(env, addr, ra); +floatx80 tmp = do_fldt(, addr); ST(i) = tmp; addr += 16; } -- 2.34.1
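The memory layout do_fldt/do_fstt move is an 80-bit extended-precision value stored as a 64-bit mantissa at offset 0 plus a 16-bit sign/exponent word at offset 8, as in CPU_LDoubleU. A host-side sketch of that split (struct and function names are hypothetical; byte order within each field is left to memcpy, so this only demonstrates the 8 + 2 split and round-trip):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct x87_ext { uint64_t mantissa; uint16_t sign_exp; };

static void stt(uint8_t mem[10], struct x87_ext f)
{
    memcpy(mem, &f.mantissa, 8);       /* like access_stq(ac, ptr, ...)     */
    memcpy(mem + 8, &f.sign_exp, 2);   /* like access_stw(ac, ptr + 8, ...) */
}

static struct x87_ext ldt(const uint8_t mem[10])
{
    struct x87_ext f;
    memcpy(&f.mantissa, mem, 8);
    memcpy(&f.sign_exp, mem + 8, 2);
    return f;
}
```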
[PATCH v2 10/28] target/i386: Convert do_xsave_* to X86Access
The body of do_xsave is now fully converted. Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 47 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 23e22e4521..82a041f4bf 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2578,8 +2578,9 @@ static void do_xsave_sse(X86Access *ac, target_ulong ptr) } } -static void do_xsave_ymmh(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_ymmh(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; int i, nb_xmm_regs; if (env->hflags & HF_CS64_MASK) { @@ -2589,33 +2590,36 @@ static void do_xsave_ymmh(CPUX86State *env, target_ulong ptr, uintptr_t ra) } for (i = 0; i < nb_xmm_regs; i++, ptr += 16) { -cpu_stq_data_ra(env, ptr, env->xmm_regs[i].ZMM_Q(2), ra); -cpu_stq_data_ra(env, ptr + 8, env->xmm_regs[i].ZMM_Q(3), ra); +access_stq(ac, ptr, env->xmm_regs[i].ZMM_Q(2)); +access_stq(ac, ptr + 8, env->xmm_regs[i].ZMM_Q(3)); } } -static void do_xsave_bndregs(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_bndregs(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; target_ulong addr = ptr + offsetof(XSaveBNDREG, bnd_regs); int i; for (i = 0; i < 4; i++, addr += 16) { -cpu_stq_data_ra(env, addr, env->bnd_regs[i].lb, ra); -cpu_stq_data_ra(env, addr + 8, env->bnd_regs[i].ub, ra); +access_stq(ac, addr, env->bnd_regs[i].lb); +access_stq(ac, addr + 8, env->bnd_regs[i].ub); } } -static void do_xsave_bndcsr(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_bndcsr(X86Access *ac, target_ulong ptr) { -cpu_stq_data_ra(env, ptr + offsetof(XSaveBNDCSR, bndcsr.cfgu), -env->bndcs_regs.cfgu, ra); -cpu_stq_data_ra(env, ptr + offsetof(XSaveBNDCSR, bndcsr.sts), -env->bndcs_regs.sts, ra); +CPUX86State *env = ac->env; + +access_stq(ac, ptr + offsetof(XSaveBNDCSR, bndcsr.cfgu), + env->bndcs_regs.cfgu); +access_stq(ac, ptr + offsetof(XSaveBNDCSR, 
bndcsr.sts), + env->bndcs_regs.sts); } -static void do_xsave_pkru(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_pkru(X86Access *ac, target_ulong ptr) { -cpu_stq_data_ra(env, ptr, env->pkru, ra); +access_stq(ac, ptr, ac->env->pkru); } static void do_fxsave(X86Access *ac, target_ulong ptr) @@ -2668,6 +2672,7 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, { uint64_t old_bv, new_bv; X86Access ac; +unsigned size; /* The OS must have enabled XSAVE. */ if (!(env->cr[4] & CR4_OSXSAVE_MASK)) { @@ -2683,8 +2688,8 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, rfbm &= env->xcr0; opt &= rfbm; -access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), - MMU_DATA_STORE, ra); +size = xsave_area_size(opt, false); +access_prepare(, env, ptr, size, MMU_DATA_STORE, ra); if (opt & XSTATE_FP_MASK) { do_xsave_fpu(, ptr); @@ -2697,22 +2702,22 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, do_xsave_sse(, ptr); } if (opt & XSTATE_YMM_MASK) { -do_xsave_ymmh(env, ptr + XO(avx_state), ra); +do_xsave_ymmh(, ptr + XO(avx_state)); } if (opt & XSTATE_BNDREGS_MASK) { -do_xsave_bndregs(env, ptr + XO(bndreg_state), ra); +do_xsave_bndregs(, ptr + XO(bndreg_state)); } if (opt & XSTATE_BNDCSR_MASK) { -do_xsave_bndcsr(env, ptr + XO(bndcsr_state), ra); +do_xsave_bndcsr(, ptr + XO(bndcsr_state)); } if (opt & XSTATE_PKRU_MASK) { -do_xsave_pkru(env, ptr + XO(pkru_state), ra); +do_xsave_pkru(, ptr + XO(pkru_state)); } /* Update the XSTATE_BV field. */ -old_bv = cpu_ldq_data_ra(env, ptr + XO(header.xstate_bv), ra); +old_bv = access_ldq(, ptr + XO(header.xstate_bv)); new_bv = (old_bv & ~rfbm) | (inuse & rfbm); -cpu_stq_data_ra(env, ptr + XO(header.xstate_bv), new_bv, ra); +access_stq(, ptr + XO(header.xstate_bv), new_bv); } void helper_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm) -- 2.34.1
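The XSTATE_BV update at the end of do_xsave is a read-modify-write merge: components covered by the request mask take their in-use bit, everything else keeps its previous value. Restated as a pure function (hypothetical name, same expression as the patch):

```c
#include <assert.h>
#include <stdint.h>

/* new_bv = (old_bv & ~rfbm) | (inuse & rfbm), as in do_xsave. */
static uint64_t merge_xstate_bv(uint64_t old_bv, uint64_t rfbm, uint64_t inuse)
{
    return (old_bv & ~rfbm) | (inuse & rfbm);
}
```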
[PATCH v2 24/28] target/i386: Convert do_xsave to X86Access
Signed-off-by: Richard Henderson --- linux-user/i386/signal.c | 2 +- target/i386/tcg/fpu_helper.c | 72 +--- 2 files changed, 43 insertions(+), 31 deletions(-) diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index fd09c973d4..ba17d27219 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -328,7 +328,7 @@ static void xsave_sigcontext(CPUX86State *env, /* Zero the header, XSAVE *adds* features to an existing save state. */ memset(fxstate + 1, 0, sizeof(X86XSaveHeader)); -cpu_x86_xsave(env, xstate_addr, -1); +cpu_x86_xsave(env, xstate_addr, env->xcr0); __put_user(TARGET_FP_XSTATE_MAGIC1, >magic1); __put_user(extended_size, >extended_size); diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index dbc1e5d8dd..d4dd09dc95 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2667,47 +2667,38 @@ static uint64_t get_xinuse(CPUX86State *env) return inuse; } -static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, - uint64_t inuse, uint64_t opt, uintptr_t ra) +static void do_xsave_access(X86Access *ac, target_ulong ptr, uint64_t rfbm, +uint64_t inuse, uint64_t opt) { uint64_t old_bv, new_bv; -X86Access ac; -unsigned size; - -/* Never save anything not enabled by XCR0. */ -rfbm &= env->xcr0; -opt &= rfbm; - -size = xsave_area_size(opt, false); -access_prepare(, env, ptr, size, MMU_DATA_STORE, ra); if (opt & XSTATE_FP_MASK) { -do_xsave_fpu(, ptr); +do_xsave_fpu(ac, ptr); } if (rfbm & XSTATE_SSE_MASK) { /* Note that saving MXCSR is not suppressed by XSAVEOPT. 
*/ -do_xsave_mxcsr(, ptr); +do_xsave_mxcsr(ac, ptr); } if (opt & XSTATE_SSE_MASK) { -do_xsave_sse(, ptr); +do_xsave_sse(ac, ptr); } if (opt & XSTATE_YMM_MASK) { -do_xsave_ymmh(, ptr + XO(avx_state)); +do_xsave_ymmh(ac, ptr + XO(avx_state)); } if (opt & XSTATE_BNDREGS_MASK) { -do_xsave_bndregs(, ptr + XO(bndreg_state)); +do_xsave_bndregs(ac, ptr + XO(bndreg_state)); } if (opt & XSTATE_BNDCSR_MASK) { -do_xsave_bndcsr(, ptr + XO(bndcsr_state)); +do_xsave_bndcsr(ac, ptr + XO(bndcsr_state)); } if (opt & XSTATE_PKRU_MASK) { -do_xsave_pkru(, ptr + XO(pkru_state)); +do_xsave_pkru(ac, ptr + XO(pkru_state)); } /* Update the XSTATE_BV field. */ -old_bv = access_ldq(, ptr + XO(header.xstate_bv)); +old_bv = access_ldq(ac, ptr + XO(header.xstate_bv)); new_bv = (old_bv & ~rfbm) | (inuse & rfbm); -access_stq(, ptr + XO(header.xstate_bv), new_bv); +access_stq(ac, ptr + XO(header.xstate_bv), new_bv); } static void do_xsave_chk(CPUX86State *env, target_ulong ptr, uintptr_t ra) @@ -2723,22 +2714,32 @@ static void do_xsave_chk(CPUX86State *env, target_ulong ptr, uintptr_t ra) } } -void helper_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm) +static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, + uint64_t inuse, uint64_t opt, uintptr_t ra) { -uintptr_t ra = GETPC(); +X86Access ac; +unsigned size; do_xsave_chk(env, ptr, ra); -do_xsave(env, ptr, rfbm, get_xinuse(env), -1, ra); + +/* Never save anything not enabled by XCR0. 
*/ +rfbm &= env->xcr0; +opt &= rfbm; +size = xsave_area_size(opt, false); + +access_prepare(, env, ptr, size, MMU_DATA_STORE, ra); +do_xsave_access(, ptr, rfbm, inuse, opt); +} + +void helper_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm) +{ +do_xsave(env, ptr, rfbm, get_xinuse(env), rfbm, GETPC()); } void helper_xsaveopt(CPUX86State *env, target_ulong ptr, uint64_t rfbm) { -uintptr_t ra = GETPC(); -uint64_t inuse; - -do_xsave_chk(env, ptr, ra); -inuse = get_xinuse(env); -do_xsave(env, ptr, rfbm, inuse, inuse, ra); +uint64_t inuse = get_xinuse(env); +do_xsave(env, ptr, rfbm, inuse, inuse, GETPC()); } static void do_xrstor_fpu(X86Access *ac, target_ulong ptr) @@ -3048,7 +3049,18 @@ void cpu_x86_fxrstor(CPUX86State *env, target_ulong ptr) void cpu_x86_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm) { -do_xsave(env, ptr, rfbm, get_xinuse(env), -1, 0); +X86Access ac; +unsigned size; + +/* + * Since this is only called from user-level signal handling, + * we should have done the job correctly there. + */ +assert((rfbm & ~env->xcr0) == 0); +size = xsave_area_size(rfbm, false); + +access_prepare(, env, ptr, size, MMU_DATA_STORE, 0); +do_xsave_access(, ptr, rfbm, get_xinuse(env), rfbm); } void cpu_x86_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm) -- 2.34.1
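The split above keeps validation and sizing in do_xsave while do_xsave_access performs raw stores against an already-prepared range. In miniature, the "prepare once, access many" pattern looks like this; `Access`, `prepare`, `store_q` and `load_q` are simplified stand-ins for QEMU's X86Access API, not the real definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for X86Access: a host range validated once. */
typedef struct {
    uint8_t *haddr;   /* host pointer, checked up front */
    unsigned size;    /* bytes covered by haddr */
} Access;

/* One up-front step (access_prepare() in the real code also probes pages). */
static void prepare(Access *ac, uint8_t *buf, unsigned size)
{
    ac->haddr = buf;
    ac->size = size;
}

/* Component accessors then trust the prepared range. */
static void store_q(Access *ac, unsigned off, uint64_t val)
{
    assert(off + 8 <= ac->size);          /* stay inside the prepared range */
    memcpy(ac->haddr + off, &val, sizeof(val));
}

static uint64_t load_q(Access *ac, unsigned off)
{
    uint64_t val;

    assert(off + 8 <= ac->size);
    memcpy(&val, ac->haddr + off, sizeof(val));
    return val;
}
```

The point of the refactor is that the per-field stores no longer need a retaddr or per-access fault handling; any fault has already been taken in the single prepare step.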
[PATCH v2 26/28] target/i386: Pass host pointer and size to cpu_x86_{fsave, frstor}
We have already validated the memory region in the course of validating the signal frame. No need to do it again within the helper function. Signed-off-by: Richard Henderson --- target/i386/cpu.h| 10 ++ linux-user/i386/signal.c | 4 ++-- target/i386/tcg/fpu_helper.c | 26 -- 3 files changed, 24 insertions(+), 16 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 5f9c420084..8eb97fdd7a 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2227,11 +2227,13 @@ int cpu_x86_get_descr_debug(CPUX86State *env, unsigned int selector, /* used for debug or cpu save/restore */ /* cpu-exec.c */ -/* the following helpers are only usable in user mode simulation as - they can trigger unexpected exceptions */ +/* + * The following helpers are only usable in user mode simulation. + * The host pointers should come from lock_user(). + */ void cpu_x86_load_seg(CPUX86State *s, X86Seg seg_reg, int selector); -void cpu_x86_fsave(CPUX86State *s, target_ulong ptr, int data32); -void cpu_x86_frstor(CPUX86State *s, target_ulong ptr, int data32); +void cpu_x86_fsave(CPUX86State *s, void *host, size_t len); +void cpu_x86_frstor(CPUX86State *s, void *host, size_t len); void cpu_x86_fxsave(CPUX86State *s, target_ulong ptr); void cpu_x86_fxrstor(CPUX86State *s, target_ulong ptr); void cpu_x86_xsave(CPUX86State *s, target_ulong ptr, uint64_t rbfm); diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index ba17d27219..7178440d67 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -372,7 +372,7 @@ static void setup_sigcontext(CPUX86State *env, __put_user(env->regs[R_ESP], >esp_at_signal); __put_user(env->segs[R_SS].selector, (uint32_t *)>ss); -cpu_x86_fsave(env, fpstate_addr, 1); +cpu_x86_fsave(env, fpstate, sizeof(*fpstate)); fpstate->status = fpstate->swd; magic = (fpkind == FPSTATE_FSAVE ? 
0 : 0x); __put_user(magic, >magic); @@ -701,7 +701,7 @@ static bool frstor_sigcontext(CPUX86State *env, FPStateKind fpkind, * the merge within ENV by loading XSTATE/FXSTATE first, then * overriding with the FSTATE afterward. */ -cpu_x86_frstor(env, fpstate_addr, 1); +cpu_x86_frstor(env, fpstate, sizeof(*fpstate)); return true; } #endif diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 909da05f91..0a91757690 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -3016,22 +3016,28 @@ void helper_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm) } #if defined(CONFIG_USER_ONLY) -void cpu_x86_fsave(CPUX86State *env, target_ulong ptr, int data32) +void cpu_x86_fsave(CPUX86State *env, void *host, size_t len) { -int size = (14 << data32) + 80; -X86Access ac; +X86Access ac = { +.haddr1 = host, +.size = 4 * 7 + 8 * 10, +.env = env, +}; -access_prepare(, env, ptr, size, MMU_DATA_STORE, 0); -do_fsave(, ptr, data32); +assert(ac.size <= len); +do_fsave(, 0, true); } -void cpu_x86_frstor(CPUX86State *env, target_ulong ptr, int data32) +void cpu_x86_frstor(CPUX86State *env, void *host, size_t len) { -int size = (14 << data32) + 80; -X86Access ac; +X86Access ac = { +.haddr1 = host, +.size = 4 * 7 + 8 * 10, +.env = env, +}; -access_prepare(, env, ptr, size, MMU_DATA_LOAD, 0); -do_frstor(, ptr, data32); +assert(ac.size <= len); +do_frstor(, 0, true); } void cpu_x86_fxsave(CPUX86State *env, target_ulong ptr) -- 2.34.1
[PATCH v2 15/28] linux-user/i386: Drop xfeatures_size from sigcontext arithmetic
This is subtracting sizeof(target_fpstate_fxsave) in TARGET_FXSAVE_SIZE, then adding it again via >xfeatures. Perform the same computation using xstate_size alone. Signed-off-by: Richard Henderson --- linux-user/i386/signal.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index 68659fa1db..547c7cc685 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -252,7 +252,6 @@ static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxs __put_user(0, >sw_reserved.magic1); } else { uint32_t xstate_size = xsave_area_size(env->xcr0, false); -uint32_t xfeatures_size = xstate_size - TARGET_FXSAVE_SIZE; /* * extended_size is the offset from fpstate_addr to right after the end @@ -272,7 +271,8 @@ static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxs __put_user(extended_size, >sw_reserved.extended_size); __put_user(env->xcr0, >sw_reserved.xfeatures); __put_user(xstate_size, >sw_reserved.xstate_size); -__put_user(TARGET_FP_XSTATE_MAGIC2, (uint32_t *) >xfeatures[xfeatures_size]); +__put_user(TARGET_FP_XSTATE_MAGIC2, + (uint32_t *)((void *)fxsave + xstate_size)); } } @@ -558,7 +558,6 @@ static int xrstor_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxs if (env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE) { uint32_t extended_size = tswapl(fxsave->sw_reserved.extended_size); uint32_t xstate_size = tswapl(fxsave->sw_reserved.xstate_size); -uint32_t xfeatures_size = xstate_size - TARGET_FXSAVE_SIZE; /* Linux checks MAGIC2 using xstate_size, not extended_size. 
*/ if (tswapl(fxsave->sw_reserved.magic1) == TARGET_FP_XSTATE_MAGIC1 && @@ -567,7 +566,7 @@ static int xrstor_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxs extended_size - TARGET_FPSTATE_FXSAVE_OFFSET)) { return 1; } -if (tswapl(*(uint32_t *) >xfeatures[xfeatures_size]) == TARGET_FP_XSTATE_MAGIC2) { +if (tswapl(*(uint32_t *)((void *)fxsave + xstate_size)) == TARGET_FP_XSTATE_MAGIC2) { cpu_x86_xrstor(env, fxsave_addr, -1); return 0; } -- 2.34.1
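The simplification above is pure address arithmetic: the MAGIC2 word sits directly after the xstate image, so `fxsave + FXSAVE_SIZE + (xstate_size - FXSAVE_SIZE)` equals `fxsave + xstate_size`. A sketch with illustrative sizes (512-byte legacy area, 64-byte header, 256 bytes of AVX state; not read from a live CPU):

```c
#include <assert.h>

enum {
    FXSAVE_SIZE  = 512,   /* legacy FXSAVE image */
    XSAVE_HEADER = 64,    /* XSAVE header */
    AVX_SIZE     = 256,   /* illustrative extended-state size */
};

/* New arithmetic: MAGIC2 is placed right after the xstate image. */
static unsigned magic2_offset(unsigned xstate_size)
{
    return xstate_size;
}

/* Old arithmetic: subtract FXSAVE_SIZE, then add it back via the
 * xfeatures[] member that starts FXSAVE_SIZE bytes into the frame. */
static unsigned magic2_offset_old(unsigned xstate_size)
{
    unsigned xfeatures_size = xstate_size - FXSAVE_SIZE;

    return FXSAVE_SIZE + xfeatures_size;
}
```

Both expressions land on the same byte, which is why xfeatures_size can be dropped.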
[PATCH v2 07/28] target/i386: Convert do_xsave_{fpu,mxcsr,sse} to X86Access
Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 52 +--- 1 file changed, 31 insertions(+), 21 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index e6fa161aa0..643e017bef 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2518,11 +2518,11 @@ void helper_frstor(CPUX86State *env, target_ulong ptr, int data32) #define XO(X) offsetof(X86XSaveArea, X) -static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_fpu(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; int fpus, fptag, i; target_ulong addr; -X86Access ac; fpus = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11; fptag = 0; @@ -2530,35 +2530,37 @@ static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra) fptag |= (env->fptags[i] << i); } -cpu_stw_data_ra(env, ptr + XO(legacy.fcw), env->fpuc, ra); -cpu_stw_data_ra(env, ptr + XO(legacy.fsw), fpus, ra); -cpu_stw_data_ra(env, ptr + XO(legacy.ftw), fptag ^ 0xff, ra); +access_stw(ac, ptr + XO(legacy.fcw), env->fpuc); +access_stw(ac, ptr + XO(legacy.fsw), fpus); +access_stw(ac, ptr + XO(legacy.ftw), fptag ^ 0xff); /* In 32-bit mode this is eip, sel, dp, sel. In 64-bit mode this is rip, rdp. But in either case we don't write actual data, just zeros. 
*/ -cpu_stq_data_ra(env, ptr + XO(legacy.fpip), 0, ra); /* eip+sel; rip */ -cpu_stq_data_ra(env, ptr + XO(legacy.fpdp), 0, ra); /* edp+sel; rdp */ +access_stq(ac, ptr + XO(legacy.fpip), 0); /* eip+sel; rip */ +access_stq(ac, ptr + XO(legacy.fpdp), 0); /* edp+sel; rdp */ addr = ptr + XO(legacy.fpregs); -access_prepare(, env, addr, 8 * 16, MMU_DATA_STORE, GETPC()); for (i = 0; i < 8; i++) { floatx80 tmp = ST(i); -do_fstt(, addr, tmp); +do_fstt(ac, addr, tmp); addr += 16; } } -static void do_xsave_mxcsr(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_mxcsr(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; + update_mxcsr_from_sse_status(env); -cpu_stl_data_ra(env, ptr + XO(legacy.mxcsr), env->mxcsr, ra); -cpu_stl_data_ra(env, ptr + XO(legacy.mxcsr_mask), 0x, ra); +access_stl(ac, ptr + XO(legacy.mxcsr), env->mxcsr); +access_stl(ac, ptr + XO(legacy.mxcsr_mask), 0x); } -static void do_xsave_sse(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xsave_sse(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; int i, nb_xmm_regs; target_ulong addr; @@ -2570,8 +2572,8 @@ static void do_xsave_sse(CPUX86State *env, target_ulong ptr, uintptr_t ra) addr = ptr + XO(legacy.xmm_regs); for (i = 0; i < nb_xmm_regs; i++) { -cpu_stq_data_ra(env, addr, env->xmm_regs[i].ZMM_Q(0), ra); -cpu_stq_data_ra(env, addr + 8, env->xmm_regs[i].ZMM_Q(1), ra); +access_stq(ac, addr, env->xmm_regs[i].ZMM_Q(0)); +access_stq(ac, addr + 8, env->xmm_regs[i].ZMM_Q(1)); addr += 16; } } @@ -2618,20 +2620,24 @@ static void do_xsave_pkru(CPUX86State *env, target_ulong ptr, uintptr_t ra) static void do_fxsave(CPUX86State *env, target_ulong ptr, uintptr_t ra) { +X86Access ac; + /* The operand must be 16 byte aligned */ if (ptr & 0xf) { raise_exception_ra(env, EXCP0D_GPF, ra); } -do_xsave_fpu(env, ptr, ra); +access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), + MMU_DATA_STORE, ra); +do_xsave_fpu(, ptr); if (env->cr[4] & CR4_OSFXSR_MASK) { 
-do_xsave_mxcsr(env, ptr, ra); +do_xsave_mxcsr(, ptr); /* Fast FXSAVE leaves out the XMM registers */ if (!(env->efer & MSR_EFER_FFXSR) || (env->hflags & HF_CPL_MASK) || !(env->hflags & HF_LMA_MASK)) { -do_xsave_sse(env, ptr, ra); +do_xsave_sse(, ptr); } } } @@ -2659,6 +2665,7 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uint64_t inuse, uint64_t opt, uintptr_t ra) { uint64_t old_bv, new_bv; +X86Access ac; /* The OS must have enabled XSAVE. */ if (!(env->cr[4] & CR4_OSXSAVE_MASK)) { @@ -2674,15 +2681,18 @@ static void do_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm, rfbm &= env->xcr0; opt &= rfbm; +access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), + MMU_DATA_STORE, ra); + if (opt & XSTATE_FP_MASK) { -do_xsave_fpu(env, ptr, ra); +do_xsave_fpu(, ptr); } if (rfbm & XSTATE_SSE_MASK) { /* Note that saving MXCSR is not suppressed by XSAVEOPT. */ -do_xsave_mxcsr(env, ptr, ra); +do_xsave_mxcsr(, ptr); } if (opt & XSTATE_SSE_MASK) { -do_xsave_sse(env, ptr, ra); +
[PATCH v2 04/28] target/i386: Convert do_fldenv to X86Access
Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 30 ++ 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index ad8b536cb5..28ae8100f6 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2441,20 +2441,15 @@ static void cpu_set_fpus(CPUX86State *env, uint16_t fpus) #endif } -static void do_fldenv(CPUX86State *env, target_ulong ptr, int data32, - uintptr_t retaddr) +static void do_fldenv(X86Access *ac, target_ulong ptr, int data32) { int i, fpus, fptag; +CPUX86State *env = ac->env; + +cpu_set_fpuc(env, access_ldw(ac, ptr)); +fpus = access_ldw(ac, ptr + (2 << data32)); +fptag = access_ldw(ac, ptr + (4 << data32)); -if (data32) { -cpu_set_fpuc(env, cpu_lduw_data_ra(env, ptr, retaddr)); -fpus = cpu_lduw_data_ra(env, ptr + 4, retaddr); -fptag = cpu_lduw_data_ra(env, ptr + 8, retaddr); -} else { -cpu_set_fpuc(env, cpu_lduw_data_ra(env, ptr, retaddr)); -fpus = cpu_lduw_data_ra(env, ptr + 2, retaddr); -fptag = cpu_lduw_data_ra(env, ptr + 4, retaddr); -} cpu_set_fpus(env, fpus); for (i = 0; i < 8; i++) { env->fptags[i] = ((fptag & 3) == 3); @@ -2464,7 +2459,10 @@ void helper_fldenv(CPUX86State *env, target_ulong ptr, int data32) { -do_fldenv(env, ptr, data32, GETPC()); +X86Access ac; + +access_prepare(&ac, env, ptr, 14 << data32, MMU_DATA_STORE, GETPC()); +do_fldenv(&ac, ptr, data32); } static void do_fsave(CPUX86State *env, target_ulong ptr, int data32, @@ -2498,12 +2496,12 @@ { X86Access ac; floatx80 tmp; -int i; +int i, envsize = 14 << data32; -do_fldenv(env, ptr, data32, retaddr); -ptr += (target_ulong)14 << data32; +access_prepare(&ac, env, ptr, envsize + 80, MMU_DATA_LOAD, retaddr); -access_prepare(&ac, env, ptr, 80, MMU_DATA_LOAD, retaddr); +do_fldenv(&ac, ptr, data32); +ptr += envsize; for (i = 0; i < 8; i++) { tmp = do_fldt(&ac, ptr); -- 2.34.1
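The `14 << data32` arithmetic running through do_fldenv/do_frstor encodes the two FPU environment formats: 14 bytes in the 16-bit layout, 28 in the 32-bit one, with the control/status/tag words at a stride of `2 << data32`, and the full FSAVE image adding eight 10-byte registers. A quick sketch of those sizes:

```c
#include <assert.h>

/* FPU environment size: 14 bytes (16-bit format) or 28 bytes (32-bit). */
static int fpenv_size(int data32)
{
    return 14 << data32;
}

/* Full FSAVE/FRSTOR image: environment plus eight 80-bit registers. */
static int fsave_size(int data32)
{
    return fpenv_size(data32) + 8 * 10;
}

/* Offset of FSW within the environment: FCW at 0, FSW at 2 or 4. */
static int fpus_offset(int data32)
{
    return 2 << data32;
}
```

This is why the access_prepare() calls above cover `(14 << data32) + 80` bytes.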
[PATCH v2 11/28] target/i386: Convert do_xrstor_* to X86Access
The body of do_xrstor is now fully converted. Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 51 ++-- 1 file changed, 31 insertions(+), 20 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 82a041f4bf..883002dc22 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2799,8 +2799,9 @@ static void do_clear_sse(CPUX86State *env) } } -static void do_xrstor_ymmh(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_ymmh(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; int i, nb_xmm_regs; if (env->hflags & HF_CS64_MASK) { @@ -2810,8 +2811,8 @@ static void do_xrstor_ymmh(CPUX86State *env, target_ulong ptr, uintptr_t ra) } for (i = 0; i < nb_xmm_regs; i++, ptr += 16) { -env->xmm_regs[i].ZMM_Q(2) = cpu_ldq_data_ra(env, ptr, ra); -env->xmm_regs[i].ZMM_Q(3) = cpu_ldq_data_ra(env, ptr + 8, ra); +env->xmm_regs[i].ZMM_Q(2) = access_ldq(ac, ptr); +env->xmm_regs[i].ZMM_Q(3) = access_ldq(ac, ptr + 8); } } @@ -2831,29 +2832,32 @@ static void do_clear_ymmh(CPUX86State *env) } } -static void do_xrstor_bndregs(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_bndregs(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; target_ulong addr = ptr + offsetof(XSaveBNDREG, bnd_regs); int i; for (i = 0; i < 4; i++, addr += 16) { -env->bnd_regs[i].lb = cpu_ldq_data_ra(env, addr, ra); -env->bnd_regs[i].ub = cpu_ldq_data_ra(env, addr + 8, ra); +env->bnd_regs[i].lb = access_ldq(ac, addr); +env->bnd_regs[i].ub = access_ldq(ac, addr + 8); } } -static void do_xrstor_bndcsr(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_bndcsr(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; + /* FIXME: Extend highest implemented bit of linear address. 
*/ env->bndcs_regs.cfgu -= cpu_ldq_data_ra(env, ptr + offsetof(XSaveBNDCSR, bndcsr.cfgu), ra); += access_ldq(ac, ptr + offsetof(XSaveBNDCSR, bndcsr.cfgu)); env->bndcs_regs.sts -= cpu_ldq_data_ra(env, ptr + offsetof(XSaveBNDCSR, bndcsr.sts), ra); += access_ldq(ac, ptr + offsetof(XSaveBNDCSR, bndcsr.sts)); } -static void do_xrstor_pkru(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_xrstor_pkru(X86Access *ac, target_ulong ptr) { -env->pkru = cpu_ldq_data_ra(env, ptr, ra); +ac->env->pkru = access_ldq(ac, ptr); } static void do_fxrstor(X86Access *ac, target_ulong ptr) @@ -2891,6 +2895,7 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr { uint64_t xstate_bv, xcomp_bv, reserve0; X86Access ac; +unsigned size, size_ext; rfbm &= env->xcr0; @@ -2904,7 +2909,10 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr raise_exception_ra(env, EXCP0D_GPF, ra); } -xstate_bv = cpu_ldq_data_ra(env, ptr + XO(header.xstate_bv), ra); +size = sizeof(X86LegacyXSaveArea) + sizeof(X86XSaveHeader); +access_prepare(, env, ptr, size, MMU_DATA_LOAD, ra); + +xstate_bv = access_ldq(, ptr + XO(header.xstate_bv)); if ((int64_t)xstate_bv < 0) { /* FIXME: Compact form. */ @@ -2923,14 +2931,17 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr describes only XCOMP_BV, but the description of the standard form of XRSTOR (Vol 1, Sec 13.8.1) checks bytes 23:8 for zero, which includes the next 64-bit field. 
*/ -xcomp_bv = cpu_ldq_data_ra(env, ptr + XO(header.xcomp_bv), ra); -reserve0 = cpu_ldq_data_ra(env, ptr + XO(header.reserve0), ra); +xcomp_bv = access_ldq(, ptr + XO(header.xcomp_bv)); +reserve0 = access_ldq(, ptr + XO(header.reserve0)); if (xcomp_bv || reserve0) { raise_exception_ra(env, EXCP0D_GPF, ra); } -access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), - MMU_DATA_LOAD, ra); +size_ext = xsave_area_size(rfbm & xstate_bv, false); +if (size < size_ext) { +/* TODO: See if existing page probe has covered extra size. */ +access_prepare(, env, ptr, size_ext, MMU_DATA_LOAD, ra); +} if (rfbm & XSTATE_FP_MASK) { if (xstate_bv & XSTATE_FP_MASK) { @@ -2952,14 +2963,14 @@ static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr } if (rfbm & XSTATE_YMM_MASK) { if (xstate_bv & XSTATE_YMM_MASK) { -do_xrstor_ymmh(env, ptr + XO(avx_state), ra); +do_xrstor_ymmh(, ptr + XO(avx_state)); } else { do_clear_ymmh(env); } } if (rfbm & XSTATE_BNDREGS_MASK) { if (xstate_bv & XSTATE_BNDREGS_MASK) { -do_xrstor_bndregs(env, ptr + XO(bndreg_state), ra); +do_xrstor_bndregs(,
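The header checks performed by do_xrstor above can be summarized in one predicate: standard-form XRSTOR rejects a set bit 63 (compact form), any XSTATE_BV bit not enabled in XCR0, and any nonzero XCOMP_BV or following reserved word (bytes 23:8). A hedged sketch, with illustrative masks rather than real XCR0 values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard-form XRSTOR header validation, as in do_xrstor(). */
static bool xrstor_header_ok(uint64_t xstate_bv, uint64_t xcomp_bv,
                             uint64_t reserve0, uint64_t xcr0)
{
    if ((int64_t)xstate_bv < 0) {
        return false;             /* bit 63 set: compact form, unsupported */
    }
    if (xstate_bv & ~xcr0) {
        return false;             /* feature not enabled in XCR0 -> #GP */
    }
    return xcomp_bv == 0 && reserve0 == 0;   /* bytes 23:8 must be zero */
}
```

On failure the real code raises #GP via raise_exception_ra(); the sketch just reports the decision.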
[PATCH v2 06/28] target/i386: Convert do_fsave, do_frstor to X86Access
Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 60 1 file changed, 33 insertions(+), 27 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 25074af0ce..e6fa161aa0 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2468,21 +2468,16 @@ void helper_fldenv(CPUX86State *env, target_ulong ptr, int data32) do_fldenv(&ac, ptr, data32); } -static void do_fsave(CPUX86State *env, target_ulong ptr, int data32, - uintptr_t retaddr) +static void do_fsave(X86Access *ac, target_ulong ptr, int data32) { -X86Access ac; -floatx80 tmp; -int i, envsize = 14 << data32; +CPUX86State *env = ac->env; -access_prepare(&ac, env, ptr, envsize + 80, MMU_DATA_STORE, GETPC()); +do_fstenv(ac, ptr, data32); +ptr += 14 << data32; -do_fstenv(&ac, ptr, data32); -ptr += envsize; - -for (i = 0; i < 8; i++) { -tmp = ST(i); -do_fstt(&ac, ptr, tmp); +for (int i = 0; i < 8; i++) { +floatx80 tmp = ST(i); +do_fstt(ac, ptr, tmp); ptr += 10; } @@ -2491,23 +2486,22 @@ static void do_fsave(CPUX86State *env, target_ulong ptr, int data32, void helper_fsave(CPUX86State *env, target_ulong ptr, int data32) { -do_fsave(env, ptr, data32, GETPC()); +int size = (14 << data32) + 80; +X86Access ac; + +access_prepare(&ac, env, ptr, size, MMU_DATA_STORE, GETPC()); +do_fsave(&ac, ptr, data32); } -static void do_frstor(CPUX86State *env, target_ulong ptr, int data32, - uintptr_t retaddr) +static void do_frstor(X86Access *ac, target_ulong ptr, int data32) { -X86Access ac; -floatx80 tmp; -int i, envsize = 14 << data32; +CPUX86State *env = ac->env; -access_prepare(&ac, env, ptr, envsize + 80, MMU_DATA_LOAD, retaddr); +do_fldenv(ac, ptr, data32); +ptr += 14 << data32; -do_fldenv(&ac, ptr, data32); -ptr += envsize; - -for (i = 0; i < 8; i++) { -tmp = do_fldt(&ac, ptr); +for (int i = 0; i < 8; i++) { +floatx80 tmp = do_fldt(ac, ptr); ST(i) = tmp; ptr += 10; } @@ -2515,7 +2509,11 @@ static void do_frstor(CPUX86State *env, target_ulong ptr, int data32, void helper_frstor(CPUX86State *env, target_ulong ptr, int data32) { -do_frstor(env, ptr, data32, GETPC()); +int size = (14 << data32) + 80; +X86Access ac; + +access_prepare(&ac, env, ptr, size, MMU_DATA_LOAD, GETPC()); +do_frstor(&ac, ptr, data32); } #define XO(X) offsetof(X86XSaveArea, X) @@ -2971,12 +2969,20 @@ void helper_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm) #if defined(CONFIG_USER_ONLY) void cpu_x86_fsave(CPUX86State *env, target_ulong ptr, int data32) { -do_fsave(env, ptr, data32, 0); +int size = (14 << data32) + 80; +X86Access ac; + +access_prepare(&ac, env, ptr, size, MMU_DATA_STORE, 0); +do_fsave(&ac, ptr, data32); } void cpu_x86_frstor(CPUX86State *env, target_ulong ptr, int data32) { -do_frstor(env, ptr, data32, 0); +int size = (14 << data32) + 80; +X86Access ac; + +access_prepare(&ac, env, ptr, size, MMU_DATA_LOAD, 0); +do_frstor(&ac, ptr, data32); } void cpu_x86_fxsave(CPUX86State *env, target_ulong ptr) -- 2.34.1
[PATCH v2 14/28] target/i386: Add {hw, sw}_reserved to X86LegacyXSaveArea
This completes the 512 byte structure, allowing the union to be removed. Assert that the structure layout is as expected. Signed-off-by: Richard Henderson --- target/i386/cpu.h | 39 +-- 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 5860acb0c3..5f9c420084 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1419,23 +1419,34 @@ typedef struct { */ #define UNASSIGNED_APIC_ID 0x -typedef union X86LegacyXSaveArea { -struct { -uint16_t fcw; -uint16_t fsw; -uint8_t ftw; -uint8_t reserved; -uint16_t fpop; -uint64_t fpip; -uint64_t fpdp; -uint32_t mxcsr; -uint32_t mxcsr_mask; -FPReg fpregs[8]; -uint8_t xmm_regs[16][16]; +typedef struct X86LegacyXSaveArea { +uint16_t fcw; +uint16_t fsw; +uint8_t ftw; +uint8_t reserved; +uint16_t fpop; +union { +struct { +uint64_t fpip; +uint64_t fpdp; +}; +struct { +uint32_t fip; +uint32_t fcs; +uint32_t foo; +uint32_t fos; +}; }; -uint8_t data[512]; +uint32_t mxcsr; +uint32_t mxcsr_mask; +FPReg fpregs[8]; +uint8_t xmm_regs[16][16]; +uint32_t hw_reserved[12]; +uint32_t sw_reserved[12]; } X86LegacyXSaveArea; +QEMU_BUILD_BUG_ON(sizeof(X86LegacyXSaveArea) != 512); + typedef struct X86XSaveHeader { uint64_t xstate_bv; uint64_t xcomp_bv; -- 2.34.1
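The completed layout can be cross-checked with offsetof against the architectural FXSAVE image. A mirror of the struct (byte arrays standing in for FPReg and the XMM slots, since the demo does not have QEMU's types):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the completed X86LegacyXSaveArea layout. */
typedef struct {
    uint16_t fcw;                 /* offset 0 */
    uint16_t fsw;                 /* 2 */
    uint8_t  ftw;                 /* 4 */
    uint8_t  reserved;            /* 5 */
    uint16_t fpop;                /* 6 */
    uint64_t fpip;                /* 8 */
    uint64_t fpdp;                /* 16 */
    uint32_t mxcsr;               /* 24 */
    uint32_t mxcsr_mask;          /* 28 */
    uint8_t  fpregs[8][16];       /* 32 */
    uint8_t  xmm_regs[16][16];    /* 160 */
    uint32_t hw_reserved[12];     /* 416 */
    uint32_t sw_reserved[12];     /* 464 */
} LegacyXSave;                    /* total: 512 bytes */
```

Adding the two 48-byte reserved tails is exactly what closes the gap from 416 to 512 and lets the old `uint8_t data[512]` union arm go away.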
[PATCH v2 09/28] target/i386: Convert do_fxsave, do_fxrstor to X86Access
Move the alignment fault from do_* to helper_*, as it need not apply to usage from within user-only signal handling. Signed-off-by: Richard Henderson --- target/i386/tcg/fpu_helper.c | 84 1 file changed, 48 insertions(+), 36 deletions(-) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 59f73ad075..23e22e4521 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2618,8 +2618,25 @@ static void do_xsave_pkru(CPUX86State *env, target_ulong ptr, uintptr_t ra) cpu_stq_data_ra(env, ptr, env->pkru, ra); } -static void do_fxsave(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_fxsave(X86Access *ac, target_ulong ptr) { +CPUX86State *env = ac->env; + +do_xsave_fpu(ac, ptr); +if (env->cr[4] & CR4_OSFXSR_MASK) { +do_xsave_mxcsr(ac, ptr); +/* Fast FXSAVE leaves out the XMM registers */ +if (!(env->efer & MSR_EFER_FFXSR) +|| (env->hflags & HF_CPL_MASK) +|| !(env->hflags & HF_LMA_MASK)) { +do_xsave_sse(ac, ptr); +} +} +} + +void helper_fxsave(CPUX86State *env, target_ulong ptr) +{ +uintptr_t ra = GETPC(); X86Access ac; /* The operand must be 16 byte aligned */ @@ -2629,22 +2646,7 @@ static void do_fxsave(CPUX86State *env, target_ulong ptr, uintptr_t ra) access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), MMU_DATA_STORE, ra); -do_xsave_fpu(, ptr); - -if (env->cr[4] & CR4_OSFXSR_MASK) { -do_xsave_mxcsr(, ptr); -/* Fast FXSAVE leaves out the XMM registers */ -if (!(env->efer & MSR_EFER_FFXSR) -|| (env->hflags & HF_CPL_MASK) -|| !(env->hflags & HF_LMA_MASK)) { -do_xsave_sse(, ptr); -} -} -} - -void helper_fxsave(CPUX86State *env, target_ulong ptr) -{ -do_fxsave(env, ptr, GETPC()); +do_fxsave(, ptr); } static uint64_t get_xinuse(CPUX86State *env) @@ -2849,8 +2851,25 @@ static void do_xrstor_pkru(CPUX86State *env, target_ulong ptr, uintptr_t ra) env->pkru = cpu_ldq_data_ra(env, ptr, ra); } -static void do_fxrstor(CPUX86State *env, target_ulong ptr, uintptr_t ra) +static void do_fxrstor(X86Access *ac, 
target_ulong ptr) { +CPUX86State *env = ac->env; + +do_xrstor_fpu(ac, ptr); +if (env->cr[4] & CR4_OSFXSR_MASK) { +do_xrstor_mxcsr(ac, ptr); +/* Fast FXRSTOR leaves out the XMM registers */ +if (!(env->efer & MSR_EFER_FFXSR) +|| (env->hflags & HF_CPL_MASK) +|| !(env->hflags & HF_LMA_MASK)) { +do_xrstor_sse(ac, ptr); +} +} +} + +void helper_fxrstor(CPUX86State *env, target_ulong ptr) +{ +uintptr_t ra = GETPC(); X86Access ac; /* The operand must be 16 byte aligned */ @@ -2860,22 +2879,7 @@ static void do_fxrstor(CPUX86State *env, target_ulong ptr, uintptr_t ra) access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), MMU_DATA_LOAD, ra); -do_xrstor_fpu(, ptr); - -if (env->cr[4] & CR4_OSFXSR_MASK) { -do_xrstor_mxcsr(, ptr); -/* Fast FXRSTOR leaves out the XMM registers */ -if (!(env->efer & MSR_EFER_FFXSR) -|| (env->hflags & HF_CPL_MASK) -|| !(env->hflags & HF_LMA_MASK)) { -do_xrstor_sse(, ptr); -} -} -} - -void helper_fxrstor(CPUX86State *env, target_ulong ptr) -{ -do_fxrstor(env, ptr, GETPC()); +do_fxrstor(, ptr); } static void do_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm, uintptr_t ra) @@ -3007,12 +3011,20 @@ void cpu_x86_frstor(CPUX86State *env, target_ulong ptr, int data32) void cpu_x86_fxsave(CPUX86State *env, target_ulong ptr) { -do_fxsave(env, ptr, 0); +X86Access ac; + +access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), + MMU_DATA_STORE, 0); +do_fxsave(, ptr); } void cpu_x86_fxrstor(CPUX86State *env, target_ulong ptr) { -do_fxrstor(env, ptr, 0); +X86Access ac; + +access_prepare(, env, ptr, sizeof(X86LegacyXSaveArea), + MMU_DATA_LOAD, 0); +do_fxrstor(, ptr); } void cpu_x86_xsave(CPUX86State *env, target_ulong ptr) -- 2.34.1
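The check that moves from do_* to helper_* is the architectural 16-byte alignment requirement on the FXSAVE/FXRSTOR operand; guest instructions must still fault on it, while the user-only signal path passes known-aligned frames and can skip it. The test itself is trivial:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The alignment test kept in helper_fxsave()/helper_fxrstor(); the real
 * code raises #GP when it fails. */
static bool fxsave_operand_aligned(uint64_t ptr)
{
    return (ptr & 0xf) == 0;
}
```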
[PATCH v2 28/28] target/i386: Pass host pointer and size to cpu_x86_{xsave, xrstor}
We have already validated the memory region in the course of validating the signal frame. No need to do it again within the helper function. In addition, return failure when the header contains invalid xstate_bv. The kernel handles this via exception handling within XSTATE_OP within xrstor_from_user_sigframe. Signed-off-by: Richard Henderson --- target/i386/cpu.h| 4 ++-- linux-user/i386/signal.c | 20 target/i386/tcg/fpu_helper.c | 36 +++- 3 files changed, 33 insertions(+), 27 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 35a8bf831f..21d905d669 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -2236,8 +2236,8 @@ void cpu_x86_fsave(CPUX86State *s, void *host, size_t len); void cpu_x86_frstor(CPUX86State *s, void *host, size_t len); void cpu_x86_fxsave(CPUX86State *s, void *host, size_t len); void cpu_x86_fxrstor(CPUX86State *s, void *host, size_t len); -void cpu_x86_xsave(CPUX86State *s, target_ulong ptr, uint64_t rbfm); -void cpu_x86_xrstor(CPUX86State *s, target_ulong ptr, uint64_t rbfm); +void cpu_x86_xsave(CPUX86State *s, void *host, size_t len, uint64_t rbfm); +bool cpu_x86_xrstor(CPUX86State *s, void *host, size_t len, uint64_t rbfm); /* cpu.c */ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index b823dee17f..d8803e7df3 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -325,7 +325,7 @@ static void xsave_sigcontext(CPUX86State *env, /* Zero the header, XSAVE *adds* features to an existing save state. 
*/ memset(fxstate + 1, 0, sizeof(X86XSaveHeader)); -cpu_x86_xsave(env, xstate_addr, env->xcr0); +cpu_x86_xsave(env, fxstate, fpend_addr - xstate_addr, env->xcr0); __put_user(TARGET_FP_XSTATE_MAGIC1, >magic1); __put_user(extended_size, >extended_size); @@ -610,6 +610,8 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind, uint32_t magic1, magic2; uint32_t extended_size, xstate_size, min_size, max_size; uint64_t xfeatures; +void *xstate; +bool ok; switch (fpkind) { case FPSTATE_XSAVE: @@ -640,8 +642,10 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind, return false; } -if (!access_ok(env_cpu(env), VERIFY_READ, fxstate_addr, - xstate_size + TARGET_FP_XSTATE_MAGIC2_SIZE)) { +/* Re-lock the entire xstate area, with the extensions and magic. */ +xstate = lock_user(VERIFY_READ, fxstate_addr, + xstate_size + TARGET_FP_XSTATE_MAGIC2_SIZE, 1); +if (!xstate) { return false; } @@ -651,15 +655,15 @@ static bool xrstor_sigcontext(CPUX86State *env, FPStateKind fpkind, * fpstate layout with out copying the extended state information * in the memory layout. 
*/ -if (get_user_u32(magic2, fxstate_addr + xstate_size)) { -return false; -} +magic2 = tswap32(*(uint32_t *)(xstate + xstate_size)); if (magic2 != FP_XSTATE_MAGIC2) { +unlock_user(xstate, fxstate_addr, 0); break; } -cpu_x86_xrstor(env, fxstate_addr, xfeatures); -return true; +ok = cpu_x86_xrstor(env, xstate, xstate_size, xfeatures); +unlock_user(xstate, fxstate_addr, 0); +return ok; default: break; diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index 1c2121c559..4ec0f3786f 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -3064,42 +3064,44 @@ void cpu_x86_fxrstor(CPUX86State *env, void *host, size_t len) do_fxrstor(, 0); } -void cpu_x86_xsave(CPUX86State *env, target_ulong ptr, uint64_t rfbm) +void cpu_x86_xsave(CPUX86State *env, void *host, size_t len, uint64_t rfbm) { -X86Access ac; -unsigned size; +X86Access ac = { +.haddr1 = host, +.env = env, +}; /* * Since this is only called from user-level signal handling, * we should have done the job correctly there. */ assert((rfbm & ~env->xcr0) == 0); -size = xsave_area_size(rfbm, false); - -access_prepare(, env, ptr, size, MMU_DATA_STORE, 0); -do_xsave_access(, ptr, rfbm, get_xinuse(env), rfbm); +ac.size = xsave_area_size(rfbm, false); +assert(ac.size <= len); +do_xsave_access(, 0, rfbm, get_xinuse(env), rfbm); } -void cpu_x86_xrstor(CPUX86State *env, target_ulong ptr, uint64_t rfbm) +bool cpu_x86_xrstor(CPUX86State *env, void *host, size_t len, uint64_t rfbm) { -X86Access ac; +X86Access ac = { +.haddr1 = host, +.env = env, +}; uint64_t xstate_bv; -unsigned size; /* * Since this is only called from user-level signal handling, * we should have done the job correctly there. */ assert((rfbm &
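With the whole xstate area locked as one host buffer, the terminating MAGIC2 word is read directly at offset xstate_size instead of via a separate get_user_u32(). A sketch of that read; the magic constant here is a placeholder for the demo, not taken from the kernel headers, and the real code byte-swaps with tswap32() when host and target endianness differ:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FP_XSTATE_MAGIC2 0x12345678u   /* placeholder value */

/* Read the 32-bit magic that terminates the xstate image, from an
 * already-locked host buffer. */
static uint32_t read_magic2(const uint8_t *xstate, uint32_t xstate_size)
{
    uint32_t magic2;

    memcpy(&magic2, xstate + xstate_size, sizeof(magic2));
    return magic2;
}
```

If the magic does not match, xrstor_sigcontext unlocks the buffer and falls back, exactly as the patch's early-unlock paths show.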
[PATCH v2 19/28] linux-user/i386: Fix -mregparm=3 for signal delivery
Since v2.6.19, the kernel has supported -mregparm=3. Signed-off-by: Richard Henderson --- linux-user/i386/signal.c | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index 559b63c25b..f8cc0cff07 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -427,6 +427,11 @@ void setup_frame(int sig, struct target_sigaction *ka, env->regs[R_ESP] = frame_addr; env->eip = ka->_sa_handler; +/* Make -mregparm=3 work */ +env->regs[R_EAX] = sig; +env->regs[R_EDX] = 0; +env->regs[R_ECX] = 0; + cpu_x86_load_seg(env, R_DS, __USER_DS); cpu_x86_load_seg(env, R_ES, __USER_DS); cpu_x86_load_seg(env, R_SS, __USER_DS); @@ -448,9 +453,6 @@ void setup_rt_frame(int sig, struct target_sigaction *ka, target_sigset_t *set, CPUX86State *env) { abi_ulong frame_addr; -#ifndef TARGET_X86_64 -abi_ulong addr; -#endif struct rt_sigframe *frame; int i; @@ -460,14 +462,6 @@ void setup_rt_frame(int sig, struct target_sigaction *ka, if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) goto give_sigsegv; -/* These fields are only in rt_sigframe on 32 bit */ -#ifndef TARGET_X86_64 -__put_user(sig, >sig); -addr = frame_addr + offsetof(struct rt_sigframe, info); -__put_user(addr, >pinfo); -addr = frame_addr + offsetof(struct rt_sigframe, uc); -__put_user(addr, >puc); -#endif if (ka->sa_flags & TARGET_SA_SIGINFO) { frame->info = *info; } @@ -507,9 +501,13 @@ void setup_rt_frame(int sig, struct target_sigaction *ka, env->eip = ka->_sa_handler; #ifndef TARGET_X86_64 +/* Store arguments for both -mregparm=3 and standard. */ env->regs[R_EAX] = sig; +__put_user(sig, >sig); env->regs[R_EDX] = frame_addr + offsetof(struct rt_sigframe, info); +__put_user(env->regs[R_EDX], >pinfo); env->regs[R_ECX] = frame_addr + offsetof(struct rt_sigframe, uc); +__put_user(env->regs[R_ECX], >puc); #else env->regs[R_EAX] = 0; env->regs[R_EDI] = sig; -- 2.34.1
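The dual setup in the patch (registers for -mregparm=3, stack slots for the standard convention) can be sketched without any CPU emulation; `Regs` and `Frame` below are simplified stand-ins for CPUX86State and struct rt_sigframe:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t eax, edx, ecx;       /* regparm(3) argument registers */
} Regs;

typedef struct {
    uint32_t sig, pinfo, puc;     /* stack copies for the standard ABI */
} Frame;

/* Store (sig, &info, &uc) both ways, as setup_rt_frame() does above. */
static void setup_rt_args(Regs *r, Frame *f, uint32_t sig,
                          uint32_t info_addr, uint32_t uc_addr)
{
    r->eax = sig;       f->sig   = sig;
    r->edx = info_addr; f->pinfo = info_addr;
    r->ecx = uc_addr;   f->puc   = uc_addr;
}
```

A handler built with -mregparm=3 picks its arguments out of EAX/EDX/ECX; one built with the default ABI reads the same values from the frame, so both conventions are satisfied by a single signal frame.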
[PATCH for-9.1 v2 00/28] linux-user/i386: Properly align signal frame
v1: https://lore.kernel.org/qemu-devel/20230524054647.1093758-1-richard.hender...@linaro.org/

But v1 isn't particularly complete or correct.

Disconnect fpstate from sigframe, just like the kernel does. Return the
separate portions of the frame from get_sigframe. Alter all of the
target fpu routines to access memory that has already been translated
and sized.

r~

Richard Henderson (28):
  target/i386: Add tcg/access.[ch]
  target/i386: Convert do_fldt, do_fstt to X86Access
  target/i386: Convert helper_{fbld,fbst}_ST0 to X86Access
  target/i386: Convert do_fldenv to X86Access
  target/i386: Convert do_fstenv to X86Access
  target/i386: Convert do_fsave, do_frstor to X86Access
  target/i386: Convert do_xsave_{fpu,mxcr,sse} to X86Access
  target/i386: Convert do_xrstor_{fpu,mxcr,sse} to X86Access
  target/i386: Convert do_fxsave, do_fxrstor to X86Access
  target/i386: Convert do_xsave_* to X86Access
  target/i386: Convert do_xrstor_* to X86Access
  target/i386: Split out do_xsave_chk
  target/i386: Add rbfm argument to cpu_x86_{xsave,xrstor}
  target/i386: Add {hw,sw}_reserved to X86LegacyXSaveArea
  linux-user/i386: Drop xfeatures_size from sigcontext arithmetic
  linux-user/i386: Remove xfeatures from target_fpstate_fxsave
  linux-user/i386: Replace target_fpstate_fxsave with X86LegacyXSaveArea
  linux-user/i386: Split out struct target_fregs_state
  linux-user/i386: Fix -mregparm=3 for signal delivery
  linux-user/i386: Return boolean success from restore_sigcontext
  linux-user/i386: Return boolean success from xrstor_sigcontext
  linux-user/i386: Fix allocation and alignment of fp state
  target/i386: Honor xfeatures in xrstor_sigcontext
  target/i386: Convert do_xsave to X86Access
  target/i386: Convert do_xrstor to X86Access
  target/i386: Pass host pointer and size to cpu_x86_{fsave,frstor}
  target/i386: Pass host pointer and size to cpu_x86_{fxsave,fxrstor}
  target/i386: Pass host pointer and size to cpu_x86_{xsave,xrstor}

 target/i386/cpu.h                |  57 ++-
 target/i386/tcg/access.h         |  40 ++
 linux-user/i386/signal.c         | 669 ++-
 target/i386/tcg/access.c         | 160
 target/i386/tcg/fpu_helper.c     | 561 --
 tests/tcg/x86_64/test-1648.c     |  33 ++
 target/i386/tcg/meson.build      |   1 +
 tests/tcg/x86_64/Makefile.target |   1 +
 8 files changed, 1014 insertions(+), 508 deletions(-)
 create mode 100644 target/i386/tcg/access.h
 create mode 100644 target/i386/tcg/access.c
 create mode 100644 tests/tcg/x86_64/test-1648.c

-- 
2.34.1
Re: [PATCH v2] vhost: don't set vring call if guest notifiers is not enabled
On Mon, Apr 8, 2024 at 3:33 PM lyx634449800 wrote: > > When conducting performance testing using testpmd in the guest os, > it was observed that the performance was lower compared to the > scenario of direct vfio-pci usage. > > In the commit 96a3d98d2cdbd897ff5ab33427aa4cfb94077665, the author > provided a good solution. However, because the guest OS's > driver(e.g., virtio-net pmd) may not enable the msix capability, the > function k->query_guest_notifiers(qbus->parent) may return false, > resulting in the expected effect not being achieved. To address this > issue, modify the conditional statement. > > Signed-off-by: Yuxue Liu Acked-by: Jason Wang Thanks
Re: [PATCH 1/2] virtio-net: Fix vhost virtqueue notifiers for RSS
On Mon, Apr 8, 2024 at 6:13 PM Michael S. Tsirkin wrote: > > On Tue, Mar 26, 2024 at 07:06:29PM +0900, Akihiko Odaki wrote: > > virtio_net_guest_notifier_pending() and virtio_net_guest_notifier_mask() > > checked VIRTIO_NET_F_MQ to know there are multiple queues, but > > VIRTIO_NET_F_RSS also enables multiple queues. Refer to n->multiqueue, > > which is set to true either of VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS is > > enabled. > > > > Fixes: 68b0a6395f36 ("virtio-net: align ctrl_vq index for non-mq guest for > > vhost_vdpa") > > Signed-off-by: Akihiko Odaki > > Reviewed-by: Michael S. Tsirkin > > Jason, are you merging this? It has been merged: https://gitlab.com/qemu-project/qemu/-/commit/ba6bb2ec953f10751f174b6f7da8fe7e5f008c08 Thanks
RE: [PATCH v2 03/10] backends/iommufd: Introduce abstract HIODIOMMUFD device
Hi All, >-Original Message- >From: Duan, Zhenzhong >Subject: [PATCH v2 03/10] backends/iommufd: Introduce abstract >HIODIOMMUFD device > >HIODIOMMUFD represents a host IOMMU device under iommufd backend. > >Currently it includes only public iommufd handle and device id. >which could be used to get hw IOMMU information. > >When nested translation is supported in future, vIOMMU is going >to have iommufd related operations like attaching/detaching hwpt, >So IOMMUFDDevice interface will be further extended at that time. > >VFIO and VDPA device have different way of attaching/detaching hwpt. >So HIODIOMMUFD is still an abstract class which will be inherited by >VFIO and VDPA device. > >Introduce a helper hiod_iommufd_init() to initialize HIODIOMMUFD >device. > >Suggested-by: Cédric Le Goater >Originally-by: Yi Liu >Signed-off-by: Yi Sun >Signed-off-by: Zhenzhong Duan >--- > include/sysemu/iommufd.h | 22 +++ > backends/iommufd.c | 47 ++-- > 2 files changed, 53 insertions(+), 16 deletions(-) > >diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h >index 9af27ebd6c..71c53cbb45 100644 >--- a/include/sysemu/iommufd.h >+++ b/include/sysemu/iommufd.h >@@ -4,6 +4,7 @@ > #include "qom/object.h" > #include "exec/hwaddr.h" > #include "exec/cpu-common.h" >+#include "sysemu/host_iommu_device.h" > > #define TYPE_IOMMUFD_BACKEND "iommufd" > OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, >IOMMUFD_BACKEND) >@@ -33,4 +34,25 @@ int iommufd_backend_map_dma(IOMMUFDBackend >*be, uint32_t ioas_id, hwaddr iova, > ram_addr_t size, void *vaddr, bool readonly); > int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t >ioas_id, > hwaddr iova, ram_addr_t size); >+ >+#define TYPE_HIOD_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd" >+OBJECT_DECLARE_TYPE(HIODIOMMUFD, HIODIOMMUFDClass, >HIOD_IOMMUFD) >+ >+struct HIODIOMMUFD { >+/*< private >*/ >+HostIOMMUDevice parent; >+void *opaque; Please ignore above line "void *opaque;", it's totally useless, I forgot to remove it. 
Sorry for the noise.

Thanks
Zhenzhong
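The abstract-base arrangement described in the patch above (a common HIODIOMMUFD holding the iommufd handle and device id, specialized later by VFIO and VDPA with their own attach/detach behavior) can be sketched in plain C. This is only an illustrative toy, not QEMU's QOM machinery; all names and fields here are made up for the example.

```c
#include <stddef.h>

/* Toy abstract base: common iommufd state plus a subclass hook.
 * Illustrative only -- QEMU expresses this through QOM, not raw structs. */
typedef struct HIODIOMMUFDToy HIODIOMMUFDToy;

struct HIODIOMMUFDToy {
    int iommufd;        /* public iommufd handle */
    unsigned devid;     /* device id usable to query hw IOMMU info */
    /* subclass-specific behavior, e.g. VFIO vs VDPA hwpt attach */
    int (*attach_hwpt)(HIODIOMMUFDToy *d, unsigned hwpt_id);
};

/* A "VFIO" subclass embeds the base as its first member. */
typedef struct {
    HIODIOMMUFDToy base;    /* must be first, as with QOM parent fields */
    unsigned attached_hwpt;
} HIODVFIOToy;

static int vfio_attach_hwpt(HIODIOMMUFDToy *d, unsigned hwpt_id)
{
    HIODVFIOToy *v = (HIODVFIOToy *)d;  /* downcast is safe: base is first */
    v->attached_hwpt = hwpt_id;
    return 0;
}

/* Common initializer, in the spirit of the hiod_iommufd_init() helper. */
static void hiod_iommufd_init_toy(HIODIOMMUFDToy *d, int fd, unsigned devid,
                                  int (*attach)(HIODIOMMUFDToy *, unsigned))
{
    d->iommufd = fd;
    d->devid = devid;
    d->attach_hwpt = attach;
}
```

The base stays "abstract" in the sense that nothing ever instantiates it directly; only the VFIO/VDPA wrappers do, each supplying its own hook.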
[PATCH v9] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
The KVM_ARM_VCPU_PMU_V3_FILTER provides the ability to let the VMM decide
which PMU events are provided to the guest. Add a new option
`kvm-pmu-filter` as a -cpu sub-option to set the PMU event filtering.
Without the filter, all PMU events are exposed from host to guest by
default.

The usage of the new sub-option can be found in the updated documentation
(docs/system/arm/cpu-features.rst).

Here is an example which shows how to use PMU event filtering. When we
launch a guest using KVM, add such a command line:

# qemu-system-aarch64 \
    -accel kvm \
    -cpu host,kvm-pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This
filters out the cycle counter (event 0x11 being CPU_CYCLES).

Then, in the guest, use perf to count cycles:

# perf stat sleep 1

 Performance counter stats for 'sleep 1':

              1.22 msec task-clock        #    0.001 CPUs utilized
                 1      context-switches  #  820.695 /sec
                 0      cpu-migrations    #    0.000 /sec
                55      page-faults       #   45.138 K/sec
                        cycles
           1128954      instructions
            227031      branches          #  186.323 M/sec
              8686      branch-misses     #    3.83% of all branches

       1.002492480 seconds time elapsed

       0.001752000 seconds user
       0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but the
other PMU events still work.

Signed-off-by: Shaoqin Huang
---
v8->v9:
- Replace warn_report with error_setg in some places.
- Merge the check conditions to make the code cleaner.
- Tried to use the QAPI format for the PMU filter property, but could not
  use it since the -cpu option doesn't support JSON format yet.
v7->v8:
- Add a qtest for kvm-pmu-filter.
- Do the kvm-pmu-filter syntax checking up front in kvm_pmu_filter_set(),
  and store the filter information there; kvm_pmu_filter_get()
  reconstitutes it.
v6->v7:
- Check the return value of sscanf.
- Improve the check condition.
v5->v6:
- Commit message improvement.
- Remove some unused code.
- Collect Reviewed-by, thanks Sebastian.
- Use g_auto(GStrv) to replace gchar **.
[Eric]
v4->v5:
- Change the kvm-pmu-filter into a -cpu sub-option. [Eric]
- Comment tweak. [Gavin]
- Rebase to the latest branch.
v3->v4:
- Fix the wrong check for pmu_filter_init. [Sebastian]
- Fix multiple alignment issues. [Gavin]
- Report errors by warn_report() instead of error_report(), and don't use
  abort(), since the PMU event filter is an add-on, best-effort feature.
  [Gavin]
- Add several missing { } around single lines of code. [Gavin]
- Use g_strsplit() to replace strtok(). [Gavin]
v2->v3:
- Improve the commit message, use kernel doc wording, add more explanation
  of the filter example, fix some typos. [Eric]
- Add g_free() in kvm_arch_set_pmu_filter() to prevent a memory leak. [Eric]
- Add more precise error message reporting. [Eric]
- In the options doc, add that pmu-filter relies on
  KVM_ARM_VCPU_PMU_V3_FILTER support in KVM. [Eric]
v1->v2:
- Add more description of the allow and deny meanings in the commit
  message. [Sebastian]
- Small improvements. [Sebastian]
---
 docs/system/arm/cpu-features.rst |  23 +++
 target/arm/arm-qmp-cmds.c        |   2 +-
 target/arm/cpu.h                 |   3 +
 target/arm/kvm.c                 | 112 +++
 tests/qtest/arm-cpu-features.c   |  51 ++
 5 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index a5fb929243..f3930f34b3 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -204,6 +204,29 @@ the list of KVM VCPU features and their descriptions.
   the guest scheduler behavior and/or be exposed to the guest userspace.
 
+``kvm-pmu-filter``
+  By default kvm-pmu-filter is disabled. This means that by default all PMU
+  events will be exposed to guest.
+
+  KVM implements PMU Event Filtering to prevent a guest from being able to
+  sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER
+  attribute supported in KVM.
+
+  It has the following format:
+
+  kvm-pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+  The A means "allow" and D means "deny"; start is the first event of the
+  range and end is the last one. The first registered range defines
+  the global policy (global ALLOW if the first action is DENY, global
+  DENY if the first action is ALLOW). The start and end only support
Re: [PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Eric,

On 3/19/24 23:23, Eric Auger wrote:
> +if (kvm_supports_pmu_filter) { +assert_set_feature_str(qts, "host", "kvm-pmu-filter", ""); +assert_set_feature_str(qts, "host", "kvm-pmu-filter", + "A:0x11-0x11"); +assert_set_feature_str(qts, "host", "kvm-pmu-filter", + "D:0x11-0x11"); +assert_set_feature_str(qts, "host", "kvm-pmu-filter", + "A:0x11-0x11;A:0x12-0x20"); +assert_set_feature_str(qts, "host", "kvm-pmu-filter", + "D:0x11-0x11;A:0x12-0x20;D:0x12-0x15");
>
> Just to double-check: this sets the filter and checks that the filter is
> applied, is that correct? I see you set some ranges of events. Are you
> sure those events are supported on the host PMU and won't create a
> failure when setting the PMU filter?

What I test here is whether the PMU filter parser, which I wrote in the
kvm_pmu_filter_set/get functions, is correct; I don't test any KVM-side
things, such as whether a PMU event is supported by the host.

Thanks,
Shaoqin

> Thanks
> Eric

-- 
Shaoqin
Re: [PATCH v8] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER
Hi Kevin,

On 4/2/24 21:01, Kevin Wolf wrote:
>> Maybe I'm wrong. So I want to double-check whether the -cpu option
>> supports JSON format nowadays?
>
> As far as I can see, -cpu doesn't support JSON yet. But even if it did,
> your command line would be invalid because the 'host,' part isn't JSON.

Thanks for answering my question. I guess I should keep the current
implementation, and transform the property in the future when the -cpu
option supports JSON format.

Thanks,
Shaoqin

>> If the -cpu option doesn't support JSON format, how can I use QAPI for
>> the kvm-pmu-filter property?
>
> This would probably mean QAPIfying all CPUs first, which sounds like a
> major effort.

-- 
Shaoqin
Re: [PATCH-for-9.0?] backends/cryptodev: Do not abort for invalid session ID
Hi,

VIRTIO_CRYPTO_INVSESS has a quite clear meaning: invalid session ID when
executing crypto operations. The upper layer would get an explicit code
when failing to close a session, so I suggest not printing an error log
in this function.

On 4/8/24 23:45, Philippe Mathieu-Daudé wrote:
> Instead of aborting when a session ID is invalid, report an error and
> return VIRTIO_CRYPTO_INVSESS ("Invalid session id").
>
> Reproduced using:
>
> $ cat << EOF | qemu-system-i386 -display none \
>    -machine q35,accel=qtest -m 512M -nodefaults \
>    -object cryptodev-backend-builtin,id=cryptodev0 \
>    -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0 \
>    -qtest stdio
> outl 0xcf8 0x8804
> outw 0xcfc 0x06
> outl 0xcf8 0x8820
> outl 0xcfc 0xe0008000
> write 0x10800e 0x1 0x01
> write 0xe0008016 0x1 0x01
> write 0xe0008020 0x4 0x00801000
> write 0xe0008028 0x4 0x00c01000
> write 0xe000801c 0x1 0x01
> write 0x11 0x1 0x05
> write 0x110001 0x1 0x04
> write 0x108002 0x1 0x11
> write 0x108008 0x1 0x48
> write 0x10800c 0x1 0x01
> write 0x108018 0x1 0x10
> write 0x10801c 0x1 0x02
> write 0x10c002 0x1 0x01
> write 0xe000b005 0x1 0x00
> EOF
>
> Assertion failed: (session_id < MAX_NUM_SESSIONS && builtin->sessions[session_id]),
> function cryptodev_builtin_close_session, file cryptodev-builtin.c, line 430.
Reported-by: Zheyu Ma Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2274 Signed-off-by: Philippe Mathieu-Daudé --- backends/cryptodev-builtin.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c index 39d0455280..3bbaabe86e 100644 --- a/backends/cryptodev-builtin.c +++ b/backends/cryptodev-builtin.c @@ -22,6 +22,7 @@ */ #include "qemu/osdep.h" +#include "qemu/error-report.h" #include "sysemu/cryptodev.h" #include "qapi/error.h" #include "standard-headers/linux/virtio_crypto.h" @@ -427,7 +428,10 @@ static int cryptodev_builtin_close_session( CRYPTODEV_BACKEND_BUILTIN(backend); CryptoDevBackendBuiltinSession *session; -assert(session_id < MAX_NUM_SESSIONS && builtin->sessions[session_id]); +if (session_id >= MAX_NUM_SESSIONS || !builtin->sessions[session_id]) { +error_report("Cannot find a valid session id: %" PRIu64 "", session_id); +return -VIRTIO_CRYPTO_INVSESS; +} session = builtin->sessions[session_id]; if (session->cipher) { -- zhenwei pi
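The defensive shape of the fix above — replace an assert on guest-controllable input with a bounds check plus an explicit error return — can be sketched in isolation. This toy mirrors the patch's structure only; MY_INVSESS and the names are illustrative stand-ins, not the real virtio-crypto definitions.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_NUM_SESSIONS 4
#define MY_INVSESS 2   /* stand-in for VIRTIO_CRYPTO_INVSESS; value illustrative */

typedef struct { int dummy; } Session;

static Session *sessions[MAX_NUM_SESSIONS];

/* Close a session: report an error and return a negative "invalid session"
 * code for bad ids, instead of asserting on guest-controlled input. */
static int close_session(uint64_t session_id)
{
    if (session_id >= MAX_NUM_SESSIONS || !sessions[session_id]) {
        fprintf(stderr, "Cannot find a valid session id: %llu\n",
                (unsigned long long)session_id);
        return -MY_INVSESS;
    }
    sessions[session_id] = NULL;
    return 0;
}
```

The key point is that the id comes from the guest, so it must be treated as untrusted input: an assert turns a malicious or buggy guest into a host-side QEMU abort.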
Re:Re: [PATCH] hw/intc/riscv_aplic: APLICs should add child earlier than realize
At 2024-04-09 06:33:55, "Daniel Henrique Barboza" wrote: > > >On 4/7/24 00:46, yang.zhang wrote: >> From: "yang.zhang" >> >> Since only root APLICs can have hw IRQ lines, aplic->parent should >> be initialized first. > >I think it's worth mentioning that, if we don't do that, there won't be >an aplic->parent assigned during riscv_aplic_realize() and we won't create >the adequate IRQ lines. > >> >> Signed-off-by: yang.zhang >> --- > >Please add: > >Fixes: e8f79343cf ("hw/intc: Add RISC-V AIA APLIC device emulation") > > >And: > > >Reviewed-by: Daniel Henrique Barboza Done. Thanks. >> > > >> hw/intc/riscv_aplic.c | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c >> index fc5df0d598..32edd6d07b 100644 >> --- a/hw/intc/riscv_aplic.c >> +++ b/hw/intc/riscv_aplic.c >> @@ -1000,16 +1000,16 @@ DeviceState *riscv_aplic_create(hwaddr addr, hwaddr >> size, >> qdev_prop_set_bit(dev, "msimode", msimode); >> qdev_prop_set_bit(dev, "mmode", mmode); >> >> +if (parent) { >> +riscv_aplic_add_child(parent, dev); >> +} >> + >> sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal); >> >> if (!is_kvm_aia(msimode)) { >> sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, addr); >> } >> >> -if (parent) { >> -riscv_aplic_add_child(parent, dev); >> -} >> - >> if (!msimode) { >> for (i = 0; i < num_harts; i++) { >> CPUState *cpu = cpu_by_arch_id(hartid_base + i);
Re: [PATCH] Revert "hw/virtio: Add support for VDPA network simulation devices"
On Mon, Apr 8, 2024 at 5:47 PM Michael S. Tsirkin wrote:
>
> This reverts commit cd341fd1ffded978b2aa0b5309b00be7c42e347c.
>
> The patch adds non-upstream code in
> include/standard-headers/linux/virtio_pci.h
> which would make maintenance harder.
>
> Revert for now.
>
> Suggested-by: Jason Wang
> Signed-off-by: Michael S. Tsirkin

Acked-by: Jason Wang

Thanks
[PATCH] hw/intc/riscv_aplic: APLICs should add child earlier than realize
From: "yang.zhang" Since only root APLICs can have hw IRQ lines, aplic->parent should be initialized first. Fixes: e8f79343cf ("hw/intc: Add RISC-V AIA APLIC device emulation") Reviewed-by: Daniel Henrique Barboza Signed-off-by: yang.zhang --- hw/intc/riscv_aplic.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c index fc5df0d598..32edd6d07b 100644 --- a/hw/intc/riscv_aplic.c +++ b/hw/intc/riscv_aplic.c @@ -1000,16 +1000,16 @@ DeviceState *riscv_aplic_create(hwaddr addr, hwaddr size, qdev_prop_set_bit(dev, "msimode", msimode); qdev_prop_set_bit(dev, "mmode", mmode); +if (parent) { +riscv_aplic_add_child(parent, dev); +} + sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal); if (!is_kvm_aia(msimode)) { sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, addr); } -if (parent) { -riscv_aplic_add_child(parent, dev); -} - if (!msimode) { for (i = 0; i < num_harts; i++) { CPUState *cpu = cpu_by_arch_id(hartid_base + i); -- 2.25.1
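The ordering constraint this patch enforces — the parent link must exist before realize runs, because realize consults it to decide IRQ wiring — can be modeled with a toy device. This is only a sketch of the dependency; the names are illustrative, not QEMU's QOM/sysbus API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy device: realize() reads dev->parent, so add_child() must run first. */
typedef struct Dev {
    struct Dev *parent;
    bool realized;
    bool hw_irqs_wired;
} Dev;

static void add_child(Dev *parent, Dev *child)
{
    child->parent = parent;
}

static void realize(Dev *dev)
{
    dev->realized = true;
    /* Only root devices (no parent) get hardware IRQ lines wired. */
    dev->hw_irqs_wired = (dev->parent == NULL);
}
```

Calling realize() before add_child() makes a child look like a root at realize time and mis-wires it, which is exactly the class of bug the reordering in riscv_aplic_create() avoids.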
[PATCH for-9.0] linux-user: Preserve unswapped siginfo_t for strace
Passing the tswapped structure to strace means that our internal si_type is also gone, which then aborts in print_siginfo. Fixes: 4d6d8a05a0a ("linux-user: Move tswap_siginfo out of target code") Signed-off-by: Richard Henderson --- linux-user/signal.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/linux-user/signal.c b/linux-user/signal.c index a93148a4cb..05dc4afb52 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -1173,6 +1173,7 @@ static void handle_pending_signal(CPUArchState *cpu_env, int sig, CPUState *cpu = env_cpu(cpu_env); abi_ulong handler; sigset_t set; +target_siginfo_t unswapped; target_sigset_t target_old_set; struct target_sigaction *sa; TaskState *ts = get_task_state(cpu); @@ -1182,9 +1183,14 @@ static void handle_pending_signal(CPUArchState *cpu_env, int sig, k->pending = 0; /* - * Writes out siginfo values byteswapped, accordingly to the target. It also - * cleans the si_type from si_code making it correct for the target. + * Writes out siginfo values byteswapped, accordingly to the target. + * It also cleans the si_type from si_code making it correct for + * the target. We must hold on to the original unswapped copy for + * strace below, because si_type is still required there. */ +if (unlikely(qemu_loglevel_mask(LOG_STRACE))) { +unswapped = k->info; +} tswap_siginfo(>info, >info); sig = gdb_handlesig(cpu, sig, NULL, >info, sizeof(k->info)); @@ -1197,7 +1203,7 @@ static void handle_pending_signal(CPUArchState *cpu_env, int sig, } if (unlikely(qemu_loglevel_mask(LOG_STRACE))) { -print_taken_signal(sig, >info); +print_taken_signal(sig, ); } if (handler == TARGET_SIG_DFL) { -- 2.34.1
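The pattern in the fix above — take a copy of the original struct before a destructive in-place byte swap, so a later consumer (here, strace) can still read the pre-swap fields — can be shown in miniature. The types and the swap are toy stand-ins, not the real linux-user definitions.

```c
#include <stdint.h>

/* Toy siginfo: in linux-user the high bits of si_code carry an internal
 * si_type that the byte swap (and code cleaning) destroys. */
typedef struct {
    uint32_t si_code;
} SigInfoToy;

static uint32_t bswap32_toy(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* In-place target byte swap, like tswap_siginfo on a cross-endian host. */
static void tswap_info(SigInfoToy *info)
{
    info->si_code = bswap32_toy(info->si_code);
}
```

The fix is then one line in the caller: copy the struct to a local (`unswapped = k->info;`) before swapping, and hand the tracer the copy, guarded by the same loglevel check so the copy is free in the common case.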
Re: [PATCH] hw/intc/riscv_aplic: APLICs should add child earlier than realize
On 4/7/24 00:46, yang.zhang wrote: From: "yang.zhang" Since only root APLICs can have hw IRQ lines, aplic->parent should be initialized first. I think it's worth mentioning that, if we don't do that, there won't be an aplic->parent assigned during riscv_aplic_realize() and we won't create the adequate IRQ lines. Signed-off-by: yang.zhang --- Please add: Fixes: e8f79343cf ("hw/intc: Add RISC-V AIA APLIC device emulation") And: Reviewed-by: Daniel Henrique Barboza hw/intc/riscv_aplic.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c index fc5df0d598..32edd6d07b 100644 --- a/hw/intc/riscv_aplic.c +++ b/hw/intc/riscv_aplic.c @@ -1000,16 +1000,16 @@ DeviceState *riscv_aplic_create(hwaddr addr, hwaddr size, qdev_prop_set_bit(dev, "msimode", msimode); qdev_prop_set_bit(dev, "mmode", mmode); +if (parent) { +riscv_aplic_add_child(parent, dev); +} + sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal); if (!is_kvm_aia(msimode)) { sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, addr); } -if (parent) { -riscv_aplic_add_child(parent, dev); -} - if (!msimode) { for (i = 0; i < num_harts; i++) { CPUState *cpu = cpu_by_arch_id(hartid_base + i);
Re: [PATCH] Revert "hw/virtio: Add support for VDPA network simulation devices"
On Mon, Apr 08, 2024 at 10:11:18PM +0200, Paolo Bonzini wrote:
> On Mon, Apr 8, 2024, 12:18 Michael S. Tsirkin wrote:
> > On Mon, Apr 08, 2024 at 10:51:57AM +0100, Peter Maydell wrote:
> > > On Mon, 8 Apr 2024 at 10:48, Michael S. Tsirkin wrote:
> > > > This reverts commit cd341fd1ffded978b2aa0b5309b00be7c42e347c.
> > > >
> > > > The patch adds non-upstream code in
> > > > include/standard-headers/linux/virtio_pci.h
> > > > which would make maintenance harder.
> > > >
> > > > Revert for now.
>
> As long as it is part of the spec, why not just move the problematic
> parts to a QEMU specific header? As far as I understand the kernel is
> never going to consume these constants anyway.
>
> Paolo

I expect the contributor to do fixups like this though, not myself.

> > > > Suggested-by: Jason Wang
> > > > Signed-off-by: Michael S. Tsirkin
> > >
> > > Are you intending to target this revert for 9.0 ?
> > >
> > > -- PMM
> >
> > Yes.
Re: [PATCH-for-9.0? 3/3] hw/block/nand: Fix out-of-bound access in NAND block buffer
On 8/4/24 18:39, Richard Henderson wrote:
> On 4/7/24 22:36, Philippe Mathieu-Daudé wrote:
>> nand_command() and nand_getio() don't check @offset points into the
>> block, nor that the available data length (s->iolen) is not negative.
>> In order to fix:
>> - check the offset is in range in nand_blk_load_NAND_PAGE_SIZE(),
>> - do not set @iolen if blk_load() failed.
>
> Do not set, or do not set to non-zero?

Oh, "do not set to non-zero", thanks :)

> I had been wondering if the final assignment to s->iolen should go into
> nand_load_block as well...

For the next tag I'd rather keep it this way, which seems more explicit
to me.

> Reviewed-by: Richard Henderson

Thanks!
[RFC PATCH-for-9.1 3/4] hw/i2c: Convert to spec v7 terminology (automatically)
One of the biggest change from I2C spec v6 -> v7 is: • Updated the terms "master/slave" to "controller/target" Since it follows the inclusive terminology from the "Conscious Language in your Open Source Projects" guidelines [*], replace the I2C terminology. Mechanical transformation running: $ cat i2c_rename.txt | while read old new; do \ sed -i -e "s/$old/$new/g" $(git grep -l $old); \ done Having: $ cat i2c_rename.txt i2c_bus_master i2c_bus_controller i2c_schedule_pending_master i2c_schedule_pending_controller I2CPendingMasters I2CPendingControllers I2CPendingMaster I2CPendingController pending_masters pending_controllers I2C_SLAVE_CLASS I2C_TARGET_CLASS I2C_SLAVE_GET_CLASS I2C_TARGET_GET_CLASS I2CSlaveClass I2CTargetClass I2CSlave I2CTarget TYPE_I2C_SLAVE TYPE_I2C_TARGET I2C_SLAVE I2C_TARGET i2c_slave_new i2c_target_new i2c_slave_create_simple i2c_target_create_simple i2c_slave_realize_and_unref i2c_target_realize_and_unref i2c_slave_set_address i2c_target_set_address VMSTATE_I2C_SLAVE VMSTATE_I2C_TARGET vmstate_i2c_slave vmstate_i2c_target Note, the QOM type definition is not modified, TYPE_I2C_TARGET remains defined as "i2c-slave". 
[*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md Inspired-by: Wolfram Sang Signed-off-by: Philippe Mathieu-Daudé --- include/hw/display/i2c-ddc.h | 2 +- include/hw/gpio/pca9552.h| 2 +- include/hw/gpio/pca9554.h| 2 +- include/hw/i2c/aspeed_i2c.h | 4 +- include/hw/i2c/i2c.h | 66 - include/hw/i2c/i2c_mux_pca954x.h | 2 +- include/hw/i2c/smbus_slave.h | 4 +- include/hw/nvram/eeprom_at24c.h | 4 +- include/hw/sensor/tmp105.h | 2 +- hw/arm/aspeed.c | 232 +++ hw/arm/bananapi_m2u.c| 2 +- hw/arm/cubieboard.c | 2 +- hw/arm/musicpal.c| 6 +- hw/arm/npcm7xx_boards.c | 44 +++--- hw/arm/nseries.c | 6 +- hw/arm/pxa2xx.c | 36 ++--- hw/arm/realview.c| 2 +- hw/arm/spitz.c | 12 +- hw/arm/stellaris.c | 2 +- hw/arm/tosa.c| 14 +- hw/arm/versatilepb.c | 2 +- hw/arm/vexpress.c| 2 +- hw/arm/z2.c | 20 +-- hw/audio/wm8750.c| 18 +-- hw/display/ati.c | 4 +- hw/display/i2c-ddc.c | 10 +- hw/display/sii9022.c | 16 +-- hw/display/sm501.c | 2 +- hw/display/ssd0303.c | 14 +- hw/display/xlnx_dp.c | 2 +- hw/gpio/max7310.c| 14 +- hw/gpio/pca9552.c| 14 +- hw/gpio/pca9554.c| 14 +- hw/gpio/pcf8574.c| 12 +- hw/i2c/aspeed_i2c.c | 16 +-- hw/i2c/core.c| 88 ++-- hw/i2c/i2c_mux_pca954x.c | 6 +- hw/i2c/imx_i2c.c | 2 +- hw/i2c/smbus_slave.c | 12 +- hw/input/lm832x.c| 14 +- hw/misc/axp2xx.c | 14 +- hw/misc/i2c-echo.c | 14 +- hw/nvram/eeprom_at24c.c | 22 +-- hw/ppc/e500.c| 2 +- hw/ppc/pnv.c | 4 +- hw/ppc/sam460ex.c| 2 +- hw/rtc/ds1338.c | 14 +- hw/rtc/m41t80.c | 12 +- hw/rtc/twl92230.c| 16 +-- hw/sensor/dps310.c | 14 +- hw/sensor/emc141x.c | 16 +-- hw/sensor/lsm303dlhc_mag.c | 16 +-- hw/sensor/tmp105.c | 16 +-- hw/sensor/tmp421.c | 20 +-- hw/tpm/tpm_tis_i2c.c | 12 +- 55 files changed, 461 insertions(+), 461 deletions(-) diff --git a/include/hw/display/i2c-ddc.h b/include/hw/display/i2c-ddc.h index 94b5880587..faf3cd84fa 100644 --- a/include/hw/display/i2c-ddc.h +++ b/include/hw/display/i2c-ddc.h @@ -26,7 +26,7 @@ /* A simple I2C slave which just returns the contents of its EDID blob. 
*/ struct I2CDDCState { /*< private >*/ -I2CSlave i2c; +I2CTarget i2c; /*< public >*/ bool firstbyte; uint8_t reg; diff --git a/include/hw/gpio/pca9552.h b/include/hw/gpio/pca9552.h index c36525f0c3..d7f07a44e0 100644 --- a/include/hw/gpio/pca9552.h +++ b/include/hw/gpio/pca9552.h @@ -23,7 +23,7 @@ DECLARE_INSTANCE_CHECKER(PCA955xState, PCA955X, struct PCA955xState { /*< private >*/ -I2CSlave i2c; +I2CTarget i2c; /*< public >*/ uint8_t len; diff --git a/include/hw/gpio/pca9554.h b/include/hw/gpio/pca9554.h index 54bfc4c4c7..0b528a0033 100644 --- a/include/hw/gpio/pca9554.h +++ b/include/hw/gpio/pca9554.h @@ -21,7 +21,7 @@ DECLARE_INSTANCE_CHECKER(PCA9554State, PCA9554, struct PCA9554State { /*< private >*/ -I2CSlave i2c; +I2CTarget i2c; /*< public >*/ uint8_t len; diff --git
[RFC PATCH-for-9.1 2/4] hw/i2c: Fix checkpatch line over 80 chars warnings
We are going to modify these lines, fix their style in order to avoid checkpatch.pl warnings: WARNING: line over 80 characters Signed-off-by: Philippe Mathieu-Daudé --- include/hw/i2c/i2c.h| 11 ++- include/hw/nvram/eeprom_at24c.h | 6 +- hw/arm/aspeed.c | 140 +++- hw/nvram/eeprom_at24c.c | 6 +- 4 files changed, 98 insertions(+), 65 deletions(-) diff --git a/include/hw/i2c/i2c.h b/include/hw/i2c/i2c.h index c18a69e4b6..a1b3f4d179 100644 --- a/include/hw/i2c/i2c.h +++ b/include/hw/i2c/i2c.h @@ -31,7 +31,10 @@ struct I2CSlaveClass { /* Master to slave. Returns non-zero for a NAK, 0 for success. */ int (*send)(I2CSlave *s, uint8_t data); -/* Master to slave (asynchronous). Receiving slave must call i2c_ack(). */ +/* + * Master to slave (asynchronous). + * Receiving slave must call i2c_ack(). + */ void (*send_async)(I2CSlave *s, uint8_t data); /* @@ -83,7 +86,8 @@ struct I2CPendingMaster { }; typedef QLIST_HEAD(I2CNodeList, I2CNode) I2CNodeList; -typedef QSIMPLEQ_HEAD(I2CPendingMasters, I2CPendingMaster) I2CPendingMasters; +typedef QSIMPLEQ_HEAD(I2CPendingMasters, I2CPendingMaster) +I2CPendingMasters; struct I2CBus { BusState qbus; @@ -176,7 +180,8 @@ I2CSlave *i2c_slave_new(const char *name, uint8_t addr); * Create the device state structure, initialize it, put it on the * specified @bus, and drop the reference to it (the device is realized). */ -I2CSlave *i2c_slave_create_simple(I2CBus *bus, const char *name, uint8_t addr); +I2CSlave *i2c_slave_create_simple(I2CBus *bus, + const char *name, uint8_t addr); /** * Realize and drop a reference an I2C slave device diff --git a/include/hw/nvram/eeprom_at24c.h b/include/hw/nvram/eeprom_at24c.h index acb9857b2a..9d29f0a69a 100644 --- a/include/hw/nvram/eeprom_at24c.h +++ b/include/hw/nvram/eeprom_at24c.h @@ -33,7 +33,9 @@ I2CSlave *at24c_eeprom_init(I2CBus *bus, uint8_t address, uint32_t rom_size); * @bus, and drop the reference to it (the device is realized). 
Copies the data * from @init_rom to the beginning of the EEPROM memory buffer. */ -I2CSlave *at24c_eeprom_init_rom(I2CBus *bus, uint8_t address, uint32_t rom_size, -const uint8_t *init_rom, uint32_t init_rom_size); +I2CSlave *at24c_eeprom_init_rom(I2CBus *bus, +uint8_t address, uint32_t rom_size, +const uint8_t *init_rom, +uint32_t init_rom_size); #endif diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 93ca87fda2..8279ad748a 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -649,18 +649,23 @@ static void witherspoon_bmc_i2c_init(AspeedMachineState *bmc) qdev_connect_gpio_out(dev, pca1_leds[i].gpio_id, qdev_get_gpio_in(DEVICE(led), 0)); } -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 3), "dps310", 0x76); -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 3), "max31785", 0x52); -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 4), "tmp423", 0x4c); -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 5), "tmp423", 0x4c); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 3), +"dps310", 0x76); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 3), +"max31785", 0x52); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 4), +"tmp423", 0x4c); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 5), +"tmp423", 0x4c); /* The Witherspoon expects a TMP275 but a TMP105 is compatible */ -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 9), TYPE_TMP105, - 0x4a); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 9), +TYPE_TMP105, 0x4a); /* The witherspoon board expects Epson RX8900 I2C RTC but a ds1338 is * good enough */ -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 11), "ds1338", 0x32); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 11), +"ds1338", 0x32); smbus_eeprom_init_one(aspeed_i2c_get_bus(>i2c, 11), 0x51, eeprom_buf); @@ -717,19 +722,20 @@ static void fp5280g2_bmc_i2c_init(AspeedMachineState *bmc) at24c_eeprom_init(aspeed_i2c_get_bus(>i2c, 1), 0x50, 32768); /* The fp5280g2 expects a TMP112 but a TMP105 is compatible */ 
-i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 2), TYPE_TMP105, - 0x48); -i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 2), TYPE_TMP105, - 0x49); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 2), +TYPE_TMP105, 0x48); +i2c_slave_create_simple(aspeed_i2c_get_bus(>i2c, 2), +TYPE_TMP105, 0x49); i2c_mux =
[RFC PATCH-for-9.1 0/4] hw/i2c: Convert to spec v7 (inclusive) terminology
Mechanical (mostly) conversion inspired by Wolfram [*] to use inclusive terminology, similarly to the other renames we did 3 years ago, shortly before the I2C spec v7 was published. Posted as RFC to get feedback, if no objection I plan to finish the conversion (SMBus and rest if hw/i2c/). [*] https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/ Philippe Mathieu-Daudé (4): hw/i2c: Fix checkpatch block comment warnings hw/i2c: Fix checkpatch line over 80 chars warnings hw/i2c: Convert to spec v7 terminology (automatically) hw/i2c: Convert to spec v7 terminology (manually) include/hw/display/i2c-ddc.h | 2 +- include/hw/gpio/pca9552.h| 2 +- include/hw/gpio/pca9554.h| 2 +- include/hw/i2c/aspeed_i2c.h | 4 +- include/hw/i2c/i2c.h | 123 ++--- include/hw/i2c/i2c_mux_pca954x.h | 2 +- include/hw/i2c/smbus_slave.h | 4 +- include/hw/nvram/eeprom_at24c.h | 8 +- include/hw/sensor/tmp105.h | 2 +- hw/arm/aspeed.c | 290 +-- hw/arm/bananapi_m2u.c| 2 +- hw/arm/cubieboard.c | 2 +- hw/arm/musicpal.c| 6 +- hw/arm/npcm7xx_boards.c | 44 ++--- hw/arm/nseries.c | 6 +- hw/arm/pxa2xx.c | 36 ++-- hw/arm/realview.c| 2 +- hw/arm/spitz.c | 12 +- hw/arm/stellaris.c | 2 +- hw/arm/tosa.c| 14 +- hw/arm/versatilepb.c | 2 +- hw/arm/vexpress.c| 2 +- hw/arm/z2.c | 20 +-- hw/audio/wm8750.c| 18 +- hw/display/ati.c | 4 +- hw/display/i2c-ddc.c | 10 +- hw/display/sii9022.c | 16 +- hw/display/sm501.c | 2 +- hw/display/ssd0303.c | 14 +- hw/display/xlnx_dp.c | 2 +- hw/gpio/max7310.c| 14 +- hw/gpio/pca9552.c| 14 +- hw/gpio/pca9554.c| 14 +- hw/gpio/pcf8574.c| 12 +- hw/i2c/aspeed_i2c.c | 16 +- hw/i2c/core.c| 90 +- hw/i2c/i2c_mux_pca954x.c | 6 +- hw/i2c/imx_i2c.c | 2 +- hw/i2c/smbus_slave.c | 12 +- hw/input/lm832x.c| 14 +- hw/misc/axp2xx.c | 14 +- hw/misc/i2c-echo.c | 14 +- hw/nvram/eeprom_at24c.c | 26 +-- hw/ppc/e500.c| 2 +- hw/ppc/pnv.c | 4 +- hw/ppc/sam460ex.c| 2 +- hw/rtc/ds1338.c | 14 +- hw/rtc/m41t80.c | 12 +- hw/rtc/twl92230.c| 16 +- hw/sensor/dps310.c | 14 +- hw/sensor/emc141x.c 
| 16 +- hw/sensor/lsm303dlhc_mag.c | 16 +- hw/sensor/tmp105.c | 16 +- hw/sensor/tmp421.c | 20 +-- hw/tpm/tpm_tis_i2c.c | 12 +- 55 files changed, 541 insertions(+), 506 deletions(-) -- 2.41.0
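The cover letter describes the conversion as mostly mechanical. As a rough illustration only — the word list and tooling below are assumptions, not necessarily what the series actually used — the "automatic" step of such a spec-v7 rename could look like a scripted substitution run over the affected files:

```shell
# Hypothetical sketch of an automatic spec-v7 terminology rename.
# Demonstrated on a scratch file so it is safe to run anywhere.
demo=$(mktemp)
printf 'I2CSlave *slave; /* master to slave */\n' > "$demo"

# Type names first, then bare words with \b boundaries (GNU sed):
sed -i -e 's/I2CSlave/I2CTarget/g' \
       -e 's/\bslave\b/target/g'   \
       -e 's/\bmaster\b/controller/g' "$demo"

cat "$demo"   # -> I2CTarget *target; /* controller to target */
rm -f "$demo"
```

In a real tree one would run this over `include/hw/i2c/` and `hw/i2c/`, then do the manual pass for the cases a blind substitution gets wrong (comments, identifiers like `smbus_slave` kept for API compatibility, etc.), which is what patches 3 and 4 of the series split apart.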
[RFC PATCH-for-9.1 1/4] hw/i2c: Fix checkpatch block comment warnings
We are going to modify these lines, fix their style in order to avoid checkpatch.pl warnings: WARNING: Block comments use a leading /* on a separate line WARNING: Block comments use * on subsequent lines WARNING: Block comments use a trailing */ on a separate line Signed-off-by: Philippe Mathieu-Daudé --- include/hw/i2c/i2c.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/hw/i2c/i2c.h b/include/hw/i2c/i2c.h index 2a3abacd1b..c18a69e4b6 100644 --- a/include/hw/i2c/i2c.h +++ b/include/hw/i2c/i2c.h @@ -4,10 +4,12 @@ #include "hw/qdev-core.h" #include "qom/object.h" -/* The QEMU I2C implementation only supports simple transfers that complete - immediately. It does not support slave devices that need to be able to - defer their response (eg. CPU slave interfaces where the data is supplied - by the device driver in response to an interrupt). */ +/* + * The QEMU I2C implementation only supports simple transfers that complete + * immediately. It does not support slave devices that need to be able to + * defer their response (eg. CPU slave interfaces where the data is supplied + * by the device driver in response to an interrupt). + */ enum i2c_event { I2C_START_RECV, -- 2.41.0
[RFC PATCH-for-9.1 4/4] hw/i2c: Convert to spec v7 terminology (manually)
See previous commit for rationale. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/i2c/i2c.h | 52 ++-- hw/i2c/core.c| 2 +- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/include/hw/i2c/i2c.h b/include/hw/i2c/i2c.h index fa00098477..abefee78fd 100644 --- a/include/hw/i2c/i2c.h +++ b/include/hw/i2c/i2c.h @@ -6,8 +6,8 @@ /* * The QEMU I2C implementation only supports simple transfers that complete - * immediately. It does not support slave devices that need to be able to - * defer their response (eg. CPU slave interfaces where the data is supplied + * immediately. It does not support target devices that need to be able to + * defer their response (eg. CPU target interfaces where the data is supplied * by the device driver in response to an interrupt). */ @@ -28,23 +28,23 @@ OBJECT_DECLARE_TYPE(I2CTarget, I2CTargetClass, struct I2CTargetClass { DeviceClass parent_class; -/* Master to slave. Returns non-zero for a NAK, 0 for success. */ +/* Controller to target. Returns non-zero for a NAK, 0 for success. */ int (*send)(I2CTarget *s, uint8_t data); /* - * Master to slave (asynchronous). - * Receiving slave must call i2c_ack(). + * Controller to target (asynchronous). + * Receiving target must call i2c_ack(). */ void (*send_async)(I2CTarget *s, uint8_t data); /* - * Slave to master. This cannot fail, the device should always + * Target to controller. This cannot fail, the device should always * return something here. */ uint8_t (*recv)(I2CTarget *s); /* - * Notify the slave of a bus state change. For start event, + * Notify the target of a bus state change. For start event, * returns non-zero to NAK an operation. For other events the * return code is not used and should be zero. */ @@ -96,7 +96,7 @@ struct I2CBus { uint8_t saved_address; bool broadcast; -/* Set from slave currently mastering the bus. */ +/* Set from target currently controlling the bus. 
*/ QEMUBH *bh; }; @@ -107,7 +107,7 @@ int i2c_bus_busy(I2CBus *bus); * i2c_start_transfer: start a transfer on an I2C bus. * * @bus: #I2CBus to be used - * @address: address of the slave + * @address: address of the target * @is_recv: indicates the transfer direction * * When @is_recv is a known boolean constant, use the @@ -121,7 +121,7 @@ int i2c_start_transfer(I2CBus *bus, uint8_t address, bool is_recv); * i2c_start_recv: start a 'receive' transfer on an I2C bus. * * @bus: #I2CBus to be used - * @address: address of the slave + * @address: address of the target * * Returns: 0 on success, -1 on error */ @@ -131,7 +131,7 @@ int i2c_start_recv(I2CBus *bus, uint8_t address); * i2c_start_send: start a 'send' transfer on an I2C bus. * * @bus: #I2CBus to be used - * @address: address of the slave + * @address: address of the target * * Returns: 0 on success, -1 on error */ @@ -141,7 +141,7 @@ int i2c_start_send(I2CBus *bus, uint8_t address); * i2c_start_send_async: start an asynchronous 'send' transfer on an I2C bus. * * @bus: #I2CBus to be used - * @address: address of the slave + * @address: address of the target * * Return: 0 on success, -1 on error */ @@ -161,9 +161,9 @@ bool i2c_scan_bus(I2CBus *bus, uint8_t address, bool broadcast, I2CNodeList *current_devs); /** - * Create an I2C slave device on the heap. + * Create an I2C target device on the heap. * @name: a device type name - * @addr: I2C address of the slave when put on a bus + * @addr: I2C address of the target when put on a bus * * This only initializes the device state structure and allows * properties to be set. Type @name must exist. The device still @@ -172,10 +172,10 @@ bool i2c_scan_bus(I2CBus *bus, uint8_t address, bool broadcast, I2CTarget *i2c_target_new(const char *name, uint8_t addr); /** - * Create and realize an I2C slave device on the heap. + * Create and realize an I2C target device on the heap. 
* @bus: I2C bus to put it on - * @name: I2C slave device type name - * @addr: I2C address of the slave when put on a bus + * @name: I2C target device type name + * @addr: I2C address of the target when put on a bus * * Create the device state structure, initialize it, put it on the * specified @bus, and drop the reference to it (the device is realized). @@ -184,10 +184,10 @@ I2CTarget *i2c_target_create_simple(I2CBus *bus, const char *name, uint8_t addr); /** - * Realize and drop a reference an I2C slave device - * @dev: I2C slave device to realize + * Realize and drop a reference an I2C target device + * @dev: I2C target device to realize * @bus: I2C bus to put it on - * @addr: I2C address of the slave on the bus + * @addr: I2C address of the target on the bus * @errp: pointer to NULL initialized error object * *
[PATCH v3] e1000: Convert debug macros into tracepoints.
The E1000 debug messages are very useful for developing drivers. Make these available to users without recompiling QEMU. Signed-off-by: Austin Clements [geo...@ldpreload.com: Rebased on top of 2.9.0] Signed-off-by: Geoffrey Thomas Signed-off-by: Don Porter Reviewed-by: Richard Henderson --- hw/net/e1000.c | 90 +++-- hw/net/trace-events | 25 - 2 files changed, 54 insertions(+), 61 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index 43f3a4a701..24475636a3 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -44,26 +44,6 @@ #include "trace.h" #include "qom/object.h" -/* #define E1000_DEBUG */ - -#ifdef E1000_DEBUG -enum { -DEBUG_GENERAL, DEBUG_IO, DEBUG_MMIO, DEBUG_INTERRUPT, -DEBUG_RX, DEBUG_TX, DEBUG_MDIC, DEBUG_EEPROM, -DEBUG_UNKNOWN, DEBUG_TXSUM,DEBUG_TXERR,DEBUG_RXERR, -DEBUG_RXFILTER, DEBUG_PHY, DEBUG_NOTYET, -}; -#define DBGBIT(x)(1mac_reg[IMS]); +trace_e1000_set_ics(val, s->mac_reg[ICR], s->mac_reg[IMS]); set_interrupt_cause(s, 0, val | s->mac_reg[ICR]); } @@ -425,8 +404,7 @@ set_rx_control(E1000State *s, int index, uint32_t val) s->mac_reg[RCTL] = val; s->rxbuf_size = e1000x_rxbufsize(val); s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1; -DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT], - s->mac_reg[RCTL]); +trace_e1000_set_rx_control(s->mac_reg[RDT], s->mac_reg[RCTL]); timer_mod(s->flush_queue_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000); } @@ -440,16 +418,16 @@ set_mdic(E1000State *s, int index, uint32_t val) if ((val & E1000_MDIC_PHY_MASK) >> E1000_MDIC_PHY_SHIFT != 1) // phy # val = s->mac_reg[MDIC] | E1000_MDIC_ERROR; else if (val & E1000_MDIC_OP_READ) { -DBGOUT(MDIC, "MDIC read reg 0x%x\n", addr); +trace_e1000_mdic_read_register(addr); if (!(phy_regcap[addr] & PHY_R)) { -DBGOUT(MDIC, "MDIC read reg %x unhandled\n", addr); +trace_e1000_mdic_read_register_unhandled(addr); val |= E1000_MDIC_ERROR; } else val = (val ^ data) | s->phy_reg[addr]; } else if (val & E1000_MDIC_OP_WRITE) { -DBGOUT(MDIC, "MDIC write 
reg 0x%x, value 0x%x\n", addr, data); +trace_e1000_mdic_write_register(addr, data); if (!(phy_regcap[addr] & PHY_W)) { -DBGOUT(MDIC, "MDIC write reg %x unhandled\n", addr); +trace_e1000_mdic_write_register_unhandled(addr); val |= E1000_MDIC_ERROR; } else { if (addr < NPHYWRITEOPS && phyreg_writeops[addr]) { @@ -471,8 +449,8 @@ get_eecd(E1000State *s, int index) { uint32_t ret = E1000_EECD_PRES|E1000_EECD_GNT | s->eecd_state.old_eecd; -DBGOUT(EEPROM, "reading eeprom bit %d (reading %d)\n", - s->eecd_state.bitnum_out, s->eecd_state.reading); +trace_e1000_get_eecd(s->eecd_state.bitnum_out, s->eecd_state.reading); + if (!s->eecd_state.reading || ((s->eeprom_data[(s->eecd_state.bitnum_out >> 4) & 0x3f] >> ((s->eecd_state.bitnum_out & 0xf) ^ 0xf))) & 1) @@ -511,9 +489,8 @@ set_eecd(E1000State *s, int index, uint32_t val) s->eecd_state.reading = (((s->eecd_state.val_in >> 6) & 7) == EEPROM_READ_OPCODE_MICROWIRE); } -DBGOUT(EEPROM, "eeprom bitnum in %d out %d, reading %d\n", - s->eecd_state.bitnum_in, s->eecd_state.bitnum_out, - s->eecd_state.reading); +trace_e1000_set_eecd(s->eecd_state.bitnum_in, s->eecd_state.bitnum_out, + s->eecd_state.reading); } static uint32_t @@ -580,8 +557,7 @@ xmit_seg(E1000State *s) if (tp->cptse) { css = props->ipcss; -DBGOUT(TXSUM, "frames %d size %d ipcss %d\n", - frames, tp->size, css); +trace_e1000_xmit_seg1(frames, tp->size, css); if (props->ip) {/* IPv4 */ stw_be_p(tp->data+css+2, tp->size - css); stw_be_p(tp->data+css+4, @@ -591,7 +567,7 @@ xmit_seg(E1000State *s) } css = props->tucss; len = tp->size - css; -DBGOUT(TXSUM, "tcp %d tucss %d len %d\n", props->tcp, css, len); +trace_e1000_xmit_seg2(props->tcp, css, len); if (props->tcp) { sofar = frames * props->mss; stl_be_p(tp->data+css+4, ldl_be_p(tp->data+css+4)+sofar); /* seq */ @@ -759,7 +735,7 @@ start_xmit(E1000State *s) uint32_t tdh_start = s->mac_reg[TDH], cause = E1000_ICS_TXQE; if (!(s->mac_reg[TCTL] & E1000_TCTL_EN)) { -DBGOUT(TX, "tx disabled\n"); 
+trace_e1000_start_xmit_fail1(); return; } @@ -773,9 +749,9 @@ start_xmit(E1000State *s) sizeof(struct e1000_tx_desc) * s->mac_reg[TDH]; pci_dma_read(d, base, , sizeof(desc)); -DBGOUT(TX, "index %d: %p : %x %x\n",
Re: [PATCH v2] e1000: Convert debug macros into tracepoints.
On 4/3/24 2:44 PM, Austin Clements wrote: At this point there's not much of my original code left. :D Don, you're welcome to take the credit in the commit. Thanks Austin. I'll send v3 with this change :) BTW, my attempt to include the appropriate maintainer from scripts/get_maintainer.pl (jasonw...@redhat.com) bounced. Any pointers on who else should be cc-ed are appreciated. -dp
Re: [PATCH] target/i386: fix direction of "32-bit MMU" test
08.04.2024 23:12, Paolo Bonzini wrote: Il ven 5 apr 2024, 19:30 Michael Tokarev mailto:m...@tls.msk.ru>> ha scritto: It sigsegvs in probe_access_internal(): CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns NULL, and next there's a call tlb_addr = tlb_read_ofs(entry, elt_ofs); which fails. I will take a look tomorrow. The changes on top of 7.2.10 are available at https://gitlab.com/mjt0k/qemu/-/commits/7.2-i386-mmu-idx/ - I'm still blaming myself for bad back-port, but I can't find where I failed. Thanks, /mjt
Re: [PATCH] target/i386: fix direction of "32-bit MMU" test
Il ven 5 apr 2024, 19:30 Michael Tokarev ha scritto: > 01.04.2024 09:02, Michael Tokarev: > > > Anyone can guess why this rather trivial and obviously correct patch > causes segfaults > > in a few tests in staging-7.2 - when run in tcg mode, namely: > > > >pxe-test > >migration-test > >boot-serial-test > >bios-tables-test > >vmgenid-test > >cdrom-test > > > > When reverting this single commit from staging-7.2, it all works fine > again. > > It sigsegvs in probe_access_internal(): > >CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- this one returns > NULL, > > and next there's a call > >tlb_addr = tlb_read_ofs(entry, elt_ofs); > > which fails. > I will take a look tomorrow. Paolo > #0 0x55c5de8a in tlb_read_ofs (ofs=8, entry=0x0) at > 7.2/accel/tcg/cputlb.c:1455 > #1 probe_access_internal > (env=0x56a862a0, addr=4294967280, fault_size=fault_size@entry=1, > access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=5, > nonfault=nonfault@entry=false, phost=0x7fffea4d32a0, > pfull=0x7fffea4d3298, retaddr=0) > at 7.2/accel/tcg/cputlb.c:1555 > #2 0x55c62aba in get_page_addr_code_hostp > (env=, addr=addr@entry=4294967280, hostp=hostp@entry > =0x0) > at 7.2/accel/tcg/cputlb.c:1691 > #3 0x55c52b54 in get_page_addr_code (addr=4294967280, > env=) > at 7.2/include/exec/exec-all.h:714 > #4 tb_htable_lookup > (cpu=cpu@entry=0x56a85530, pc=pc@entry=4294967280, > cs_base=cs_base@entry=4294901760, flags=flags@entry=64, > cflags=cflags@entry=4278190080) at > 7.2/accel/tcg/cpu-exec.c:236 > #5 0x55c53e8e in tb_lookup > (cflags=4278190080, flags=64, cs_base=4294901760, pc=4294967280, > cpu=0x56a85530) > at 7.2/accel/tcg/cpu-exec.c:270 > #6 cpu_exec (cpu=cpu@entry=0x56a85530) at > 7.2/accel/tcg/cpu-exec.c:1001 > #7 0x55c75d2f in tcg_cpus_exec (cpu=cpu@entry=0x56a85530) > at 7.2/accel/tcg/tcg-accel-ops.c:69 > #8 0x55c75e80 in mttcg_cpu_thread_fn (arg=arg@entry > =0x56a85530) > at 7.2/accel/tcg/tcg-accel-ops-mttcg.c:95 > #9 0x55ded098 in qemu_thread_start (args=0x56adac40) > at 
7.2/util/qemu-thread-posix.c:505 > #10 0x75793134 in start_thread (arg=) > #11 0x758137dc in clone3 () > > > I'm removing this whole set from 7.2 for now: > > 2cc68629a6fc target/i386: fix direction of "32-bit MMU" test > 90f641531c78 target/i386: use separate MMU indexes for 32-bit accesses > 5f97afe2543f target/i386: introduce function to query MMU indices > > This leaves us with > > b1661801c184 "target/i386: Fix physical address truncation" > > but without its fix, 2cc68629a6fc. > > It looks like I should revert b1661801c184 from 7.2 too, re-opening > https://gitlab.com/qemu-project/qemu/-/issues/2040 - since to me it isn't > clear if this change actually fixes this issue or not without the > previous change, 90f641531c78, which is missing from 7.2.10. > > At the very least this will simplify possible another attempt to > cherry-pick > these changes to 7.2. > > Thanks, > > /mjt > >
Re: [PATCH] Revert "hw/virtio: Add support for VDPA network simulation devices"
Il lun 8 apr 2024, 12:18 Michael S. Tsirkin ha scritto: > On Mon, Apr 08, 2024 at 10:51:57AM +0100, Peter Maydell wrote: > > On Mon, 8 Apr 2024 at 10:48, Michael S. Tsirkin wrote: > > > > > > This reverts commit cd341fd1ffded978b2aa0b5309b00be7c42e347c. > > > > > > The patch adds non-upstream code in > > > include/standard-headers/linux/virtio_pci.h > > > which would make maintainance harder. > > > > > > Revert for now. > As long as it is part of the spec, why not just move the problematic parts to a QEMU specific header? As far as I understand the kernel is never going to consume these constants anyway. Paolo > > Suggested-by: Jason Wang > > > Signed-off-by: Michael S. Tsirkin > > > > Are you intending to target this revert for 9.0 ? > > > > -- PMM > > Yes. > >
Re: [PATCH 2/2] Call args->connect_channels to actually test multifd_tcp_channels_none qtest
On 08/04/24 9:10 pm, Peter Xu wrote: !---| CAUTION: External Email |---! On Sun, Apr 07, 2024 at 01:21:25PM +, Het Gala wrote: Earlier, without args->connect_channels, multifd_tcp_channels_none would call uri internally even though connect_channels was introduced in function definition. To actually call 'migrate' QAPI with modified syntax, args->connect_channels need to be passed. Double free happens while setting correct migration ports. Fix that. Fixes: (tests/qtest/migration: Add multifd_tcp_plain test using list of channels instead of uri) [1] Signed-off-by: Het Gala --- tests/qtest/migration-helpers.c | 2 -- tests/qtest/migration-test.c| 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c index b2a90469fb..b1d06187ab 100644 --- a/tests/qtest/migration-helpers.c +++ b/tests/qtest/migration-helpers.c @@ -146,8 +146,6 @@ static void migrate_set_ports(QTestState *to, QList *channel_list) qdict_put_str(addrdict, "port", addr_port); } } - -qobject_unref(addr); Firstly, this doesn't belong to the commit you were pointing at above [1]. Instead this line is part of: tests/qtest/migration: Add migrate_set_ports into migrate_qmp to update migration port value You may want to split them? Ack Side note: I didn't review carefully on the whole patchset, but I think it's preferred to not include any dead code like what you did with "tests/qtest/migration: Add migrate_set_ports into migrate_qmp to update migration port value". It'll be better to me if we introduce code that will be used already otherwise reviewing such patch is a pain, same to when we follow up stuff later like this. Yes Peter. My intention was to have the code which could actually take the benefit of using 'channels' for the new QAPI syntax. But somehow I missed adding connect_channels in the code, despite that the test passed because it generated connect_uri with the help of listen_uri inside migrate_qmp. 
And it generated migrate QMP command using old syntax. Also because it never entered migrate_set_ports, couldn't catch double free issue while manual testing as well as while the CI/CD pipeline was run. More importantly.. why free? I'll paste whole thing over, and raise my questions. static void migrate_set_ports(QTestState *to, QList *channel_list) { QDict *addr; QListEntry *entry; g_autofree const char *addr_port = NULL; <- this points to sub-field of "addr", if we free "addr", why autofree here? addr = migrate_get_connect_qdict(to); QLIST_FOREACH_ENTRY(channel_list, entry) { QDict *channel = qobject_to(QDict, qlist_entry_obj(entry)); QDict *addrdict = qdict_get_qdict(channel, "addr"); if (qdict_haskey(addrdict, "port") && qdict_haskey(addr, "port") && (strcmp(qdict_get_str(addrdict, "port"), "0") == 0)) { addr_port = qdict_get_str(addr, "port"); qdict_put_str(addrdict, "port", addr_port); <- shouldn't we g_strdup() instead of dropping the below unref()? } } qobject_unref(addr); } Yes, I got your point Peter. Will update in the new patch. } bool migrate_watch_for_events(QTestState *who, const char *name, diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 584d7c496f..5d6d8cd634 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1728,7 +1728,7 @@ static void test_precopy_common(MigrateCommon *args) goto finish; } -migrate_qmp(from, to, args->connect_uri, NULL, "{}"); +migrate_qmp(from, to, args->connect_uri, args->connect_channels, "{}"); if (args->result != MIG_TEST_SUCCEED) { bool allow_active = args->result == MIG_TEST_FAIL; -- 2.22.3 Regards, Het Gala
Re: Point where target instructions are read
On Thu, Apr 4, 2024 at 2:23 PM Peter Maydell wrote: > This will not work (yet) -- CPUs do not get reset as part of the > whole-system three-phase-reset, so using the exit phase method > is not sufficient to avoid the reset ordering problem here. > > You need to use rom_ptr_for_as() to see if there's a ROM blob > at the address you're trying to load the PC from, and if there > is you use ldl_p() to get the PC from the blob; otherwise you > use ldl_phys(). Searching for "initial_pc" in target/arm/cpu.c > will find you the code that does this for M-profile. Thanks for the tip. I am able to see the program being loaded based on the dump of rom pointer in gdb. Now the problem is I am loading a binary file (msp430-elf-objcopy -O binary simple_test simple_test.bin) and due to this I will be missing out the loader loading different sections in the right parts of the memory. The reset vector which is supposed to be present at 0xFFFE is present at 0x3FFE in the binary file. How can I fix this? Should I revert back to elf file loading? -Gautam.
[PULL 0/3] 9.0 bugfixes for 2024-04-08
The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5: Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100) are available in the Git repository at: https://gitlab.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to e34f4d87e8d47b0a65cb663aaf7bef60c2112d36: kvm: error out of kvm_irqchip_add_msi_route() in case of full route table (2024-04-08 21:22:00 +0200) * fall back to non-ioeventfd notification if KVM routing table is full * support kitware ninja with jobserver support * nanomips: fix warnings with GCC 14 Igor Mammedov (1): kvm: error out of kvm_irqchip_add_msi_route() in case of full route table Martin Hundebøll (1): Makefile: preserve --jobserver-auth argument when calling ninja Paolo Bonzini (1): nanomips: fix warnings with GCC 14 Makefile| 2 +- accel/kvm/kvm-all.c | 15 ++-- disas/nanomips.c| 194 ++-- 3 files changed, 108 insertions(+), 103 deletions(-) -- 2.44.0
[PULL 2/3] nanomips: fix warnings with GCC 14
GCC 14 shows -Wshadow=local warnings if an enum conflicts with a local variable (including a parameter). To avoid this, move the problematic enum and all of its dependencies after the hundreds of functions that have a parameter named "instruction". Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- disas/nanomips.c | 194 +++ 1 file changed, 97 insertions(+), 97 deletions(-) diff --git a/disas/nanomips.c b/disas/nanomips.c index a0253598dd6..db0c297b8dc 100644 --- a/disas/nanomips.c +++ b/disas/nanomips.c @@ -36,35 +36,6 @@ typedef uint32_t uint32; typedef uint16_t uint16; typedef uint64_t img_address; -typedef enum { -instruction, -call_instruction, -branch_instruction, -return_instruction, -reserved_block, -pool, -} TABLE_ENTRY_TYPE; - -typedef enum { -MIPS64_= 0x0001, -XNP_ = 0x0002, -XMMS_ = 0x0004, -EVA_ = 0x0008, -DSP_ = 0x0010, -MT_= 0x0020, -EJTAG_ = 0x0040, -TLBINV_= 0x0080, -CP0_ = 0x0100, -CP1_ = 0x0200, -CP2_ = 0x0400, -UDI_ = 0x0800, -MCU_ = 0x1000, -VZ_= 0x2000, -TLB_ = 0x4000, -MVH_ = 0x8000, -ALL_ATTRIBUTES = 0xull, -} TABLE_ATTRIBUTE_TYPE; - typedef struct Dis_info { img_address m_pc; fprintf_function fprintf_func; @@ -72,22 +43,6 @@ typedef struct Dis_info { sigjmp_buf buf; } Dis_info; -typedef bool (*conditional_function)(uint64 instruction); -typedef char * (*disassembly_function)(uint64 instruction, -Dis_info *info); - -typedef struct Pool { -TABLE_ENTRY_TYPE type; -const struct Pool*next_table; -int next_table_size; -int instructions_size; -uint64 mask; -uint64 value; -disassembly_function disassembly; -conditional_function condition; -uint64 attributes; -} Pool; - #define IMGASSERTONCE(test) @@ -544,58 +499,6 @@ static uint64 extract_op_code_value(const uint16 *data, int size) } -/* - * Recurse through tables until the instruction is found then return - * the string and size - * - * inputs: - * pointer to a word stream, - * disassember table and size - * returns: - * instruction size- negative is error - * disassembly string - on 
error will constain error string - */ -static int Disassemble(const uint16 *data, char **dis, - TABLE_ENTRY_TYPE *type, const Pool *table, - int table_size, Dis_info *info) -{ -for (int i = 0; i < table_size; i++) { -uint64 op_code = extract_op_code_value(data, - table[i].instructions_size); -if ((op_code & table[i].mask) == table[i].value) { -/* possible match */ -conditional_function cond = table[i].condition; -if ((cond == NULL) || cond(op_code)) { -if (table[i].type == pool) { -return Disassemble(data, dis, type, - table[i].next_table, - table[i].next_table_size, - info); -} else if ((table[i].type == instruction) || - (table[i].type == call_instruction) || - (table[i].type == branch_instruction) || - (table[i].type == return_instruction)) { -disassembly_function dis_fn = table[i].disassembly; -if (dis_fn == 0) { -*dis = g_strdup( -"disassembler failure - bad table entry"); -return -6; -} -*type = table[i].type; -*dis = dis_fn(op_code, info); -return table[i].instructions_size; -} else { -*dis = g_strdup("reserved instruction"); -return -2; -} -} -} -} -*dis = g_strdup("failed to disassemble"); -return -1; /* failed to disassemble*/ -} - - static uint64 extract_code_18_to_0(uint64 instruction) { uint64 value = 0; @@ -16213,6 +16116,51 @@ static char *YIELD(uint64 instruction, Dis_info *info) * */ +typedef enum { +instruction, +call_instruction, +branch_instruction, +return_instruction, +reserved_block, +pool, +} TABLE_ENTRY_TYPE; + +typedef enum { +MIPS64_= 0x0001, +XNP_ = 0x0002, +XMMS_ = 0x0004, +EVA_ = 0x0008, +DSP_ = 0x0010, +MT_= 0x0020, +EJTAG_ = 0x0040, +TLBINV_= 0x0080, +CP0_ = 0x0100, +
[PULL 3/3] kvm: error out of kvm_irqchip_add_msi_route() in case of full route table
From: Igor Mammedov subj is calling kvm_add_routing_entry() which simply extends KVMState::irq_routes::entries[] but doesn't check if the number of routes goes beyond the limit the kernel is willing to accept. This later leads to the assert qemu-kvm: ../accel/kvm/kvm-all.c:1833: kvm_irqchip_commit_routes: Assertion `ret == 0' failed Typically it happens during guest boot for a large enough guest. Reproduced with: ./qemu --enable-kvm -m 8G -smp 64 -machine pc \ `for b in {1..2}; do echo -n "-device pci-bridge,id=pci$b,chassis_nr=$b "; for i in {0..31}; do touch /tmp/vblk$b$i; echo -n "-drive file=/tmp/vblk$b$i,if=none,id=drive$b$i,format=raw -device virtio-blk-pci,drive=drive$b$i,bus=pci$b "; done; done` While a crash at boot time is bad, the same might happen at hotplug time, which is unacceptable. So instead of calling kvm_add_routing_entry() unconditionally, first check that the number of routes won't exceed KVM_CAP_IRQ_ROUTING. This way the virtio device, instead of killing qemu, will gracefully fail to initialize as expected, with the following warnings on the console: virtio-blk failed to set guest notifier (-28), ensure -accel kvm is set. virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower). Signed-off-by: Igor Mammedov Message-ID: <20240408110956.451558-1-imamm...@redhat.com> Signed-off-by: Paolo Bonzini --- accel/kvm/kvm-all.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index a8cecd040eb..931f74256e8 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1999,12 +1999,17 @@ int kvm_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev) return -EINVAL; } -trace_kvm_irqchip_add_msi_route(dev ? dev->name : (char *)"N/A", -vector, virq); +if (s->irq_routes->nr < s->gsi_count) { +trace_kvm_irqchip_add_msi_route(dev ? 
dev->name : (char *)"N/A", +vector, virq); -kvm_add_routing_entry(s, ); -kvm_arch_add_msi_route_post(, vector, dev); -c->changes++; +kvm_add_routing_entry(s, ); +kvm_arch_add_msi_route_post(, vector, dev); +c->changes++; +} else { +kvm_irqchip_release_virq(s, virq); +return -ENOSPC; +} return virq; } -- 2.44.0
[PULL 1/3] Makefile: preserve --jobserver-auth argument when calling ninja
From: Martin Hundebøll Qemu wraps its call to ninja in a Makefile. Since ninja, as opposed to make, utilizes all CPU cores by default, the qemu Makefile translates the absence of a `-jN` argument into `-j1`. This breaks jobserver functionality, so update the -jN mangling to take the --jobserver-auth argument into consideration too. Signed-off-by: Martin Hundebøll Message-Id: <20240402081738.1051560-1-mar...@geanix.com> Signed-off-by: Paolo Bonzini --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 8f369903356..183756018ff 100644 --- a/Makefile +++ b/Makefile @@ -142,7 +142,7 @@ MAKE.k = $(findstring k,$(firstword $(filter-out --%,$(MAKEFLAGS MAKE.q = $(findstring q,$(firstword $(filter-out --%,$(MAKEFLAGS MAKE.nq = $(if $(word 2, $(MAKE.n) $(MAKE.q)),nq) NINJAFLAGS = $(if $V,-v) $(if $(MAKE.n), -n) $(if $(MAKE.k), -k0) \ -$(filter-out -j, $(lastword -j1 $(filter -l% -j%, $(MAKEFLAGS \ +$(or $(filter -l% -j%, $(MAKEFLAGS)), $(if $(filter --jobserver-auth=%, $(MAKEFLAGS)),, -j1)) \ -d keepdepfile ninja-cmd-goals = $(or $(MAKECMDGOALS), all) ninja-cmd-goals += $(foreach g, $(MAKECMDGOALS), $(.ninja-goals.$g)) -- 2.44.0
Re: [PATCH 1/2] Fix typo to allow migrate_qmp_fail command with 'channels' argument
Het, It's all fine, no worries! This is good enough. Let's finish the discussion in the next patch before a repost. Thanks, On Mon, Apr 8, 2024, 2:35 p.m. Het Gala wrote: > > On 08/04/24 9:05 pm, Peter Xu wrote: > > !---| > CAUTION: External Email > > |---! > > Hey, Het, > > On Sun, Apr 07, 2024 at 01:21:24PM +, Het Gala wrote: > > Fixes: (tests/qtest/migration: Add negative tests to validate migration QAPIs) > > > I think I get your intention to provide two fixup patches on top of > migration-next, which indeed would be preferred so that I can squash them > into the patches before the pull. > > However please next time use "git commit --fixup" so that a better subject > will be generated, and that'll make my life (and Fabiano's I suppose in the > future) easier because git rebase understand those subjects. Then you > don't need Fixes with an empty commit ID. They'll start with "fixup: XXX" > pointing to a commit with subject rather than commit IDs. > > I apologize for any inconvenience caused by not using "git commit --fixup" > in my previous submission. Let me resend the patchset with correct message > convention. Will take care of this in future patches too, thanks for > bringing it to my notice. Regards, Het Gala >
Re: [PATCH 1/2] Fix typo to allow migrate_qmp_fail command with 'channels' argument
On 08/04/24 9:05 pm, Peter Xu wrote: !---| CAUTION: External Email |---! Hey, Het, On Sun, Apr 07, 2024 at 01:21:24PM +, Het Gala wrote: Fixes: (tests/qtest/migration: Add negative tests to validate migration QAPIs) I think I get your intention to provide two fixup patches on top of migration-next, which indeed would be preferred so that I can squash them into the patches before the pull. However please next time use "git commit --fixup" so that a better subject will be generated, and that'll make my life (and Fabiano's I suppose in the future) easier because git rebase understand those subjects. Then you don't need Fixes with an empty commit ID. They'll start with "fixup: XXX" pointing to a commit with subject rather than commit IDs. I apologize for any inconvenience caused by not using "git commit --fixup" in my previous submission. Let me resend the patchset with correct message convention. Will take care of this in future patches too, thanks for bringing it to my notice. Regards, Het Gala
Re: [PATCH-for-9.0 3/4] hw/char/virtio-serial-bus: Protect from DMA re-entrancy bugs
On 8/4/24 17:20, Michael S. Tsirkin wrote: On Mon, Apr 08, 2024 at 01:04:11PM +0200, Philippe Mathieu-Daudé wrote: On 8/4/24 12:08, Michael S. Tsirkin wrote: On Mon, Apr 08, 2024 at 09:14:39AM +0200, Philippe Mathieu-Daudé wrote: On 4/4/24 21:13, Philippe Mathieu-Daudé wrote: Replace qemu_bh_new_guarded() by virtio_bh_new_guarded() so the bus and device use the same guard. Otherwise the DMA-reentrancy protection can be bypassed. Cc: qemu-sta...@nongnu.org Suggested-by: Alexander Bulekov Signed-off-by: Philippe Mathieu-Daudé --- hw/char/virtio-serial-bus.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index 016aba6374..cd0e3a11f7 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -985,8 +985,7 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp) return; } -port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port, - >mem_reentrancy_guard); +port->bh = virtio_bh_new_guarded(vdev, flush_queued_data_bh, port); Missing: -- >8 -- -port->bh = virtio_bh_new_guarded(vdev, flush_queued_data_bh, port); +port->bh = virtio_bh_new_guarded(VIRTIO_DEVICE(dev), + flush_queued_data_bh, port); --- I don't get it. vdev is already the correct type. Why do you need VIRTIO_DEVICE here? This function doesn't declare vdev. port->elem = NULL; } But it seems clear it wasn't really tested, right? Indeed, I only tested virtio-gpu, then added the other ones Alexander suggested. I don't have virtio-specific tests, I rely on the GitLab CI ones. Hope that's enough. Philipe here's my ack: Acked-by: Michael S. Tsirkin Feel free to merge these after testing. Sure, thanks! Phil.
Re: [PATCH v2 14/18] memory-device: move stubs out of stubs/
On 8/4/24 17:53, Paolo Bonzini wrote: Since the memory-device stubs are needed exactly when the Kconfig symbols are not needed, move them to hw/mem/. Signed-off-by: Paolo Bonzini --- stubs/memory_device.c => hw/mem/memory-device-stubs.c | 0 hw/mem/meson.build| 1 + stubs/meson.build | 1 - 3 files changed, 1 insertion(+), 1 deletion(-) rename stubs/memory_device.c => hw/mem/memory-device-stubs.c (100%) Reviewed-by: Philippe Mathieu-Daudé
[PULL 32/35] util/bufferiszero: Optimize SSE2 and AVX2 variants
From: Alexander Monakov Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations). Avoid using out-of-bounds pointers in loop boundary conditions. Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-6-amona...@ispras.ru> --- util/bufferiszero.c | 111 +--- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 00118d649e..02df82b4ff 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include <immintrin.h> -/* Note that each of these vectorized functions require len >= 64. */ +/* Helper for preventing the compiler from reassociating + chains of binary vector operations. */ +#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1)) + +/* Note that these vectorized functions may assume len >= 256. */ static bool __attribute__((target("sse2"))) buffer_zero_sse2(const void *buf, size_t len) { -__m128i t = _mm_loadu_si128(buf); -__m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); -__m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); -__m128i zero = _mm_setzero_si128(); +/* Unaligned loads at head/tail. */ +__m128i v = *(__m128i_u *)(buf); +__m128i w = *(__m128i_u *)(buf + len - 16); +/* Align head/tail to 16-byte boundaries. */ +const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); +const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); +__m128i zero = { 0 }; -/* Loop over 16-byte aligned blocks of 64.
*/ -while (likely(p <= e)) { -t = _mm_cmpeq_epi8(t, zero); -if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { +/* Collect a partial block at tail end. */ +v |= e[-1]; w |= e[-2]; +SSE_REASSOC_BARRIER(v, w); +v |= e[-3]; w |= e[-4]; +SSE_REASSOC_BARRIER(v, w); +v |= e[-5]; w |= e[-6]; +SSE_REASSOC_BARRIER(v, w); +v |= e[-7]; v |= w; + +/* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ +do { +v = _mm_cmpeq_epi8(v, zero); +if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) { return false; } -t = p[-4] | p[-3] | p[-2] | p[-1]; -p += 4; -} +v = p[0]; w = p[1]; +SSE_REASSOC_BARRIER(v, w); +v |= p[2]; w |= p[3]; +SSE_REASSOC_BARRIER(v, w); +v |= p[4]; w |= p[5]; +SSE_REASSOC_BARRIER(v, w); +v |= p[6]; w |= p[7]; +SSE_REASSOC_BARRIER(v, w); +v |= w; +p += 8; +} while (p < e - 7); -/* Finish the aligned tail. */ -t |= e[-3]; -t |= e[-2]; -t |= e[-1]; - -/* Finish the unaligned tail. */ -t |= _mm_loadu_si128(buf + len - 16); - -return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; +return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF; } #ifdef CONFIG_AVX2_OPT static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { -/* Begin with an unaligned head of 32 bytes. */ -__m256i t = _mm256_loadu_si256(buf); -__m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32); -__m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32); +/* Unaligned loads at head/tail. */ +__m256i v = *(__m256i_u *)(buf); +__m256i w = *(__m256i_u *)(buf + len - 32); +/* Align head/tail to 32-byte boundaries. */ +const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32); +const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32); +__m256i zero = { 0 }; -/* Loop over 32-byte aligned blocks of 128. */ -while (p <= e) { -if (unlikely(!_mm256_testz_si256(t, t))) { +/* Collect a partial block at tail end.
*/ +v |= e[-1]; w |= e[-2]; +SSE_REASSOC_BARRIER(v, w); +v |= e[-3]; w |= e[-4]; +SSE_REASSOC_BARRIER(v, w); +v |= e[-5]; w |= e[-6]; +SSE_REASSOC_BARRIER(v, w); +v |= e[-7]; v |= w; + +/* Loop over complete 256-byte blocks. */ +for (; p < e - 7; p += 8) { +/* PTEST is not profitable here. */ +v = _mm256_cmpeq_epi8(v, zero); +if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) { return false; } -t = p[-4] | p[-3] | p[-2] | p[-1]; -p += 4; -} ; +v = p[0]; w = p[1]; +SSE_REASSOC_BARRIER(v, w); +v |= p[2]; w |= p[3]; +SSE_REASSOC_BARRIER(v, w); +v |= p[4]; w |= p[5]; +
[PULL 22/35] target/hppa: Use insn_start from DisasContextBase
To keep the multiple update check, replace insn_start with insn_start_updated. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/hppa/translate.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/target/hppa/translate.c b/target/hppa/translate.c index 8a1a8bc3aa..42fa480950 100644 --- a/target/hppa/translate.c +++ b/target/hppa/translate.c @@ -44,7 +44,6 @@ typedef struct DisasCond { typedef struct DisasContext { DisasContextBase base; CPUState *cs; -TCGOp *insn_start; uint64_t iaoq_f; uint64_t iaoq_b; @@ -62,6 +61,7 @@ typedef struct DisasContext { int privilege; bool psw_n_nonzero; bool is_pa20; +bool insn_start_updated; #ifdef CONFIG_USER_ONLY MemOp unalign; @@ -300,9 +300,9 @@ void hppa_translate_init(void) static void set_insn_breg(DisasContext *ctx, int breg) { -assert(ctx->insn_start != NULL); -tcg_set_insn_start_param(ctx->insn_start, 2, breg); -ctx->insn_start = NULL; +assert(!ctx->insn_start_updated); +ctx->insn_start_updated = true; +tcg_set_insn_start_param(ctx->base.insn_start, 2, breg); } static DisasCond cond_make_f(void) @@ -4694,7 +4694,7 @@ static void hppa_tr_insn_start(DisasContextBase *dcbase, CPUState *cs) DisasContext *ctx = container_of(dcbase, DisasContext, base); tcg_gen_insn_start(ctx->iaoq_f, ctx->iaoq_b, 0); -ctx->insn_start = tcg_last_op(); +ctx->insn_start_updated = false; } static void hppa_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs) -- 2.34.1
[PULL 35/35] util/bufferiszero: Simplify test_buffer_is_zero_next_accel
Because the three alternatives are monotonic, we don't need to keep a couple of bitmasks, just identify the strongest alternative at startup. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 56 ++--- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index eb8030a3f0..ff003dc40e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -179,51 +179,39 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ - - -static unsigned __attribute__((noinline)) -select_accel_cpuinfo(unsigned info) -{ -/* Array is sorted in order of algorithm preference. */ -static const struct { -unsigned bit; -biz_accel_fn fn; -} all[] = { +static biz_accel_fn const accel_table[] = { +buffer_is_zero_int_ge256, +buffer_zero_sse2, #ifdef CONFIG_AVX2_OPT -{ CPUINFO_AVX2,buffer_zero_avx2 }, +buffer_zero_avx2, #endif -{ CPUINFO_SSE2,buffer_zero_sse2 }, -{ CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, -}; - -for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { -if (info & all[i].bit) { -buffer_is_zero_accel = all[i].fn; -return all[i].bit; -} -} -return 0; -} - -static unsigned used_accel; +}; +static unsigned accel_index; static void __attribute__((constructor)) init_accel(void) { -used_accel = select_accel_cpuinfo(cpuinfo_init()); +unsigned info = cpuinfo_init(); +unsigned index = (info & CPUINFO_SSE2 ? 1 : 0); + +#ifdef CONFIG_AVX2_OPT +if (info & CPUINFO_AVX2) { +index = 2; +} +#endif + +accel_index = index; +buffer_is_zero_accel = accel_table[index]; } #define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { -/* - * Accumulate the accelerators that we've already tested, and - * remove them from the set to test this round. We'll get back - * a zero from select_accel_cpuinfo when there are no more. 
- */ -unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); -used_accel |= used; -return used; +if (accel_index != 0) { +buffer_is_zero_accel = accel_table[--accel_index]; +return true; +} +return false; } #else bool test_buffer_is_zero_next_accel(void) -- 2.34.1
[PULL 33/35] util/bufferiszero: Improve scalar variant
Split less-than and greater-than 256 cases. Use unaligned accesses for head and tail. Avoid using out-of-bounds pointers in loop boundary conditions. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 85 +++-- 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 02df82b4ff..c9a7ded016 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -28,40 +28,57 @@ static bool (*buffer_is_zero_accel)(const void *, size_t); -static bool buffer_is_zero_integer(const void *buf, size_t len) +static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { -if (unlikely(len < 8)) { -/* For a very small buffer, simply accumulate all the bytes. */ -const unsigned char *p = buf; -const unsigned char *e = buf + len; -unsigned char t = 0; +uint64_t t; +const uint64_t *p, *e; -do { -t |= *p++; -} while (p < e); - -return t == 0; -} else { -/* Otherwise, use the unaligned memory access functions to - handle the beginning and end of the buffer, with a couple - of loops handling the middle aligned section. */ -uint64_t t = ldq_he_p(buf); -const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8); -const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); - -for (; p + 8 <= e; p += 8) { -if (t) { -return false; -} -t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; -} -while (p < e) { -t |= *p++; -} -t |= ldq_he_p(buf + len - 8); - -return t == 0; +/* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer. + */ +if (unlikely(len <= 8)) { +return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0; } + +t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); +p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); +e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + +/* Read 0 to 31 aligned words from the middle. 
*/ +while (p < e) { +t |= *p++; +} +return t == 0; +} + +static bool buffer_is_zero_int_ge256(const void *buf, size_t len) +{ +/* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer. + */ +uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); +const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); +const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + +/* Collect a partial block at the tail end. */ +t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + +/* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ +do { +if (t) { +return false; +} +t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; +p += 8; +} while (p < e - 7); + +return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +190,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2,buffer_zero_avx2 }, #endif { CPUINFO_SSE2,buffer_zero_sse2 }, -{ CPUINFO_ALWAYS, buffer_is_zero_integer }, +{ CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +228,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -232,7 +249,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } -return buffer_is_zero_integer(buf, len); +return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) -- 2.34.1
[PULL 25/35] target/riscv: Use insn_start from DisasContextBase
To keep the multiple update check, replace insn_start with insn_start_updated. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/riscv/translate.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/target/riscv/translate.c b/target/riscv/translate.c index 9d57089fcc..9ff09ebdb6 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -115,8 +115,7 @@ typedef struct DisasContext { bool itrigger; /* FRM is known to contain a valid value. */ bool frm_valid; -/* TCG of the current insn_start */ -TCGOp *insn_start; +bool insn_start_updated; } DisasContext; static inline bool has_ext(DisasContext *ctx, uint32_t ext) @@ -207,9 +206,9 @@ static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in) static void decode_save_opc(DisasContext *ctx) { -assert(ctx->insn_start != NULL); -tcg_set_insn_start_param(ctx->insn_start, 1, ctx->opcode); -ctx->insn_start = NULL; +assert(!ctx->insn_start_updated); +ctx->insn_start_updated = true; +tcg_set_insn_start_param(ctx->base.insn_start, 1, ctx->opcode); } static void gen_pc_plus_diff(TCGv target, DisasContext *ctx, @@ -1224,7 +1223,7 @@ static void riscv_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu) } tcg_gen_insn_start(pc_next, 0); -ctx->insn_start = tcg_last_op(); +ctx->insn_start_updated = false; } static void riscv_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu) -- 2.34.1
[PULL 24/35] target/microblaze: Use insn_start from DisasContextBase
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/microblaze/translate.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c index 4e52ef32db..fc451befae 100644 --- a/target/microblaze/translate.c +++ b/target/microblaze/translate.c @@ -62,9 +62,6 @@ typedef struct DisasContext { DisasContextBase base; const MicroBlazeCPUConfig *cfg; -/* TCG op of the current insn_start. */ -TCGOp *insn_start; - TCGv_i32 r0; bool r0_set; @@ -699,14 +696,14 @@ static TCGv compute_ldst_addr_ea(DisasContext *dc, int ra, int rb) static void record_unaligned_ess(DisasContext *dc, int rd, MemOp size, bool store) { -uint32_t iflags = tcg_get_insn_start_param(dc->insn_start, 1); +uint32_t iflags = tcg_get_insn_start_param(dc->base.insn_start, 1); iflags |= ESR_ESS_FLAG; iflags |= rd << 5; iflags |= store * ESR_S; iflags |= (size == MO_32) * ESR_W; -tcg_set_insn_start_param(dc->insn_start, 1, iflags); +tcg_set_insn_start_param(dc->base.insn_start, 1, iflags); } #endif @@ -1624,7 +1621,6 @@ static void mb_tr_insn_start(DisasContextBase *dcb, CPUState *cs) DisasContext *dc = container_of(dcb, DisasContext, base); tcg_gen_insn_start(dc->base.pc_next, dc->tb_flags & ~MSR_TB_MASK); -dc->insn_start = tcg_last_op(); } static void mb_tr_translate_insn(DisasContextBase *dcb, CPUState *cs) -- 2.34.1
[PULL 06/35] linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO
From: Michael Tokarev There's identical code for SO_SNDTIMEO and SO_RCVTIMEO, currently implemented using an ugly goto into another switch case. Eliminate that using arithmetic if, making code flow more natural. Signed-off-by: Michael Tokarev Message-Id: <20240331100737.2724186-5-...@tls.msk.ru> Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 1fedf16650..41659b63f5 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2301,12 +2301,10 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, case TARGET_SOL_SOCKET: switch (optname) { case TARGET_SO_RCVTIMEO: +case TARGET_SO_SNDTIMEO: { struct timeval tv; -optname = SO_RCVTIMEO; - -set_timeout: if (optlen != sizeof(struct target_timeval)) { return -TARGET_EINVAL; } @@ -2315,13 +2313,12 @@ set_timeout: return -TARGET_EFAULT; } -ret = get_errno(setsockopt(sockfd, SOL_SOCKET, optname, +ret = get_errno(setsockopt(sockfd, SOL_SOCKET, +optname == TARGET_SO_RCVTIMEO ? +SO_RCVTIMEO : SO_SNDTIMEO, &tv, sizeof(tv))); return ret; } -case TARGET_SO_SNDTIMEO: -optname = SO_SNDTIMEO; -goto set_timeout; case TARGET_SO_ATTACH_FILTER: { struct target_sock_fprog *tfprog; -- 2.34.1
[PULL 19/35] tcg: Add TCGContext.emit_before_op
Allow operations to be emitted via normal expanders into the middle of the opcode stream. Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- include/tcg/tcg.h | 6 ++ tcg/tcg.c | 14 -- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h index 451f3fec41..05a1912f8a 100644 --- a/include/tcg/tcg.h +++ b/include/tcg/tcg.h @@ -553,6 +553,12 @@ struct TCGContext { QTAILQ_HEAD(, TCGOp) ops, free_ops; QSIMPLEQ_HEAD(, TCGLabel) labels; +/* + * When clear, new ops are added to the tail of @ops. + * When set, new ops are added in front of @emit_before_op. + */ +TCGOp *emit_before_op; + /* Tells which temporary holds a given register. It does not take into account fixed registers */ TCGTemp *reg_to_temp[TCG_TARGET_NB_REGS]; diff --git a/tcg/tcg.c b/tcg/tcg.c index d6670237fb..0c0bb9d169 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1521,6 +1521,7 @@ void tcg_func_start(TCGContext *s) QTAILQ_INIT(&s->ops); QTAILQ_INIT(&s->free_ops); +s->emit_before_op = NULL; QSIMPLEQ_INIT(&s->labels); tcg_debug_assert(s->addr_type == TCG_TYPE_I32 || @@ -2332,7 +2333,11 @@ static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args) op->args[pi++] = (uintptr_t)info; tcg_debug_assert(pi == total_args); -QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link); +if (tcg_ctx->emit_before_op) { +QTAILQ_INSERT_BEFORE(tcg_ctx->emit_before_op, op, link); +} else { +QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link); +} tcg_debug_assert(n_extend < ARRAY_SIZE(extend_free)); for (i = 0; i < n_extend; ++i) { @@ -3215,7 +3220,12 @@ static TCGOp *tcg_op_alloc(TCGOpcode opc, unsigned nargs) TCGOp *tcg_emit_op(TCGOpcode opc, unsigned nargs) { TCGOp *op = tcg_op_alloc(opc, nargs); -QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link); + +if (tcg_ctx->emit_before_op) { +QTAILQ_INSERT_BEFORE(tcg_ctx->emit_before_op, op, link); +} else { +QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link); +} return op; } -- 2.34.1
[PULL 05/35] linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used
From: Michael Tokarev Signed-off-by: Michael Tokarev Message-Id: <20240331100737.2724186-4-...@tls.msk.ru> Signed-off-by: Richard Henderson --- linux-user/syscall.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index cca9cafe4f..1fedf16650 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2049,7 +2049,6 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, { abi_long ret; int val; -struct ip_mreq_source *ip_mreq_source; switch(level) { case SOL_TCP: @@ -2123,6 +2122,9 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, case IP_UNBLOCK_SOURCE: case IP_ADD_SOURCE_MEMBERSHIP: case IP_DROP_SOURCE_MEMBERSHIP: +{ +struct ip_mreq_source *ip_mreq_source; + if (optlen != sizeof (struct target_ip_mreq_source)) return -TARGET_EINVAL; @@ -2133,7 +2135,7 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, ret = get_errno(setsockopt(sockfd, level, optname, ip_mreq_source, optlen)); unlock_user (ip_mreq_source, optval_addr, 0); break; - +} default: goto unimplemented; } -- 2.34.1
[PULL 03/35] linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY
From: Michael Tokarev This setsockopt accepts zero-length optlen (current qemu implementation does not allow this). Also, there's no need to make a copy of the key, it is enough to use lock_user() (which accepts zero length already). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2197 Fixes: f312fc "linux-user: Add support for setsockopt() option SOL_ALG" Signed-off-by: Michael Tokarev Message-Id: <20240331100737.2724186-2-...@tls.msk.ru> Signed-off-by: Richard Henderson --- linux-user/syscall.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 3df2b94d9a..59fb3e911f 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2277,18 +2277,13 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, switch (optname) { case ALG_SET_KEY: { -char *alg_key = g_malloc(optlen); - +char *alg_key = lock_user(VERIFY_READ, optval_addr, optlen, 1); if (!alg_key) { -return -TARGET_ENOMEM; -} -if (copy_from_user(alg_key, optval_addr, optlen)) { -g_free(alg_key); return -TARGET_EFAULT; } ret = get_errno(setsockopt(sockfd, level, optname, alg_key, optlen)); -g_free(alg_key); +unlock_user(alg_key, optval_addr, optlen); break; } case ALG_SET_AEAD_AUTHSIZE: -- 2.34.1
[PULL 10/35] target/sh4: mac.w: memory accesses are 16-bit words
From: Zack Buhman Before this change, executing a code sequence such as: mova tblm,r0 mov r0,r1 mova tbln,r0 clrs clrmac mac.w @r0+,@r1+ mac.w @r0+,@r1+ .align 4 tblm:.word 0x1234 .word 0x5678 tbln:.word 0x9abc .word 0xdefg Does not result in correct behavior: Expected behavior: first macw : macl = 0x1234 * 0x9abc + 0x0 mach = 0x0 second macw: macl = 0x5678 * 0xdefg + 0xb00a630 mach = 0x0 Observed behavior (qemu-sh4eb, prior to this commit): first macw : macl = 0x5678 * 0xdefg + 0x0 mach = 0x0 second macw: (unaligned longword memory access, SIGBUS) Various SH-4 ISA manuals also confirm that `mac.w` is a 16-bit word memory access, not a 32-bit longword memory access. Signed-off-by: Zack Buhman Reviewed-by: Yoshinori Sato Reviewed-by: Philippe Mathieu-Daudé Message-Id: <20240402093756.27466-1-z...@buhman.org> Signed-off-by: Richard Henderson --- target/sh4/translate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/sh4/translate.c b/target/sh4/translate.c index a9b1bc7524..6643c14dde 100644 --- a/target/sh4/translate.c +++ b/target/sh4/translate.c @@ -816,10 +816,10 @@ static void _decode_opc(DisasContext * ctx) TCGv arg0, arg1; arg0 = tcg_temp_new(); tcg_gen_qemu_ld_i32(arg0, REG(B7_4), ctx->memidx, -MO_TESL | MO_ALIGN); +MO_TESW | MO_ALIGN); arg1 = tcg_temp_new(); tcg_gen_qemu_ld_i32(arg1, REG(B11_8), ctx->memidx, -MO_TESL | MO_ALIGN); +MO_TESW | MO_ALIGN); gen_helper_macw(tcg_env, arg0, arg1); tcg_gen_addi_i32(REG(B11_8), REG(B11_8), 2); tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 2); -- 2.34.1
[PULL 13/35] target/sh4: Fix mac.w with saturation enabled
From: Zack Buhman The saturation arithmetic logic in helper_macw is not correct. I tested and verified this behavior on a SH7091. Reviewed-by: Yoshinori Sato Signed-off-by: Zack Buhman Message-Id: <20240405233802.29128-3-z...@buhman.org> [rth: Reformat helper_macw, add a test case.] Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- target/sh4/helper.h | 2 +- target/sh4/op_helper.c| 28 +--- tests/tcg/sh4/test-macw.c | 61 +++ tests/tcg/sh4/Makefile.target | 3 ++ 4 files changed, 82 insertions(+), 12 deletions(-) create mode 100644 tests/tcg/sh4/test-macw.c diff --git a/target/sh4/helper.h b/target/sh4/helper.h index 64056e4a39..29011d3dbb 100644 --- a/target/sh4/helper.h +++ b/target/sh4/helper.h @@ -12,7 +12,7 @@ DEF_HELPER_1(discard_movcal_backup, void, env) DEF_HELPER_2(ocbi, void, env, i32) DEF_HELPER_3(macl, void, env, s32, s32) -DEF_HELPER_3(macw, void, env, i32, i32) +DEF_HELPER_3(macw, void, env, s32, s32) DEF_HELPER_2(ld_fpscr, void, env, i32) diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c index d0bae0cc00..99394b714c 100644 --- a/target/sh4/op_helper.c +++ b/target/sh4/op_helper.c @@ -177,22 +177,28 @@ void helper_macl(CPUSH4State *env, int32_t arg0, int32_t arg1) env->mac = res; } -void helper_macw(CPUSH4State *env, uint32_t arg0, uint32_t arg1) +void helper_macw(CPUSH4State *env, int32_t arg0, int32_t arg1) { -int64_t res; +/* Inputs are already sign-extended from 16 bits. */ +int32_t mul = arg0 * arg1; -res = ((uint64_t) env->mach << 32) | env->macl; -res += (int64_t) (int16_t) arg0 *(int64_t) (int16_t) arg1; -env->mach = (res >> 32) & 0xffffffff; -env->macl = res & 0xffffffff; if (env->sr & (1u << SR_S)) { -if (res < -0x80000000) { +/* + * In saturation arithmetic mode, the accumulator is 32-bit + * with carry. MACH is not considered during the addition + * operation nor the 32-bit saturation logic. + */ +int32_t res, macl = env->macl; + +if (sadd32_overflow(macl, mul, &res)) { +res = macl < 0 ? 
INT32_MIN : INT32_MAX; +/* If overflow occurs, the MACH register is set to 1. */ env->mach = 1; -env->macl = 0x80000000; -} else if (res > 0x7fffffff) { -env->mach = 1; -env->macl = 0x7fffffff; } +env->macl = res; +} else { +/* In non-saturation arithmetic mode, the accumulator is 64-bit */ +env->mac += mul; } } diff --git a/tests/tcg/sh4/test-macw.c b/tests/tcg/sh4/test-macw.c new file mode 100644 index 00..4eceec8634 --- /dev/null +++ b/tests/tcg/sh4/test-macw.c @@ -0,0 +1,61 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> + +int64_t mac_w(int64_t mac, const int16_t *a, const int16_t *b) +{ +register uint32_t macl __asm__("macl") = mac; +register uint32_t mach __asm__("mach") = mac >> 32; + +asm volatile("mac.w @%0+,@%1+" + : "+r"(a), "+r"(b), "+x"(macl), "+x"(mach)); + +return ((uint64_t)mach << 32) | macl; +} + +typedef struct { +int64_t mac; +int16_t a, b; +int64_t res[2]; +} Test; + +__attribute__((noinline)) +void test(const Test *t, int sat) +{ +int64_t res; + +if (sat) { +asm volatile("sets"); +} else { +asm volatile("clrs"); +} +res = mac_w(t->mac, &t->a, &t->b); + +if (res != t->res[sat]) { +fprintf(stderr, "%#llx + (%#x * %#x) = %#llx -- got %#llx\n", +t->mac, t->a, t->b, t->res[sat], res); +abort(); +} +} + +int main() +{ +static const Test tests[] = { +{ 0, 2, 3, { 6, 6 } }, +{ 0x123456787ffffffell, 2, -3, + { 0x123456787ffffff8ll, 0x123456787ffffff8ll } }, +{ 0xabcdef127ffffffall, 2, 3, + { 0xabcdef1280000000ll, 0x000000017fffffffll } }, +{ 0xfffffffffll, INT16_MAX, INT16_MAX, + { 0x103fff0000ll, 0xf3fff0000ll } }, +}; + +for (int i = 0; i < sizeof(tests) / sizeof(tests[0]); ++i) { +for (int j = 0; j < 2; ++j) { +test(&tests[i], j); +} +} +return 0; +} diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target index 9a11c10924..4d09291c0c 100644 --- a/tests/tcg/sh4/Makefile.target +++ b/tests/tcg/sh4/Makefile.target @@ -14,3 +14,6 @@ VPATH += $(SRC_PATH)/tests/tcg/sh4 test-macl: CFLAGS += -O -g TESTS += test-macl + +test-macw: CFLAGS += -O -g +TESTS += test-macw --
2.34.1
[PULL 00/35] misc patch queue
This started out to be tcg and linux-user only, but then added a few target bug fixes, and then trolled back through my inbox and picked up some other safe patch sets that got lost. r~ The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5: Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100) are available in the Git repository at: https://gitlab.com/rth7680/qemu.git tags/pull-misc-20240408 for you to fetch changes up to 50dbeda88ab71f9d426b7f4b126c79c44860e475: util/bufferiszero: Simplify test_buffer_is_zero_next_accel (2024-04-08 06:27:58 -1000) util/bufferiszero: Optimizations and cleanups, esp code removal target/m68k: Semihosting for non-coldfire cpus target/m68k: Fix fp accrued exception reporting target/hppa: Fix IIAOQ, IIASQ for pa2.0 target/sh4: Fixes to mac.l and mac.w saturation target/sh4: Fixes to illegal delay slot reporting linux-user: Cleanups for do_setsockopt linux-user: Add FITRIM ioctl linux-user: Fix waitid return of siginfo_t and rusage tcg/optimize: Do not attempt to constant fold neg_vec accel/tcg: Improve can_do_io management, mmio bug fix Alexander Monakov (5): util/bufferiszero: Remove SSE4.1 variant util/bufferiszero: Remove AVX512 variant util/bufferiszero: Reorganize for early test for acceleration util/bufferiszero: Remove useless prefetches util/bufferiszero: Optimize SSE2 and AVX2 variants Keith Packard (3): target/m68k: Map FPU exceptions to FPSR register target/m68k: Pass semihosting arg to exit target/m68k: Support semihosting on non-ColdFire targets Michael Tokarev (4): linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq() linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO Michael Vogt (1): linux-user: Add FITRIM ioctl Nguyen Dinh Phi (1): linux-user:
replace calloc() with g_new0() Richard Henderson (17): tcg/optimize: Do not attempt to constant fold neg_vec linux-user: Fix waitid return of siginfo_t and rusage target/hppa: Fix IIAOQ, IIASQ for pa2.0 target/sh4: Merge mach and macl into a union target/m68k: Perform the semihosting test during translate tcg: Add TCGContext.emit_before_op accel/tcg: Add insn_start to DisasContextBase target/arm: Use insn_start from DisasContextBase target/hppa: Use insn_start from DisasContextBase target/i386: Preserve DisasContextBase.insn_start across rewind target/microblaze: Use insn_start from DisasContextBase target/riscv: Use insn_start from DisasContextBase target/s390x: Use insn_start from DisasContextBase accel/tcg: Improve can_do_io management util/bufferiszero: Improve scalar variant util/bufferiszero: Introduce biz_accel_fn typedef util/bufferiszero: Simplify test_buffer_is_zero_next_accel Zack Buhman (4): target/sh4: mac.w: memory accesses are 16-bit words target/sh4: Fix mac.l with saturation enabled target/sh4: Fix mac.w with saturation enabled target/sh4: add missing CHECK_NOT_DELAY_SLOT include/exec/translator.h | 4 +- include/qemu/cutils.h | 32 +++- include/tcg/tcg.h | 6 + linux-user/ioctls.h | 3 + linux-user/syscall_defs.h | 1 + linux-user/syscall_types.h| 5 + target/arm/tcg/translate.h| 12 +- target/m68k/cpu.h | 5 +- target/m68k/helper.h | 2 + target/sh4/cpu.h | 14 +- target/sh4/helper.h | 4 +- accel/tcg/translator.c| 47 ++--- linux-user/main.c | 6 +- linux-user/syscall.c | 95 +- target/arm/tcg/translate-a64.c| 2 +- target/arm/tcg/translate.c| 2 +- target/hppa/int_helper.c | 20 +- target/hppa/sys_helper.c | 18 +- target/hppa/translate.c | 10 +- target/i386/tcg/translate.c | 3 + target/m68k/cpu.c | 12 +- target/m68k/fpu_helper.c | 72 target/m68k/helper.c | 4 +- target/m68k/m68k-semi.c | 4 +- target/m68k/op_helper.c | 14 +- target/m68k/translate.c | 54 +- target/microblaze/translate.c | 8 +- target/riscv/translate.c | 11 +- target/s390x/tcg/translate.c | 4 +- 
target/sh4/op_helper.c| 51 ++--- target/sh4/translate.c| 7 +- tcg/optimize.c| 17 +- tcg/tcg.c | 14 +- tests/tcg/aarch64/test-2150.c | 12 ++ tests/tcg
[PULL 12/35] target/sh4: Fix mac.l with saturation enabled
From: Zack Buhman The saturation arithmetic logic in helper_macl is not correct. I tested and verified this behavior on a SH7091. Signed-off-by: Zack Buhman Message-Id: <20240404162641.27528-2-z...@buhman.org> [rth: Reformat helper_macl, add a test case.] Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- target/sh4/helper.h | 2 +- target/sh4/op_helper.c| 23 ++-- tests/tcg/sh4/test-macl.c | 67 +++ tests/tcg/sh4/Makefile.target | 5 +++ 4 files changed, 86 insertions(+), 11 deletions(-) create mode 100644 tests/tcg/sh4/test-macl.c diff --git a/target/sh4/helper.h b/target/sh4/helper.h index 8d792f6b55..64056e4a39 100644 --- a/target/sh4/helper.h +++ b/target/sh4/helper.h @@ -11,7 +11,7 @@ DEF_HELPER_3(movcal, void, env, i32, i32) DEF_HELPER_1(discard_movcal_backup, void, env) DEF_HELPER_2(ocbi, void, env, i32) -DEF_HELPER_3(macl, void, env, i32, i32) +DEF_HELPER_3(macl, void, env, s32, s32) DEF_HELPER_3(macw, void, env, i32, i32) DEF_HELPER_2(ld_fpscr, void, env, i32) diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c index 4559d0d376..d0bae0cc00 100644 --- a/target/sh4/op_helper.c +++ b/target/sh4/op_helper.c @@ -158,20 +158,23 @@ void helper_ocbi(CPUSH4State *env, uint32_t address) } } -void helper_macl(CPUSH4State *env, uint32_t arg0, uint32_t arg1) +void helper_macl(CPUSH4State *env, int32_t arg0, int32_t arg1) { +const int64_t min = -(1ll << 47); +const int64_t max = (1ll << 47) - 1; +int64_t mul = (int64_t)arg0 * arg1; +int64_t mac = env->mac; int64_t res; -res = ((uint64_t) env->mach << 32) | env->macl; -res += (int64_t) (int32_t) arg0 *(int64_t) (int32_t) arg1; -env->mach = (res >> 32) & 0xffffffff; -env->macl = res & 0xffffffff; -if (env->sr & (1u << SR_S)) { -if (res < 0) -env->mach |= 0xffff0000; -else -env->mach &= 0x00007fff; +if (!(env->sr & (1u << SR_S))) { +res = mac + mul; +} else if (sadd64_overflow(mac, mul, &res)) { +res = mac < 0 ? 
min : max; +} else { +res = MIN(MAX(res, min), max); } + +env->mac = res; } void helper_macw(CPUSH4State *env, uint32_t arg0, uint32_t arg1) diff --git a/tests/tcg/sh4/test-macl.c b/tests/tcg/sh4/test-macl.c new file mode 100644 index 00..b66c854365 --- /dev/null +++ b/tests/tcg/sh4/test-macl.c @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#include +#include +#include + +#define MACL_S_MIN (-(1ll << 47)) +#define MACL_S_MAX ((1ll << 47) - 1) + +int64_t mac_l(int64_t mac, const int32_t *a, const int32_t *b) +{ +register uint32_t macl __asm__("macl") = mac; +register uint32_t mach __asm__("mach") = mac >> 32; + +asm volatile("mac.l @%0+,@%1+" + : "+r"(a), "+r"(b), "+x"(macl), "+x"(mach)); + +return ((uint64_t)mach << 32) | macl; +} + +typedef struct { +int64_t mac; +int32_t a, b; +int64_t res[2]; +} Test; + +__attribute__((noinline)) +void test(const Test *t, int sat) +{ +int64_t res; + +if (sat) { +asm volatile("sets"); +} else { +asm volatile("clrs"); +} +res = mac_l(t->mac, >a, >b); + +if (res != t->res[sat]) { +fprintf(stderr, "%#llx + (%#x * %#x) = %#llx -- got %#llx\n", +t->mac, t->a, t->b, t->res[sat], res); +abort(); +} +} + +int main() +{ +static const Test tests[] = { +{ 0x7fff12345678ll, INT32_MAX, INT32_MAX, + { 0x40007ffe12345679ll, MACL_S_MAX } }, +{ MACL_S_MIN, -1, 1, + { 0x7fffll, MACL_S_MIN } }, +{ INT64_MIN, -1, 1, + { INT64_MAX, MACL_S_MIN } }, +{ 0x7fffll, INT32_MAX, INT32_MAX, + { 0x40007ffe0001ll, MACL_S_MAX } }, +{ 4, 1, 2, { 6, 6 } }, +{ -4, -1, -2, { -2, -2 } }, +}; + +for (int i = 0; i < sizeof(tests) / sizeof(tests[0]); ++i) { +for (int j = 0; j < 2; ++j) { +test([i], j); +} +} +return 0; +} diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target index 16eaa850a8..9a11c10924 100644 --- a/tests/tcg/sh4/Makefile.target +++ b/tests/tcg/sh4/Makefile.target @@ -9,3 +9,8 @@ run-signals: signals $(call skip-test, $<, "BROKEN") run-plugin-signals-with-%: $(call skip-test, $<, "BROKEN") + +VPATH += 
$(SRC_PATH)/tests/tcg/sh4 + +test-macl: CFLAGS += -O -g +TESTS += test-macl -- 2.34.1
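The saturating step above can be checked on any host without sh4 hardware. Here is a hedged, stand-alone C sketch of the same 48-bit MAC logic; the names `mac_l_step`, `MAC48_MIN` and `MAC48_MAX` are mine, not QEMU's, and the compiler builtin `__builtin_add_overflow` stands in for QEMU's `sadd64_overflow()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 48-bit saturation bounds, as in the patched helper_macl. */
#define MAC48_MIN (-(1ll << 47))
#define MAC48_MAX ((1ll << 47) - 1)

/*
 * One mac.l step: mac += a * b, saturating to 48 bits when the
 * S flag is set. Unsigned addition gives the wraparound behaviour
 * of the non-saturating path without signed-overflow UB.
 */
static int64_t mac_l_step(int64_t mac, int32_t a, int32_t b, bool s_flag)
{
    int64_t mul = (int64_t)a * b;
    int64_t res;

    if (!s_flag) {
        return (int64_t)((uint64_t)mac + (uint64_t)mul);
    }
    if (__builtin_add_overflow(mac, mul, &res)) {
        /* 64-bit overflow: the accumulator's sign picks the bound. */
        return mac < 0 ? MAC48_MIN : MAC48_MAX;
    }
    /* No 64-bit overflow: clamp into the 48-bit range. */
    if (res < MAC48_MIN) {
        return MAC48_MIN;
    }
    return res > MAC48_MAX ? MAC48_MAX : res;
}
```

The two saturation branches mirror the patch exactly: a 64-bit overflow can only occur when both operands already exceed the 48-bit range in the same direction, so the accumulator's sign alone selects the bound.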
[PULL 16/35] target/m68k: Pass semihosting arg to exit
From: Keith Packard Instead of using d0 (the semihost function number), use d1 (the provide exit status). Signed-off-by: Keith Packard Reviewed-by: Peter Maydell Message-Id: <20230802161914.395443-2-kei...@keithp.com> Signed-off-by: Richard Henderson --- target/m68k/m68k-semi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/m68k/m68k-semi.c b/target/m68k/m68k-semi.c index 546cff2246..6fbbd140f3 100644 --- a/target/m68k/m68k-semi.c +++ b/target/m68k/m68k-semi.c @@ -132,8 +132,8 @@ void do_m68k_semihosting(CPUM68KState *env, int nr) args = env->dregs[1]; switch (nr) { case HOSTED_EXIT: -gdb_exit(env->dregs[0]); -exit(env->dregs[0]); +gdb_exit(env->dregs[1]); +exit(env->dregs[1]); case HOSTED_OPEN: GET_ARG(0); -- 2.34.1
[PULL 30/35] util/bufferiszero: Reorganize for early test for acceleration
From: Alexander Monakov Test for length >= 256 inline, where it is often a constant. Before calling into the accelerated routine, sample three bytes from the buffer, which handles most non-zero buffers. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Message-Id: <20240206204809.9859-3-amona...@ispras.ru> [rth: Use __builtin_constant_p; move the indirect call out of line.] Signed-off-by: Richard Henderson --- include/qemu/cutils.h | 32 - util/bufferiszero.c | 84 +-- 2 files changed, 63 insertions(+), 53 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h index 92c927a6a3..741dade7cf 100644 --- a/include/qemu/cutils.h +++ b/include/qemu/cutils.h @@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz); /* used to print char* safely */ #define STR_OR_NULL(str) ((str) ? (str) : "null") -bool buffer_is_zero(const void *buf, size_t len); +/* + * Check if a buffer is all zeroes. + */ + +bool buffer_is_zero_ool(const void *vbuf, size_t len); +bool buffer_is_zero_ge256(const void *vbuf, size_t len); bool test_buffer_is_zero_next_accel(void); +static inline bool buffer_is_zero_sample3(const char *buf, size_t len) +{ +/* + * For any reasonably sized buffer, these three samples come from + * three different cachelines. In qemu-img usage, we find that + * each byte eliminates more than half of all buffer testing. + * It is therefore critical to performance that the byte tests + * short-circuit, so that we do not pull in additional cache lines. + * Do not "optimize" this to !(a | b | c). + */ +return !buf[0] && !buf[len - 1] && !buf[len / 2]; +} + +#ifdef __OPTIMIZE__ +static inline bool buffer_is_zero(const void *buf, size_t len) +{ +return (__builtin_constant_p(len) && len >= 256 +? 
buffer_is_zero_sample3(buf, len) && + buffer_is_zero_ge256(buf, len) +: buffer_is_zero_ool(buf, len)); +} +#else +#define buffer_is_zero buffer_is_zero_ool +#endif + /* * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128) * Input is limited to 14-bit numbers diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 641d5f9b9e..972f394cbd 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,8 +26,9 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool -buffer_zero_int(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t); + +static bool buffer_is_zero_integer(const void *buf, size_t len) { if (unlikely(len < 8)) { /* For a very small buffer, simply accumulate all the bytes. */ @@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -/* - * Make sure that these variables are appropriately initialized when - * SSE2 is enabled on the compiler command-line, but the compiler is - * too old to support CONFIG_AVX2_OPT. - */ -#if defined(CONFIG_AVX2_OPT) -# define INIT_USED 0 -# define INIT_LENGTH 0 -# define INIT_ACCELbuffer_zero_int -#else -# ifndef __SSE2__ -# error "ISA selection confusion" -# endif -# define INIT_USED CPUINFO_SSE2 -# define INIT_LENGTH 64 -# define INIT_ACCELbuffer_zero_sse2 -#endif - -static unsigned used_accel = INIT_USED; -static unsigned length_to_accel = INIT_LENGTH; -static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; - static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. 
*/ static const struct { unsigned bit; -unsigned len; bool (*fn)(const void *, size_t); } all[] = { #ifdef CONFIG_AVX2_OPT -{ CPUINFO_AVX2,128, buffer_zero_avx2 }, +{ CPUINFO_AVX2,buffer_zero_avx2 }, #endif -{ CPUINFO_SSE2, 64, buffer_zero_sse2 }, -{ CPUINFO_ALWAYS,0, buffer_zero_int }, +{ CPUINFO_SSE2,buffer_zero_sse2 }, +{ CPUINFO_ALWAYS, buffer_is_zero_integer }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { if (info & all[i].bit) { -length_to_accel = all[i].len; -buffer_accel = all[i].fn; +buffer_is_zero_accel = all[i].fn; return all[i].bit; } } return 0; } -#if defined(CONFIG_AVX2_OPT) +static unsigned used_accel; + static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); } -#endif /* CONFIG_AVX2_OPT */ + +#define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { @@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void) used_accel |= used; return used; } - -static bool select_accel_fn(const void *buf, size_t len) -{ -if (likely(len >= length_to_accel)) { -return buffer_accel(buf, len); -} -return buffer_zero_int(buf, len); -} - #else -#define select_accel_fn buffer_zero_int bool
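The three-byte sampling fast path from this patch can be sketched stand-alone as follows. This is an illustrative model, not QEMU's API: `all_zero` stands in for the accelerated out-of-line routine, and `my_buffer_is_zero` is a hypothetical wrapper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Byte-by-byte check, standing in for the accelerated
 * out-of-line routine the real code dispatches to. */
static bool all_zero(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* The inline fast path: sample the first, last and middle byte.
 * The short-circuit && is deliberate -- each test filters out most
 * non-zero buffers without touching further cache lines. */
static bool sample3(const char *buf, size_t len)
{
    return !buf[0] && !buf[len - 1] && !buf[len / 2];
}

/* Combine the two, as the patched buffer_is_zero() wrapper does. */
static bool my_buffer_is_zero(const void *vbuf, size_t len)
{
    return len == 0 || (sample3(vbuf, len) && all_zero(vbuf, len));
}
```

For a typical disk-image workload most buffers are non-zero, so the three sampled bytes resolve the common case before the expensive full scan ever runs.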
[PULL 28/35] util/bufferiszero: Remove SSE4.1 variant
From: Alexander Monakov The SSE4.1 variant is virtually identical to the SSE2 variant, except for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing if an SSE register is all zeroes. The PTEST instruction decodes to two uops, so it can be handled only by the complex decoder, and since CMP+JNE are macro-fused, both sequences decode to three uops. The uops comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch standpoint. Hence, the use of PTEST brings no benefit from throughput standpoint. Its latency is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-2-amona...@ispras.ru> --- util/bufferiszero.c | 29 - 1 file changed, 29 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 3e6a5dfd63..f5a3634f9a 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len) } #ifdef CONFIG_AVX2_OPT -static bool __attribute__((target("sse4"))) -buffer_zero_sse4(const void *buf, size_t len) -{ -__m128i t = _mm_loadu_si128(buf); -__m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); -__m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - -/* Loop over 16-byte aligned blocks of 64. */ -while (likely(p <= e)) { -__builtin_prefetch(p); -if (unlikely(!_mm_testz_si128(t, t))) { -return false; -} -t = p[-4] | p[-3] | p[-2] | p[-1]; -p += 4; -} - -/* Finish the aligned tail. */ -t |= e[-3]; -t |= e[-2]; -t |= e[-1]; - -/* Finish the unaligned tail. 
*/ -t |= _mm_loadu_si128(buf + len - 16); - -return _mm_testz_si128(t, t); -} - static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { @@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info) #endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2,128, buffer_zero_avx2 }, -{ CPUINFO_SSE4, 64, buffer_zero_sse4 }, #endif { CPUINFO_SSE2, 64, buffer_zero_sse2 }, { CPUINFO_ALWAYS,0, buffer_zero_int }, -- 2.34.1
[PULL 27/35] accel/tcg: Improve can_do_io management
We already attempted to set and clear can_do_io before the first and last insns, but only used the initial value of max_insns and the call to translator_io_start to find those insns. Now that we track insn_start in DisasContextBase, and now that we have emit_before_op, we can wait until we have finished translation to identify the true first and last insns and emit the sets of can_do_io at that time. This fixes the case of a translation block which crossed a page boundary, and for which the second page turned out to be mmio. In this case we truncate the block, and the previous logic for can_do_io could leave a block with a single insn with can_do_io set to false, which would fail an assertion in cpu_io_recompile. Reported-by: Jørgen Hansen Reviewed-by: Philippe Mathieu-Daudé Tested-by: Jørgen Hansen Signed-off-by: Richard Henderson --- include/exec/translator.h | 1 - accel/tcg/translator.c| 45 --- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/include/exec/translator.h b/include/exec/translator.h index ceaeca8c91..2c4fb818e7 100644 --- a/include/exec/translator.h +++ b/include/exec/translator.h @@ -87,7 +87,6 @@ typedef struct DisasContextBase { int num_insns; int max_insns; bool singlestep_enabled; -int8_t saved_can_do_io; bool plugin_enabled; struct TCGOp *insn_start; void *host_addr[2]; diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c index ae61c154c2..9de0bc34c8 100644 --- a/accel/tcg/translator.c +++ b/accel/tcg/translator.c @@ -18,20 +18,14 @@ static void set_can_do_io(DisasContextBase *db, bool val) { -if (db->saved_can_do_io != val) { -db->saved_can_do_io = val; - -QEMU_BUILD_BUG_ON(sizeof_field(CPUState, neg.can_do_io) != 1); -tcg_gen_st8_i32(tcg_constant_i32(val), tcg_env, -offsetof(ArchCPU, parent_obj.neg.can_do_io) - -offsetof(ArchCPU, env)); -} +QEMU_BUILD_BUG_ON(sizeof_field(CPUState, neg.can_do_io) != 1); +tcg_gen_st8_i32(tcg_constant_i32(val), tcg_env, +offsetof(ArchCPU, parent_obj.neg.can_do_io) - +offsetof(ArchCPU, 
env)); } bool translator_io_start(DisasContextBase *db) { -set_can_do_io(db, true); - /* * Ensure that this instruction will be the last in the TB. * The target may override this to something more forceful. @@ -84,13 +78,6 @@ static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags) - offsetof(ArchCPU, env)); } -/* - * cpu->neg.can_do_io is set automatically here at the beginning of - * each translation block. The cost is minimal, plus it would be - * very easy to forget doing it in the translator. - */ -set_can_do_io(db, db->max_insns == 1); - return icount_start_insn; } @@ -129,6 +116,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, { uint32_t cflags = tb_cflags(tb); TCGOp *icount_start_insn; +TCGOp *first_insn_start = NULL; bool plugin_enabled; /* Initialize DisasContext */ @@ -139,7 +127,6 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, db->num_insns = 0; db->max_insns = *max_insns; db->singlestep_enabled = cflags & CF_SINGLE_STEP; -db->saved_can_do_io = -1; db->insn_start = NULL; db->host_addr[0] = host_pc; db->host_addr[1] = NULL; @@ -159,6 +146,9 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, *max_insns = ++db->num_insns; ops->insn_start(db, cpu); db->insn_start = tcg_last_op(); +if (first_insn_start == NULL) { +first_insn_start = db->insn_start; +} tcg_debug_assert(db->is_jmp == DISAS_NEXT); /* no early exit */ if (plugin_enabled) { @@ -171,10 +161,6 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, * done next -- either exiting this loop or locate the start of * the next instruction. */ -if (db->num_insns == db->max_insns) { -/* Accept I/O on the last instruction. 
*/ -set_can_do_io(db, true); -} ops->translate_insn(db, cpu); /* @@ -207,6 +193,21 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, ops->tb_stop(db, cpu); gen_tb_end(tb, cflags, icount_start_insn, db->num_insns); +/* + * Manage can_do_io for the translation block: set to false before + * the first insn and set to true before the last insn. + */ +if (db->num_insns == 1) { +tcg_debug_assert(first_insn_start == db->insn_start); +} else { +tcg_debug_assert(first_insn_start != db->insn_start); +tcg_ctx->emit_before_op = first_insn_start; +set_can_do_io(db, false); +} +tcg_ctx->emit_before_op = db->insn_start; +
[PULL 11/35] target/sh4: Merge mach and macl into a union
Allow host access to the entire 64-bit accumulator. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/sh4/cpu.h | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h index 9211da6bde..d928bcf006 100644 --- a/target/sh4/cpu.h +++ b/target/sh4/cpu.h @@ -155,12 +155,22 @@ typedef struct CPUArchState { uint32_t pc;/* program counter */ uint32_t delayed_pc;/* target of delayed branch */ uint32_t delayed_cond; /* condition of delayed branch */ -uint32_t mach; /* multiply and accumulate high */ -uint32_t macl; /* multiply and accumulate low */ uint32_t pr;/* procedure register */ uint32_t fpscr; /* floating point status/control register */ uint32_t fpul; /* floating point communication register */ +/* multiply and accumulate: high, low and combined. */ +union { +uint64_t mac; +struct { +#if HOST_BIG_ENDIAN +uint32_t mach, macl; +#else +uint32_t macl, mach; +#endif +}; +}; + /* float point status register */ float_status fp_status; -- 2.34.1
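The union above lets target code keep addressing `mach`/`macl` while helpers use the full 64-bit `mac`. A quick way to convince yourself the overlay is correct on either host endianness is the sketch below; it probes byte order at runtime instead of using QEMU's `HOST_BIG_ENDIAN` macro, and the type and helper names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same overlay as the patched CPUSH4State: one 64-bit accumulator
 * aliased with its two 32-bit halves, named here in address order. */
typedef union {
    uint64_t mac;
    struct {
        uint32_t first, second;   /* half at lower, higher address */
    };
} Mac;

/* Runtime endianness probe (instead of a compile-time macro). */
static int host_big_endian(void)
{
    const uint32_t probe = 1;
    unsigned char c;

    memcpy(&c, &probe, 1);
    return c == 0;
}

/* mach is the high 32 bits of mac, whichever half overlays it. */
static uint32_t mach_of(uint64_t v)
{
    Mac m = { .mac = v };
    return host_big_endian() ? m.first : m.second;
}

/* macl is the low 32 bits. */
static uint32_t macl_of(uint64_t v)
{
    Mac m = { .mac = v };
    return host_big_endian() ? m.second : m.first;
}
```

On a big-endian host the high half sits at the lower address, hence the `#if HOST_BIG_ENDIAN` field-order swap in the patch.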
[PULL 26/35] target/s390x: Use insn_start from DisasContextBase
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/s390x/tcg/translate.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c index 57b7db1ee9..90a74ee795 100644 --- a/target/s390x/tcg/translate.c +++ b/target/s390x/tcg/translate.c @@ -141,7 +141,6 @@ struct DisasFields { struct DisasContext { DisasContextBase base; const DisasInsn *insn; -TCGOp *insn_start; DisasFields fields; uint64_t ex_value; /* @@ -6314,7 +6313,7 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s) insn = extract_insn(env, s); /* Update insn_start now that we know the ILEN. */ -tcg_set_insn_start_param(s->insn_start, 2, s->ilen); +tcg_set_insn_start_param(s->base.insn_start, 2, s->ilen); /* Not found means unimplemented/illegal opcode. */ if (insn == NULL) { @@ -6468,7 +6467,6 @@ static void s390x_tr_insn_start(DisasContextBase *dcbase, CPUState *cs) /* Delay the set of ilen until we've read the insn. */ tcg_gen_insn_start(dc->base.pc_next, dc->cc_op, 0); -dc->insn_start = tcg_last_op(); } static target_ulong get_next_pc(CPUS390XState *env, DisasContext *s, -- 2.34.1
[PULL 18/35] target/m68k: Support semihosting on non-ColdFire targets
From: Keith Packard According to the m68k semihosting spec: "The instruction used to trigger a semihosting request depends on the m68k processor variant. On ColdFire, "halt" is used; on other processors (which don't implement "halt"), "bkpt #0" may be used." Add support for non-ColdFire processors by matching BKPT #0 instructions. Signed-off-by: Keith Packard [rth: Use semihosting_test()] Signed-off-by: Richard Henderson --- target/m68k/translate.c | 5 + 1 file changed, 5 insertions(+) diff --git a/target/m68k/translate.c b/target/m68k/translate.c index 8f61ff1238..659543020b 100644 --- a/target/m68k/translate.c +++ b/target/m68k/translate.c @@ -2646,6 +2646,11 @@ DISAS_INSN(bkpt) #if defined(CONFIG_USER_ONLY) gen_exception(s, s->base.pc_next, EXCP_DEBUG); #else +/* BKPT #0 is the alternate semihosting instruction. */ +if ((insn & 7) == 0 && semihosting_test(s)) { +gen_exception(s, s->pc, EXCP_SEMIHOSTING); +return; +} gen_exception(s, s->base.pc_next, EXCP_ILLEGAL); #endif } -- 2.34.1
[PULL 34/35] util/bufferiszero: Introduce biz_accel_fn typedef
Signed-off-by: Richard Henderson --- util/bufferiszero.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index c9a7ded016..eb8030a3f0 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,7 +26,8 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool (*buffer_is_zero_accel)(const void *, size_t); +typedef bool (*biz_accel_fn)(const void *, size_t); +static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -178,13 +179,15 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ + + static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; -bool (*fn)(const void *, size_t); +biz_accel_fn fn; } all[] = { #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2,buffer_zero_avx2 }, @@ -231,7 +234,7 @@ bool test_buffer_is_zero_next_accel(void) #define INIT_ACCEL buffer_is_zero_int_ge256 #endif -static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; bool buffer_is_zero_ool(const void *buf, size_t len) { -- 2.34.1
[PULL 31/35] util/bufferiszero: Remove useless prefetches
From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-5-amona...@ispras.ru> --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 972f394cbd..00118d649e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); for (; p + 8 <= e; p += 8) { -__builtin_prefetch(p + 8); if (t) { return false; } @@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len) /* Loop over 16-byte aligned blocks of 64. */ while (likely(p <= e)) { -__builtin_prefetch(p); t = _mm_cmpeq_epi8(t, zero); if (unlikely(_mm_movemask_epi8(t) != 0xffff)) { return false; } @@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len) /* Loop over 32-byte aligned blocks of 128. */ while (p <= e) { -__builtin_prefetch(p); if (unlikely(!_mm256_testz_si256(t, t))) { return false; } -- 2.34.1
[PULL 23/35] target/i386: Preserve DisasContextBase.insn_start across rewind
When aborting translation of the current insn, restore the previous value of insn_start. Acked-by: Paolo Bonzini Signed-off-by: Richard Henderson --- target/i386/tcg/translate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 07f642dc9e..76a42c679c 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -139,6 +139,7 @@ typedef struct DisasContext { TCGv_i64 tmp1_i64; sigjmp_buf jmpbuf; +TCGOp *prev_insn_start; TCGOp *prev_insn_end; } DisasContext; @@ -3123,6 +3124,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu) /* END TODO */ s->base.num_insns--; tcg_remove_ops_after(s->prev_insn_end); +s->base.insn_start = s->prev_insn_start; s->base.is_jmp = DISAS_TOO_MANY; return false; default: @@ -6995,6 +6997,7 @@ static void i386_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu) DisasContext *dc = container_of(dcbase, DisasContext, base); target_ulong pc_arg = dc->base.pc_next; +dc->prev_insn_start = dc->base.insn_start; dc->prev_insn_end = tcg_last_op(); if (tb_cflags(dcbase->tb) & CF_PCREL) { pc_arg &= ~TARGET_PAGE_MASK; -- 2.34.1
[PULL 04/35] linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq()
From: Michael Tokarev ip_mreq is declared at the beginning of do_setsockopt(), while it is used in only one place. Move its declaration to that very place and replace pointer to alloca()-allocated memory with the structure itself. target_to_host_ip_mreq() is used only once, inline it. This change also properly handles TARGET_EFAULT when the address is wrong. Signed-off-by: Michael Tokarev Message-Id: <20240331100737.2724186-3-...@tls.msk.ru> [rth: Fix braces, adjust optlen to match host structure size] Signed-off-by: Richard Henderson --- linux-user/syscall.c | 47 ++-- 1 file changed, 23 insertions(+), 24 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 59fb3e911f..cca9cafe4f 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -1615,24 +1615,6 @@ static abi_long do_pipe(CPUArchState *cpu_env, abi_ulong pipedes, return get_errno(ret); } -static inline abi_long target_to_host_ip_mreq(struct ip_mreqn *mreqn, - abi_ulong target_addr, - socklen_t len) -{ -struct target_ip_mreqn *target_smreqn; - -target_smreqn = lock_user(VERIFY_READ, target_addr, len, 1); -if (!target_smreqn) -return -TARGET_EFAULT; -mreqn->imr_multiaddr.s_addr = target_smreqn->imr_multiaddr.s_addr; -mreqn->imr_address.s_addr = target_smreqn->imr_address.s_addr; -if (len == sizeof(struct target_ip_mreqn)) -mreqn->imr_ifindex = tswapal(target_smreqn->imr_ifindex); -unlock_user(target_smreqn, target_addr, 0); - -return 0; -} - static inline abi_long target_to_host_sockaddr(int fd, struct sockaddr *addr, abi_ulong target_addr, socklen_t len) @@ -2067,7 +2049,6 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, { abi_long ret; int val; -struct ip_mreqn *ip_mreq; struct ip_mreq_source *ip_mreq_source; switch(level) { @@ -2111,15 +2092,33 @@ static abi_long do_setsockopt(int sockfd, int level, int optname, break; case IP_ADD_MEMBERSHIP: case IP_DROP_MEMBERSHIP: +{ +struct ip_mreqn ip_mreq; +struct target_ip_mreqn *target_smreqn; + 
+QEMU_BUILD_BUG_ON(sizeof(struct ip_mreq) != + sizeof(struct target_ip_mreq)); + if (optlen < sizeof (struct target_ip_mreq) || -optlen > sizeof (struct target_ip_mreqn)) +optlen > sizeof (struct target_ip_mreqn)) { return -TARGET_EINVAL; +} -ip_mreq = (struct ip_mreqn *) alloca(optlen); -target_to_host_ip_mreq(ip_mreq, optval_addr, optlen); -ret = get_errno(setsockopt(sockfd, level, optname, ip_mreq, optlen)); +target_smreqn = lock_user(VERIFY_READ, optval_addr, optlen, 1); +if (!target_smreqn) { +return -TARGET_EFAULT; +} +ip_mreq.imr_multiaddr.s_addr = target_smreqn->imr_multiaddr.s_addr; +ip_mreq.imr_address.s_addr = target_smreqn->imr_address.s_addr; +if (optlen == sizeof(struct target_ip_mreqn)) { +ip_mreq.imr_ifindex = tswapal(target_smreqn->imr_ifindex); +optlen = sizeof(struct ip_mreqn); +} +unlock_user(target_smreqn, optval_addr, 0); + +ret = get_errno(setsockopt(sockfd, level, optname, &ip_mreq, optlen)); break; - +} case IP_BLOCK_SOURCE: case IP_UNBLOCK_SOURCE: case IP_ADD_SOURCE_MEMBERSHIP: -- 2.34.1
[PULL 21/35] target/arm: Use insn_start from DisasContextBase
To keep the multiple update check, replace insn_start with insn_start_updated. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/arm/tcg/translate.h | 12 ++-- target/arm/tcg/translate-a64.c | 2 +- target/arm/tcg/translate.c | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index 93be745cf3..dc66ff2190 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -165,10 +165,10 @@ typedef struct DisasContext { uint8_t gm_blocksize; /* True if this page is guarded. */ bool guarded_page; +/* True if the current insn_start has been updated. */ +bool insn_start_updated; /* Bottom two bits of XScale c15_cpar coprocessor access control reg */ int c15_cpar; -/* TCG op of the current insn_start. */ -TCGOp *insn_start; /* Offset from VNCR_EL2 when FEAT_NV2 redirects this reg to memory */ uint32_t nv2_redirect_offset; } DisasContext; @@ -276,10 +276,10 @@ static inline void disas_set_insn_syndrome(DisasContext *s, uint32_t syn) syn &= ARM_INSN_START_WORD2_MASK; syn >>= ARM_INSN_START_WORD2_SHIFT; -/* We check and clear insn_start_idx to catch multiple updates. */ -assert(s->insn_start != NULL); -tcg_set_insn_start_param(s->insn_start, 2, syn); -s->insn_start = NULL; +/* Check for multiple updates. 
*/ +assert(!s->insn_start_updated); +s->insn_start_updated = true; +tcg_set_insn_start_param(s->base.insn_start, 2, syn); } static inline int curr_insn_len(DisasContext *s) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 340265beb0..2666d52711 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -14179,7 +14179,7 @@ static void aarch64_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu) pc_arg &= ~TARGET_PAGE_MASK; } tcg_gen_insn_start(pc_arg, 0, 0); -dc->insn_start = tcg_last_op(); +dc->insn_start_updated = false; } static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu) diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c index 69585e6003..dc49a8d806 100644 --- a/target/arm/tcg/translate.c +++ b/target/arm/tcg/translate.c @@ -9273,7 +9273,7 @@ static void arm_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu) condexec_bits = (dc->condexec_cond << 4) | (dc->condexec_mask >> 1); } tcg_gen_insn_start(pc_arg, condexec_bits, 0); -dc->insn_start = tcg_last_op(); +dc->insn_start_updated = false; } static bool arm_check_kernelpage(DisasContext *dc) -- 2.34.1
[PULL 08/35] linux-user: replace calloc() with g_new0()
From: Nguyen Dinh Phi Use glib allocation as recommended by the coding convention Signed-off-by: Nguyen Dinh Phi Message-Id: <20240317171747.1642207-1-phind@gmail.com> Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- linux-user/main.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/linux-user/main.c b/linux-user/main.c index 9277df2e9d..149e35432e 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -928,11 +928,7 @@ int main(int argc, char **argv, char **envp) * Prepare copy of argv vector for target. */ target_argc = argc - optind; -target_argv = calloc(target_argc + 1, sizeof (char *)); -if (target_argv == NULL) { -(void) fprintf(stderr, "Unable to allocate memory for target_argv\n"); -exit(EXIT_FAILURE); -} +target_argv = g_new0(char *, target_argc + 1); /* * If argv0 is specified (using '-0' switch) we replace -- 2.34.1
[PULL 07/35] linux-user: Add FITRIM ioctl
From: Michael Vogt Tiny patch to add the missing FITRIM ioctl. Signed-off-by: Michael Vogt Message-Id: <20240403092048.16023-2-michael.v...@gmail.com> Signed-off-by: Richard Henderson --- linux-user/ioctls.h| 3 +++ linux-user/syscall_defs.h | 1 + linux-user/syscall_types.h | 5 + 3 files changed, 9 insertions(+) diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h index 1aec9d5836..d508d0c04a 100644 --- a/linux-user/ioctls.h +++ b/linux-user/ioctls.h @@ -140,6 +140,9 @@ #ifdef FITHAW IOCTL(FITHAW, IOC_W | IOC_R, TYPE_INT) #endif +#ifdef FITRIM + IOCTL(FITRIM, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_fstrim_range))) +#endif IOCTL(FIGETBSZ, IOC_R, MK_PTR(TYPE_LONG)) #ifdef CONFIG_FIEMAP diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 744fda599e..ce0adb706e 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -945,6 +945,7 @@ struct target_rtc_pll_info { #define TARGET_FIFREEZE TARGET_IOWR('X', 119, abi_int) #define TARGET_FITHAW TARGET_IOWR('X', 120, abi_int) +#define TARGET_FITRIM TARGET_IOWR('X', 121, struct fstrim_range) /* * Note that the ioctl numbers for FS_IOC_ diff --git a/linux-user/syscall_types.h b/linux-user/syscall_types.h index c3b43f8022..6dd7a80ce5 100644 --- a/linux-user/syscall_types.h +++ b/linux-user/syscall_types.h @@ -341,6 +341,11 @@ STRUCT(file_clone_range, TYPE_ULONGLONG, /* src_length */ TYPE_ULONGLONG) /* dest_offset */ +STRUCT(fstrim_range, + TYPE_ULONGLONG, /* start */ + TYPE_ULONGLONG, /* len */ + TYPE_ULONGLONG) /* minlen */ + STRUCT(fiemap_extent, TYPE_ULONGLONG, /* fe_logical */ TYPE_ULONGLONG, /* fe_physical */ -- 2.34.1
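As a sanity check on the new define, the request number that `TARGET_IOWR('X', 121, struct fstrim_range)` produces can be recomputed by hand. The sketch below assumes the common asm-generic ioctl bit layout (a few architectures, e.g. powerpc and sparc, encode the dir/size bits differently) and mirrors `struct fstrim_range` locally instead of including `<linux/fs.h>`; all `MY_`-prefixed names are mine:

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of struct fstrim_range: three __u64 fields. */
struct fstrim_range_mirror {
    uint64_t start;
    uint64_t len;
    uint64_t minlen;
};

/* asm-generic _IOC encoding: dir:2 | size:14 | type:8 | nr:8
 * (assumed common layout; not universal across architectures). */
#define MY_IOC_WRITE 1u
#define MY_IOC_READ  2u
#define MY_IOC(dir, type, nr, size) \
    (((uint32_t)(dir) << 30) | ((uint32_t)(size) << 16) | \
     ((uint32_t)(type) << 8) | (uint32_t)(nr))
#define MY_IOWR(type, nr, size) \
    MY_IOC(MY_IOC_READ | MY_IOC_WRITE, (type), (nr), (size))

/* FITRIM = _IOWR('X', 121, struct fstrim_range) */
#define MY_FITRIM MY_IOWR('X', 121, sizeof(struct fstrim_range_mirror))
```

Because the struct size (24 bytes) is baked into the number, target and host values only match when the guest's `fstrim_range` layout matches the host's, which is why the patch can pass the structure through with a simple `MK_STRUCT` conversion.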
[PULL 17/35] target/m68k: Perform the semihosting test during translate
Replace EXCP_HALT_INSN by EXCP_SEMIHOSTING. Perform the pre- and post-insn tests during translate, leaving only the actual semihosting operation for the exception. Signed-off-by: Richard Henderson --- target/m68k/cpu.h | 2 +- target/m68k/op_helper.c | 14 ++--- target/m68k/translate.c | 45 + 3 files changed, 44 insertions(+), 17 deletions(-) diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h index e184239a81..b5bbeedb7a 100644 --- a/target/m68k/cpu.h +++ b/target/m68k/cpu.h @@ -66,7 +66,7 @@ #define EXCP_MMU_ACCESS 58 /* MMU Access Level Violation Error */ #define EXCP_RTE0x100 -#define EXCP_HALT_INSN 0x101 +#define EXCP_SEMIHOSTING0x101 #define M68K_DTTR0 0 #define M68K_DTTR1 1 diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c index 125f6c1b08..15bad5dd46 100644 --- a/target/m68k/op_helper.c +++ b/target/m68k/op_helper.c @@ -202,18 +202,8 @@ static void cf_interrupt_all(CPUM68KState *env, int is_hw) /* Return from an exception. */ cf_rte(env); return; -case EXCP_HALT_INSN: -if (semihosting_enabled((env->sr & SR_S) == 0) -&& (env->pc & 3) == 0 -&& cpu_lduw_code(env, env->pc - 4) == 0x4e71 -&& cpu_ldl_code(env, env->pc) == 0x4e7bf000) { -env->pc += 4; -do_m68k_semihosting(env, env->dregs[0]); -return; -} -cs->halted = 1; -cs->exception_index = EXCP_HLT; -cpu_loop_exit(cs); +case EXCP_SEMIHOSTING: +do_m68k_semihosting(env, env->dregs[0]); return; } } diff --git a/target/m68k/translate.c b/target/m68k/translate.c index 8a194f2f21..8f61ff1238 100644 --- a/target/m68k/translate.c +++ b/target/m68k/translate.c @@ -26,12 +26,11 @@ #include "qemu/log.h" #include "qemu/qemu-print.h" #include "exec/translator.h" - #include "exec/helper-proto.h" #include "exec/helper-gen.h" - #include "exec/log.h" #include "fpu/softfloat.h" +#include "semihosting/semihost.h" #define HELPER_H "helper.h" #include "exec/helper-info.c.inc" @@ -1401,6 +1400,40 @@ static void gen_jmp_tb(DisasContext *s, int n, target_ulong dest, s->base.is_jmp = DISAS_NORETURN; } +#ifndef 
CONFIG_USER_ONLY +static bool semihosting_test(DisasContext *s) +{ +uint32_t test; + +if (!semihosting_enabled(IS_USER(s))) { +return false; +} + +/* + * "The semihosting instruction is immediately preceded by a + * nop aligned to a 4-byte boundary..." + * The preceding 2-byte (aligned) nop plus the 2-byte halt/bkpt + * means that we have advanced 4 bytes from the required nop. + */ +if (s->pc % 4 != 0) { +return false; +} +test = cpu_lduw_code(s->env, s->pc - 4); +if (test != 0x4e71) { +return false; +} +/* "... and followed by an invalid sentinel instruction movec %sp,0." */ +test = translator_ldl(s->env, &s->base, s->pc); +if (test != 0x4e7bf000) { +return false; +} + +/* Consume the sentinel. */ +s->pc += 4; +return true; +} +#endif /* !CONFIG_USER_ONLY */ + DISAS_INSN(scc) { DisasCompare c; @@ -4465,8 +4498,12 @@ DISAS_INSN(halt) gen_exception(s, s->base.pc_next, EXCP_PRIVILEGE); return; } - -gen_exception(s, s->pc, EXCP_HALT_INSN); +if (semihosting_test(s)) { +gen_exception(s, s->pc, EXCP_SEMIHOSTING); +return; +} +tcg_gen_movi_i32(cpu_halted, 1); +gen_exception(s, s->pc, EXCP_HLT); } DISAS_INSN(stop) -- 2.34.1
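The nop/trigger/sentinel recognition above can be modelled on a flat byte buffer. This is an illustrative sketch (helper names are mine, not the QEMU translator API); `pc` points just past the 2-byte halt/bkpt, i.e. 4 bytes past the required aligned nop, exactly as in `semihosting_test()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian instruction fetches over a flat code buffer
 * (m68k code is stored big-endian). */
static uint16_t lduw(const uint8_t *code, size_t pc)
{
    return (uint16_t)(code[pc] << 8 | code[pc + 1]);
}

static uint32_t ldl(const uint8_t *code, size_t pc)
{
    return (uint32_t)lduw(code, pc) << 16 | lduw(code, pc + 2);
}

/* Mirror the checks: aligned preceding nop, then the sentinel. */
static bool is_semihosting_seq(const uint8_t *code, size_t pc)
{
    if (pc % 4 != 0) {
        return false;               /* nop must sit on a 4-byte boundary */
    }
    if (lduw(code, pc - 4) != 0x4e71) {
        return false;               /* preceding nop */
    }
    return ldl(code, pc) == 0x4e7bf000; /* sentinel "movec %sp,0" */
}

/* nop; bkpt #0; sentinel -- a valid sequence when pc = 4. */
static const uint8_t good[] = {
    0x4e, 0x71,                 /* nop */
    0x48, 0x48,                 /* bkpt #0 */
    0x4e, 0x7b, 0xf0, 0x00,     /* movec %sp,0 sentinel */
};
```

Requiring both the aligned nop and an otherwise-invalid sentinel makes accidental matches in ordinary guest code effectively impossible.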
[PULL 29/35] util/bufferiszero: Remove AVX512 variant
From: Alexander Monakov

Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
routines are invoked much more rarely in normal use when most buffers
are non-zero. This makes use of AVX512 unprofitable, as it incurs
extra frequency and voltage transition periods during which the CPU
operates at reduced performance, as described in
https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

Signed-off-by: Mikhail Romanov
Signed-off-by: Alexander Monakov
Reviewed-by: Richard Henderson
Message-Id: <20240206204809.9859-4-amona...@ispras.ru>
Signed-off-by: Richard Henderson
---
 util/bufferiszero.c | 38 +++---
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index f5a3634f9a..641d5f9b9e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len)
     }
 }

-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
+#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include <immintrin.h>

 /* Note that each of these vectorized functions require len >= 64. */
@@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */

-#ifdef CONFIG_AVX512F_OPT
-static bool __attribute__((target("avx512f")))
-buffer_zero_avx512(const void *buf, size_t len)
-{
-    /* Begin with an unaligned head of 64 bytes.  */
-    __m512i t = _mm512_loadu_si512(buf);
-    __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
-    __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64);
-
-    /* Loop over 64-byte aligned blocks of 256.  */
-    while (p <= e) {
-        __builtin_prefetch(p);
-        if (unlikely(_mm512_test_epi64_mask(t, t))) {
-            return false;
-        }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
-
-    t |= _mm512_loadu_si512(buf + len - 4 * 64);
-    t |= _mm512_loadu_si512(buf + len - 3 * 64);
-    t |= _mm512_loadu_si512(buf + len - 2 * 64);
-    t |= _mm512_loadu_si512(buf + len - 1 * 64);
-
-    return !_mm512_test_epi64_mask(t, t);
-
-}
-#endif /* CONFIG_AVX512F_OPT */
-
 /*
  * Make sure that these variables are appropriately initialized when
  * SSE2 is enabled on the compiler command-line, but the compiler is
  * too old to support CONFIG_AVX2_OPT.
  */
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 # define INIT_USED     0
 # define INIT_LENGTH   0
 # define INIT_ACCEL    buffer_zero_int
@@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info)
         unsigned len;
         bool (*fn)(const void *, size_t);
     } all[] = {
-#ifdef CONFIG_AVX512F_OPT
-        { CPUINFO_AVX512F, 256, buffer_zero_avx512 },
-#endif
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2,    128, buffer_zero_avx2 },
 #endif
@@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info)
     return 0;
 }

-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 static void __attribute__((constructor)) init_accel(void)
 {
     used_accel = select_accel_cpuinfo(cpuinfo_init());
-- 
2.34.1
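The "early checks in the inline buffer_is_zero wrapper" that the commit message credits can be sketched as follows. This is a simplified stand-in for illustration, not QEMU's actual wrapper: the idea is that a few cheap byte probes reject typical non-zero buffers before any SIMD routine is ever entered, which is why a faster-but-costlier AVX512 inner loop stops paying off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Out-of-line scalar path, standing in for the SIMD accelerators. */
static bool buffer_zero_int(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical inline wrapper: probe three sample bytes first, so that
 * for typical non-zero buffers the accelerated routine is never called.
 */
static inline bool buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (len == 0) {
        return true;
    }
    if (p[0] | p[len - 1] | p[len / 2]) {
        return false;  /* common case: rejected without the SIMD path */
    }
    return buffer_zero_int(buf, len);
}
```

Only buffers that pass all three probes fall through to the full scan, so the accelerated path runs rarely and its per-call fixed costs (such as AVX512 frequency transitions) dominate any throughput gain.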
[PULL 15/35] target/m68k: Map FPU exceptions to FPSR register
From: Keith Packard

Add helpers for reading/writing the 68881 FPSR register so that changes
in floating point exception state can be seen by the application.

Call these helpers in pre_load/post_load hooks to synchronize exception
state.

Signed-off-by: Keith Packard
Reviewed-by: Richard Henderson
Message-Id: <20230803035231.429697-1-kei...@keithp.com>
Signed-off-by: Richard Henderson
---
 target/m68k/cpu.h        |  3 +-
 target/m68k/helper.h     |  2 ++
 target/m68k/cpu.c        | 12 +--
 target/m68k/fpu_helper.c | 72 
 target/m68k/helper.c     |  4 +--
 target/m68k/translate.c  |  4 +--
 6 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
index 346427e144..e184239a81 100644
--- a/target/m68k/cpu.h
+++ b/target/m68k/cpu.h
@@ -199,7 +199,8 @@ void cpu_m68k_set_ccr(CPUM68KState *env, uint32_t);
 void cpu_m68k_set_sr(CPUM68KState *env, uint32_t);
 void cpu_m68k_restore_fp_status(CPUM68KState *env);
 void cpu_m68k_set_fpcr(CPUM68KState *env, uint32_t val);
-
+uint32_t cpu_m68k_get_fpsr(CPUM68KState *env);
+void cpu_m68k_set_fpsr(CPUM68KState *env, uint32_t val);

 /*
  * Instead of computing the condition codes after each m68k instruction,
diff --git a/target/m68k/helper.h b/target/m68k/helper.h
index 2bbe0dc032..95aa5e53bb 100644
--- a/target/m68k/helper.h
+++ b/target/m68k/helper.h
@@ -54,6 +54,8 @@ DEF_HELPER_4(fsdiv, void, env, fp, fp, fp)
 DEF_HELPER_4(fddiv, void, env, fp, fp, fp)
 DEF_HELPER_4(fsgldiv, void, env, fp, fp, fp)
 DEF_HELPER_FLAGS_3(fcmp, TCG_CALL_NO_RWG, void, env, fp, fp)
+DEF_HELPER_2(set_fpsr, void, env, i32)
+DEF_HELPER_1(get_fpsr, i32, env)
 DEF_HELPER_FLAGS_2(set_fpcr, TCG_CALL_NO_RWG, void, env, i32)
 DEF_HELPER_FLAGS_2(ftst, TCG_CALL_NO_RWG, void, env, fp)
 DEF_HELPER_3(fconst, void, env, fp, i32)
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 7c8efbb42c..df49ff1880 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -390,12 +390,19 @@ static const VMStateDescription vmstate_freg = {
     }
 };

+static int fpu_pre_save(void *opaque)
+{
+    M68kCPU *s = opaque;
+
+    s->env.fpsr = cpu_m68k_get_fpsr(&s->env);
+    return 0;
+}
+
 static int fpu_post_load(void *opaque, int version)
 {
     M68kCPU *s = opaque;

-    cpu_m68k_restore_fp_status(&s->env);
-
+    cpu_m68k_set_fpsr(&s->env, s->env.fpsr);
     return 0;
 }

@@ -404,6 +411,7 @@ const VMStateDescription vmmstate_fpu = {
     .version_id = 1,
     .minimum_version_id = 1,
     .needed = fpu_needed,
+    .pre_save = fpu_pre_save,
     .post_load = fpu_post_load,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT32(env.fpcr, M68kCPU),
diff --git a/target/m68k/fpu_helper.c b/target/m68k/fpu_helper.c
index ab120b5f59..8314791f50 100644
--- a/target/m68k/fpu_helper.c
+++ b/target/m68k/fpu_helper.c
@@ -164,6 +164,78 @@ void HELPER(set_fpcr)(CPUM68KState *env, uint32_t val)
     cpu_m68k_set_fpcr(env, val);
 }

+/* Convert host exception flags to cpu_m68k form.  */
+static int cpu_m68k_exceptbits_from_host(int host_bits)
+{
+    int target_bits = 0;
+
+    if (host_bits & float_flag_invalid) {
+        target_bits |= 0x80;
+    }
+    if (host_bits & float_flag_overflow) {
+        target_bits |= 0x40;
+    }
+    if (host_bits & (float_flag_underflow | float_flag_output_denormal)) {
+        target_bits |= 0x20;
+    }
+    if (host_bits & float_flag_divbyzero) {
+        target_bits |= 0x10;
+    }
+    if (host_bits & float_flag_inexact) {
+        target_bits |= 0x08;
+    }
+    return target_bits;
+}
+
+/* Convert cpu_m68k exception flags to host form.  */
+static int cpu_m68k_exceptbits_to_host(int target_bits)
+{
+    int host_bits = 0;
+
+    if (target_bits & 0x80) {
+        host_bits |= float_flag_invalid;
+    }
+    if (target_bits & 0x40) {
+        host_bits |= float_flag_overflow;
+    }
+    if (target_bits & 0x20) {
+        host_bits |= float_flag_underflow;
+    }
+    if (target_bits & 0x10) {
+        host_bits |= float_flag_divbyzero;
+    }
+    if (target_bits & 0x08) {
+        host_bits |= float_flag_inexact;
+    }
+    return host_bits;
+}
+
+uint32_t cpu_m68k_get_fpsr(CPUM68KState *env)
+{
+    int host_flags = get_float_exception_flags(&env->fp_status);
+    int target_flags = cpu_m68k_exceptbits_from_host(host_flags);
+    int except = (env->fpsr & ~(0xf8)) | target_flags;
+    return except;
+}
+
+uint32_t HELPER(get_fpsr)(CPUM68KState *env)
+{
+    return cpu_m68k_get_fpsr(env);
+}
+
+void cpu_m68k_set_fpsr(CPUM68KState *env, uint32_t val)
+{
+    env->fpsr = val;
+
+    int host_flags = cpu_m68k_exceptbits_to_host((int) env->fpsr);
+    set_float_exception_flags(host_flags, &env->fp_status);
+}
+
+void HELPER(set_fpsr)(CPUM68KState *env, uint32_t val)
+{
+    cpu_m68k_set_fpsr(env, val);
+}
+
 #define PREC_BEGIN(prec)                        \
     do {                                        \
         FloatX80RoundPrec old =
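The bit mapping these helpers perform can be exercised in isolation. In this sketch the `FLAG_*` constants are illustrative stand-ins for softfloat's `float_flag_*` values (whose real encodings differ); the 0x80..0x08 values are the FPSR exception-byte bits used in the patch:

```c
#include <assert.h>

/* Stand-ins for softfloat's float_flag_* values (illustrative only). */
enum {
    FLAG_INVALID   = 1 << 0,
    FLAG_DIVBYZERO = 1 << 1,
    FLAG_OVERFLOW  = 1 << 2,
    FLAG_UNDERFLOW = 1 << 3,
    FLAG_INEXACT   = 1 << 4,
};

/* Map host flags to the 68881 FPSR exception byte (0x80..0x08). */
static int fpsr_from_host(int host)
{
    int t = 0;
    if (host & FLAG_INVALID)   t |= 0x80;
    if (host & FLAG_OVERFLOW)  t |= 0x40;
    if (host & FLAG_UNDERFLOW) t |= 0x20;
    if (host & FLAG_DIVBYZERO) t |= 0x10;
    if (host & FLAG_INEXACT)   t |= 0x08;
    return t;
}

/* Inverse mapping, as done when the guest writes FPSR. */
static int fpsr_to_host(int t)
{
    int host = 0;
    if (t & 0x80) host |= FLAG_INVALID;
    if (t & 0x40) host |= FLAG_OVERFLOW;
    if (t & 0x20) host |= FLAG_UNDERFLOW;
    if (t & 0x10) host |= FLAG_DIVBYZERO;
    if (t & 0x08) host |= FLAG_INEXACT;
    return host;
}
```

Because each FPSR bit corresponds to exactly one host flag here, the composition `fpsr_from_host(fpsr_to_host(x))` is the identity on the 0xf8 exception byte, which is what lets pre_save/post_load round-trip the state.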
[PULL 01/35] tcg/optimize: Do not attempt to constant fold neg_vec
Split out the tail of fold_neg to fold_neg_no_const so that we
can avoid attempting to constant fold vector negate.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2150
Signed-off-by: Richard Henderson
---
 tcg/optimize.c                    | 17 -
 tests/tcg/aarch64/test-2150.c     | 12 
 tests/tcg/aarch64/Makefile.target |  2 +-
 3 files changed, 21 insertions(+), 10 deletions(-)
 create mode 100644 tests/tcg/aarch64/test-2150.c

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 275db77b42..2e9e5725a9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1990,16 +1990,10 @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
     return false;
 }

-static bool fold_neg(OptContext *ctx, TCGOp *op)
+static bool fold_neg_no_const(OptContext *ctx, TCGOp *op)
 {
-    uint64_t z_mask;
-
-    if (fold_const1(ctx, op)) {
-        return true;
-    }
-
     /* Set to 1 all bits to the left of the rightmost.  */
-    z_mask = arg_info(op->args[1])->z_mask;
+    uint64_t z_mask = arg_info(op->args[1])->z_mask;
     ctx->z_mask = -(z_mask & -z_mask);

     /*
@@ -2010,6 +2004,11 @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
     return true;
 }

+static bool fold_neg(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op) || fold_neg_no_const(ctx, op);
+}
+
 static bool fold_nor(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2_commutative(ctx, op) ||
@@ -2418,7 +2417,7 @@ static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
     if (have_neg) {
         op->opc = neg_op;
         op->args[1] = op->args[2];
-        return fold_neg(ctx, op);
+        return fold_neg_no_const(ctx, op);
     }
     return false;
 }
diff --git a/tests/tcg/aarch64/test-2150.c b/tests/tcg/aarch64/test-2150.c
new file mode 100644
index 00..fb86c11958
--- /dev/null
+++ b/tests/tcg/aarch64/test-2150.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2150 */
+
+int main()
+{
+    asm volatile(
+        "movi v6.4s, #1\n"
+        "movi v7.4s, #0\n"
+        "sub v6.2d, v7.2d, v6.2d\n"
+        : : : "v6", "v7");
+    return 0;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 0efd565f05..70d728ae9a 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -10,7 +10,7 @@ VPATH += $(AARCH64_SRC)

 # Base architecture tests
 AARCH64_TESTS=fcvt pcalign-a64 lse2-fault
-AARCH64_TESTS += test-2248
+AARCH64_TESTS += test-2248 test-2150

 fcvt: LDFLAGS+=-lm
-- 
2.34.1
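The z_mask computation retained in fold_neg_no_const is a compact bit trick: `z_mask & -z_mask` isolates the lowest bit that may be nonzero in the operand, and negating that sets it plus every bit above it, which is exactly the set of bits that can be nonzero in the negated result. A standalone sketch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Known-zero-bits propagation for negate, as in fold_neg_no_const:
 * given a mask of possibly-nonzero operand bits, return the mask of
 * possibly-nonzero result bits.  If the operand's lowest possible bit
 * is bit k, then the operand is a multiple of 2^k, so its negation is
 * too -- bits below k stay zero, everything from k up may be set.
 */
static uint64_t neg_z_mask(uint64_t z_mask)
{
    return -(z_mask & -z_mask);
}
```

For example, an operand known to be a multiple of 8 (z_mask with no bits below bit 3) negates to another multiple of 8, so the low three result bits remain known-zero.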
[PULL 20/35] accel/tcg: Add insn_start to DisasContextBase
This is currently target-specific for many; begin making it
target independent.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 include/exec/translator.h | 3 +++
 accel/tcg/translator.c    | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 51624feb10..ceaeca8c91 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -74,6 +74,8 @@ typedef enum DisasJumpType {
  * @singlestep_enabled: "Hardware" single stepping enabled.
  * @saved_can_do_io: Known value of cpu->neg.can_do_io, or -1 for unknown.
  * @plugin_enabled: TCG plugin enabled in this TB.
+ * @insn_start: The last op emitted by the insn_start hook,
+ *              which is expected to be INDEX_op_insn_start.
  *
  * Architecture-agnostic disassembly context.
  */
@@ -87,6 +89,7 @@ typedef struct DisasContextBase {
     bool singlestep_enabled;
     int8_t saved_can_do_io;
     bool plugin_enabled;
+    struct TCGOp *insn_start;
     void *host_addr[2];
 } DisasContextBase;

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 38c34009a5..ae61c154c2 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -140,6 +140,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     db->max_insns = *max_insns;
     db->singlestep_enabled = cflags & CF_SINGLE_STEP;
     db->saved_can_do_io = -1;
+    db->insn_start = NULL;
     db->host_addr[0] = host_pc;
     db->host_addr[1] = NULL;

@@ -157,6 +158,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     while (true) {
         *max_insns = ++db->num_insns;
         ops->insn_start(db, cpu);
+        db->insn_start = tcg_last_op();
         tcg_debug_assert(db->is_jmp == DISAS_NEXT); /* no early exit */

         if (plugin_enabled) {
-- 
2.34.1
[PULL 09/35] target/hppa: Fix IIAOQ, IIASQ for pa2.0
The contents of IIAOQ depend on PSW_W. Follow the text in
"Interruption Instruction Address Queues", pages 2-13 through 2-15.

Tested-by: Sven Schnelle
Tested-by: Helge Deller
Reported-by: Sven Schnelle
Fixes: b10700d826c ("target/hppa: Update IIAOQ, IIASQ for pa2.0")
Signed-off-by: Richard Henderson
---
 target/hppa/int_helper.c | 20 +++-
 target/hppa/sys_helper.c | 18 +-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/target/hppa/int_helper.c b/target/hppa/int_helper.c
index 90437a92cd..a667ee380d 100644
--- a/target/hppa/int_helper.c
+++ b/target/hppa/int_helper.c
@@ -107,14 +107,10 @@ void hppa_cpu_do_interrupt(CPUState *cs)

     /* step 3 */
     /*
-     * For pa1.x, IIASQ is simply a copy of IASQ.
-     * For pa2.0, IIASQ is the top bits of the virtual address,
-     *            or zero if translation is disabled.
+     * IIASQ is the top bits of the virtual address, or zero if translation
+     * is disabled -- with PSW_W == 0, this will reduce to the space.
      */
-    if (!hppa_is_pa20(env)) {
-        env->cr[CR_IIASQ] = env->iasq_f >> 32;
-        env->cr_back[0] = env->iasq_b >> 32;
-    } else if (old_psw & PSW_C) {
+    if (old_psw & PSW_C) {
         env->cr[CR_IIASQ] =
             hppa_form_gva_psw(old_psw, env->iasq_f, env->iaoq_f) >> 32;
         env->cr_back[0] =
@@ -123,8 +119,14 @@ void hppa_cpu_do_interrupt(CPUState *cs)
         env->cr[CR_IIASQ] = 0;
         env->cr_back[0] = 0;
     }
-    env->cr[CR_IIAOQ] = env->iaoq_f;
-    env->cr_back[1] = env->iaoq_b;
+    /* IIAOQ is the full offset for wide mode, or 32 bits for narrow mode. */
+    if (old_psw & PSW_W) {
+        env->cr[CR_IIAOQ] = env->iaoq_f;
+        env->cr_back[1] = env->iaoq_b;
+    } else {
+        env->cr[CR_IIAOQ] = (uint32_t)env->iaoq_f;
+        env->cr_back[1] = (uint32_t)env->iaoq_b;
+    }

     if (old_psw & PSW_Q) {
         /* step 5 */
diff --git a/target/hppa/sys_helper.c b/target/hppa/sys_helper.c
index 208e51c086..22d6c89964 100644
--- a/target/hppa/sys_helper.c
+++ b/target/hppa/sys_helper.c
@@ -78,21 +78,21 @@ target_ulong HELPER(swap_system_mask)(CPUHPPAState *env, target_ulong nsm)

 void HELPER(rfi)(CPUHPPAState *env)
 {
-    env->iasq_f = (uint64_t)env->cr[CR_IIASQ] << 32;
-    env->iasq_b = (uint64_t)env->cr_back[0] << 32;
-    env->iaoq_f = env->cr[CR_IIAOQ];
-    env->iaoq_b = env->cr_back[1];
+    uint64_t mask;
+
+    cpu_hppa_put_psw(env, env->cr[CR_IPSW]);

     /*
      * For pa2.0, IIASQ is the top bits of the virtual address.
      * To recreate the space identifier, remove the offset bits.
+     * For pa1.x, the mask reduces to no change to space.
      */
-    if (hppa_is_pa20(env)) {
-        env->iasq_f &= ~env->iaoq_f;
-        env->iasq_b &= ~env->iaoq_b;
-    }
+    mask = gva_offset_mask(env->psw);

-    cpu_hppa_put_psw(env, env->cr[CR_IPSW]);
+    env->iaoq_f = env->cr[CR_IIAOQ];
+    env->iaoq_b = env->cr_back[1];
+    env->iasq_f = (env->cr[CR_IIASQ] << 32) & ~(env->iaoq_f & mask);
+    env->iasq_b = (env->cr_back[0] << 32) & ~(env->iaoq_b & mask);
 }

 static void getshadowregs(CPUHPPAState *env)
-- 
2.34.1
[PULL 02/35] linux-user: Fix waitid return of siginfo_t and rusage
The copy back to siginfo_t should be conditional only on arg3,
not the specific values that might have been written.
The copy back to rusage was missing entirely.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2262
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e12d969c2e..3df2b94d9a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9272,14 +9272,24 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR_waitid
     case TARGET_NR_waitid:
         {
+            struct rusage ru;
             siginfo_t info;
-            info.si_pid = 0;
-            ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
-            if (!is_error(ret) && arg3 && info.si_pid != 0) {
-                if (!(p = lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0)))
+
+            ret = get_errno(safe_waitid(arg1, arg2, (arg3 ? &info : NULL),
+                                        arg4, (arg5 ? &ru : NULL)));
+            if (!is_error(ret)) {
+                if (arg3) {
+                    p = lock_user(VERIFY_WRITE, arg3,
+                                  sizeof(target_siginfo_t), 0);
+                    if (!p) {
+                        return -TARGET_EFAULT;
+                    }
+                    host_to_target_siginfo(p, &info);
+                    unlock_user(p, arg3, sizeof(target_siginfo_t));
+                }
+                if (arg5 && host_to_target_rusage(arg5, &ru)) {
                     return -TARGET_EFAULT;
-                host_to_target_siginfo(p, &info);
-                unlock_user(p, arg3, sizeof(target_siginfo_t));
+                }
             }
         }
         return ret;
-- 
2.34.1
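The kernel behavior being emulated here can be exercised from a host program. Note that glibc's waitid() prototype has no rusage argument, but the raw Linux sys_waitid accepts a fifth `struct rusage *` parameter -- the very value whose copy-back the patch adds. The wrapper name below is ours, not a standard API:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Raw syscall wrapper exposing the fifth rusage argument of the
 * Linux waitid syscall, which the glibc wrapper hides.
 */
static long waitid_with_rusage(idtype_t type, id_t id, siginfo_t *info,
                               int options, struct rusage *ru)
{
    return syscall(SYS_waitid, type, id, info, options, ru);
}
```

On success the kernel fills both the siginfo_t (si_pid, si_code, si_status) and the rusage, matching the two copy-backs the fixed QEMU code now performs whenever arg3/arg5 are non-NULL.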