Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
24.04.2020 21:41, Alberto Garcia wrote: On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy wrote: AFAIK, now compressed clusters can't be used in scenarios with guest, as qcow2 driver doesn't support rewriting them. You can write to those images just fine, it's just not efficient because you have to COW the compressed clusters. No, rewriting doesn't work: [root@kvm master]# ./qemu-img create -f qcow2 x 10M Formatting 'x', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16 [root@kvm master]# ./qemu-io -c 'write -c 0 64K' x wrote 65536/65536 bytes at offset 0 64 KiB, 1 ops; 00.23 sec (278.708 KiB/sec and 4.3548 ops/sec) [root@kvm master]# ./qemu-io -c 'write -c 0 64K' x write failed: Input/output error Or am I wrong? And we normally don't combine normal and compressed clusters together in one image. As soon as you start writing to an image with compressed clusters you'll have a combination of both. Ah, you mean, rewriting compressed clusters by normal.. So the use-case is just take compressed backup image and use it for the guest, instead of converting first.. It makes sense. But it's true that you don't have an image with compressed clusters if what you're looking for is performance. So I wouldn't add support for this if it complicates things too much. Berto -- Best regards, Vladimir
Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX
On 4/24/20 3:47 PM, Stephen Long wrote: > Oh, maybe I misread the manual description for SVE2 TBL, but I thought Zm was > the indexes register and the loop compares the index from Zm with the total > number of elems, table_elems. That's right. You take the index from Zm just fine, but fail to apply that index properly across Zn and Zn+1. r~ > > -Original Message- > From: Richard Henderson > Sent: Friday, April 24, 2020 2:37 PM > To: Stephen Long ; qemu-devel@nongnu.org > Cc: qemu-...@nongnu.org; Ana Pazos > Subject: [EXT] Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX > > On 4/23/20 9:42 AM, Stephen Long wrote: >> Signed-off-by: Stephen Long >> >> These insns don't show up under any SVE2 categories in the manual. But >> if you lookup each insn, you'll find they have SVE2 variants. >> --- >> target/arm/helper-sve.h| 10 +++ >> target/arm/sve.decode | 5 >> target/arm/sve_helper.c| 53 ++ >> target/arm/translate-sve.c | 20 ++ >> 4 files changed, 88 insertions(+) >> >> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index >> f6ae814021..54d20575e8 100644 >> --- a/target/arm/helper-sve.h >> +++ b/target/arm/helper-sve.h >> @@ -2687,3 +2687,13 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, >> TCG_CALL_NO_RWG, >> void, ptr, ptr, ptr, ptr, i32) >> DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG, >> void, ptr, ptr, ptr, ptr, i32) >> + >> +DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, >> +ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, >> +ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, >> +void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d, >> +TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) >> + >> +DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, >> +i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, >> +ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, >> +ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, >> +ptr, ptr, ptr, i32) >> diff --git a/target/arm/sve.decode b/target/arm/sve.decode index >> 3a2a4a7f1c..483fbf0dcc 100644 >> --- a/target/arm/sve.decode >> +++ b/target/arm/sve.decode >> @@ -1387,3 +1387,8 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . >> . @rda_rn_rm >> >> CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx >> SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 >> ra=%reg_movprfx >> + >> +### SVE2 Table Lookup (three sources) >> + >> +TBL_zzz 0101 .. 1 . 00101 0 . . @rd_rn_rm >> +TBX_zzz 0101 .. 1 . 00101 1 . . @rd_rn_rm >> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index >> 55e2c32f03..d1e91da02a 100644 >> --- a/target/arm/sve_helper.c >> +++ b/target/arm/sve_helper.c >> @@ -2968,6 +2968,59 @@ DO_TBL(sve_tbl_d, uint64_t, ) >> >> #undef TBL >> >> +#define DO_SVE2_TBL(NAME, TYPE, H) \ >> +void HELPER(NAME)(void *vd, void *vn1, void *vm, void *vn2, uint32_t desc) >> \ >> +{ >> \ >> +intptr_t i, opr_sz = simd_oprsz(desc); >> \ >> +uintptr_t elem = opr_sz / sizeof(TYPE); >> \ >> +TYPE *d = vd, *n1 = vn1, *n2 = vn2, *m = vm; >> \ >> +ARMVectorReg tmp1, tmp2; >> \ > > Only one temp needed. > >> +if (unlikely(vd == vn1)) { >> \ >> +n1 = memcpy(&tmp1, vn1, opr_sz); >> \ >> +} >> \ >> +if (unlikely(vd == vn2)) { >> \ >> +n2 = memcpy(&tmp2, vn2, opr_sz); >> \ >> +} > > Better with else if here. > Because vd cannot overlap both vn1 or vn2, only one of them. > \ >> +for (i = 0; i < elem; i++) { >> \ >> +TYPE j = m[H(i)]; >> \ >> +d[H(i)] = j < (elem * 2) ? n1[H(j)] : 0; >> \ >> + >> \ >> +TYPE k = m[H(elem + i)]; >> \ >> +d[H(elem + i)] = k < (elem * 2) ? n2[H(k)] : 0; >> \ >> +} > > First, the indexing is wrong. > > Note that you're range checking vs elem * 2, but only indexing a single > vector. > Thus you must be indexing into the next vector. > > This should look more like > > TYPE j = m[H(i)]; > TYPE r = 0; > > if (j <
RE: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX
Oh, maybe I misread the manual description for SVE2 TBL, but I thought Zm was the indexes register and the loop compares the index from Zm with the total number of elems, table_elems. -Original Message- From: Richard Henderson Sent: Friday, April 24, 2020 2:37 PM To: Stephen Long ; qemu-devel@nongnu.org Cc: qemu-...@nongnu.org; Ana Pazos Subject: [EXT] Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX On 4/23/20 9:42 AM, Stephen Long wrote: > Signed-off-by: Stephen Long > > These insns don't show up under any SVE2 categories in the manual. But > if you lookup each insn, you'll find they have SVE2 variants. > --- > target/arm/helper-sve.h| 10 +++ > target/arm/sve.decode | 5 > target/arm/sve_helper.c| 53 ++ > target/arm/translate-sve.c | 20 ++ > 4 files changed, 88 insertions(+) > > diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index > f6ae814021..54d20575e8 100644 > --- a/target/arm/helper-sve.h > +++ b/target/arm/helper-sve.h > @@ -2687,3 +2687,13 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, > TCG_CALL_NO_RWG, > void, ptr, ptr, ptr, ptr, i32) > DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG, > void, ptr, ptr, ptr, ptr, i32) > + > +DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, > +ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, > +ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, > +void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d, > +TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) > + > +DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, > +i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, > +ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, > +ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, > +ptr, ptr, ptr, i32) > diff --git a/target/arm/sve.decode b/target/arm/sve.decode index > 3a2a4a7f1c..483fbf0dcc 100644 > --- a/target/arm/sve.decode > +++ b/target/arm/sve.decode > @@ -1387,3 +1387,8 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . > @rda_rn_rm > > CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx > SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 > ra=%reg_movprfx > + > +### SVE2 Table Lookup (three sources) > + > +TBL_zzz 0101 .. 1 . 00101 0 . . @rd_rn_rm > +TBX_zzz 0101 .. 1 . 00101 1 . . @rd_rn_rm > diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index > 55e2c32f03..d1e91da02a 100644 > --- a/target/arm/sve_helper.c > +++ b/target/arm/sve_helper.c > @@ -2968,6 +2968,59 @@ DO_TBL(sve_tbl_d, uint64_t, ) > > #undef TBL > > +#define DO_SVE2_TBL(NAME, TYPE, H) \ > +void HELPER(NAME)(void *vd, void *vn1, void *vm, void *vn2, uint32_t desc) \ > +{ \ > +intptr_t i, opr_sz = simd_oprsz(desc); \ > +uintptr_t elem = opr_sz / sizeof(TYPE); \ > +TYPE *d = vd, *n1 = vn1, *n2 = vn2, *m = vm;\ > +ARMVectorReg tmp1, tmp2;\ Only one temp needed. > +if (unlikely(vd == vn1)) { \ > +n1 = memcpy(&tmp1, vn1, opr_sz);\ > +} \ > +if (unlikely(vd == vn2)) { \ > +n2 = memcpy(&tmp2, vn2, opr_sz);\ > +} Better with else if here. Because vd cannot overlap both vn1 or vn2, only one of them. \ > +for (i = 0; i < elem; i++) {\ > +TYPE j = m[H(i)]; \ > +d[H(i)] = j < (elem * 2) ? n1[H(j)] : 0;\ > +\ > +TYPE k = m[H(elem + i)];\ > +d[H(elem + i)] = k < (elem * 2) ? n2[H(k)] : 0; \ > +} First, the indexing is wrong. Note that you're range checking vs elem * 2, but only indexing a single vector. Thus you must be indexing into the next vector. This should look more like TYPE j = m[H(i)]; TYPE r = 0; if (j < elem) { r = n1[H(j)]; } else if (j < 2 * elem) { r = n2[H(j - elem)]; } d[H(i)] = r; Second, this is one case where I'd prefer to share code with AArch64. It would be worthwhile to rearrange both sve1 and advsimd to use a common set of helpers. > +static bool trans_TBL_zzz(DisasContext *s, arg_rrr_esz *a) _zzz is not helpful here.
Re: [PATCH RFC] target/arm: Implement SVE2 SPLICE, EXT
On 4/23/20 11:03 AM, Stephen Long wrote: > Signed-off-by: Stephen Long > > I'm not sure I can just use the SVE helper functions for the SVE2 > variants of EXT and SPLICE. Absolutely. The new insns are functionally identical to movprfx + the old insn. I presume the new insns were added for code density reasons, but I don't know for sure. > +SPLICE_zpz 0101 .. 101 101 100 ... . . @rd_pg_rn > +EXT_zzi 0101 011 . 000 ... rn:5 rd:5 \ I have renamed these to SPLICE_sve2 and EXT_sve2, since the same suffix applies to the original too. Applied to my SVE2 branch. Thanks! r~
[PATCH] linux-user: return target error codes for socket() and prctl()
Return target error codes instead of host error codes. Signed-off-by: Helge Deller diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 05f03919ff..655a86fa45 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2987,7 +2987,7 @@ static abi_long do_socket(int domain, int type, int protocol) #endif protocol == NETLINK_KOBJECT_UEVENT || protocol == NETLINK_AUDIT)) { -return -EPFNOSUPPORT; +return -TARGET_EPFNOSUPPORT; } if (domain == AF_PACKET || @@ -5856,7 +5856,7 @@ static abi_long do_get_thread_area(CPUX86State *env, abi_ulong ptr) abi_long do_arch_prctl(CPUX86State *env, int code, abi_ulong addr) { -return -ENOSYS; +return -TARGET_ENOSYS; } #else abi_long do_arch_prctl(CPUX86State *env, int code, abi_ulong addr)
Re: [PATCH] linux-user: Drop unnecessary check in dup3 syscall
On 4/24/20 4:47 PM, Helge Deller wrote: - host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); + int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); I don't think this is quite correct. target_to_host_bitmask() silently ignores unknown bits, and a user that was relying on bit 0x4000 to cause an EINVAL will not fail with this change (unless bit 0x4000 happens to be one of the bits translated by fcntl_flags_tbl). True. The open() syscall is notorious for ignoring unknown bits rather than failing with EINVAL, and it is has come back to haunt kernel developers; newer syscalls like dup3() learned from the mistake, and we really do want to catch unsupported bits up to make it easier for future kernels to define meanings to those bits without them being silently swallowed when run on older systems that did not know what those bits meant. Ok, I wasn't aware that it's a design goal to manually find such cases of wrong userspace applications. But in this case, you're right that my patch shouldn't be applied. This, and several similar ones that you also posted. Maybe you could add a new int target_to_host_bitmask_strict(int src, translate_tbl, int *dst), which returns 0 when *dst is bit-for-bit translated from src, and returns -1 if src had bits not specified by translate_tbl. In that case, the caller can then translate all usual bits and rely on the syscall() failure (as you tried here), but you can also flag -TARGET_EINVAL up front for bits not covered by the table. While looking at the code I just noticed another bug too, which needs fixing then: -if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { -return -EINVAL; this needs to be: -return -TARGET_EINVAL; Indeed. Good catch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH] linux-user: Drop open-coded fcntl flags conversion in eventfd2 syscall
On 4/24/20 1:48 PM, Helge Deller wrote: > Drop the open-coded fcntl flags conversion in the eventfd2 syscall and > replace it with the built-in conversion with fcntl_flags_tbl. > > Signed-off-by: Helge Deller Reviewed-by: Richard Henderson r~
Re: [PATCH] linux-user: Add support for /proc/cpuinfo on hppa platform
On 4/24/20 2:06 PM, Helge Deller wrote: > Provide an own /proc/cpuinfo file for the hppa (parisc) platform. "our own" perhaps? > Signed-off-by: Helge Deller Reviewed-by: Richard Henderson r~
Re: [PATCH] linux-user: Drop unnecessary check in dup3 syscall
On 24.04.20 23:32, Eric Blake wrote: > On 4/24/20 3:57 PM, Helge Deller wrote: >> Drop the extra check in dup3() if anything other than FD_CLOEXEC (aka >> O_CLOEXEC) was given. Instead simply rely on any error codes returned by >> the host dup3() syscall. >> >> Signed-off-by: Helge Deller >> >> diff --git a/linux-user/syscall.c b/linux-user/syscall.c >> index 05f03919ff..ebf0d38321 100644 >> --- a/linux-user/syscall.c >> +++ b/linux-user/syscall.c >> @@ -8301,12 +8310,7 @@ static abi_long do_syscall1(void *cpu_env, int num, >> abi_long arg1, >> #if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3) >> case TARGET_NR_dup3: >> { >> - int host_flags; >> - >> - if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { >> - return -EINVAL; >> - } >> - host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); >> + int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); > > I don't think this is quite correct. target_to_host_bitmask() > silently ignores unknown bits, and a user that was relying on bit > 0x4000 to cause an EINVAL will not fail with this change (unless > bit 0x4000 happens to be one of the bits translated by > fcntl_flags_tbl). True. > The open() syscall is notorious for ignoring unknown bits rather than > failing with EINVAL, and it is has come back to haunt kernel > developers; newer syscalls like dup3() learned from the mistake, and > we really do want to catch unsupported bits up to make it easier for > future kernels to define meanings to those bits without them being > silently swallowed when run on older systems that did not know what > those bits meant. Ok, I wasn't aware that it's a design goal to manually find such cases of wrong userspace applications. But in this case, you're right that my patch shouldn't be applied. While looking at the code I just noticed another bug too, which needs fixing then: >> -if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { >> -return -EINVAL; this needs to be: >> -return -TARGET_EINVAL; Helge
Re: [PATCH RFC 1/3] target/arm: Implement SVE2 AESMC, AESIMC
On 4/23/20 3:37 PM, Stephen Long wrote: > +#define DO_CRYPTO_AESMC(NAME, DECRYPT) \ > +void HELPER(NAME)(void *vd, void *vn, uint32_t desc)\ > +{ \ > +intptr_t i, opr_sz = simd_oprsz(desc); \ > +void *d = vd, *n = vn; \ > +for (i = 0; i < opr_sz; i += 16) { \ > +HELPER(crypto_aesmc)(vd + i, vn + i, DECRYPT); \ > +} > +} Better, IMO, is to add a "desc" parameter to crypto_aesmc and move the loop there. Then all variants of this operation use the exact same helper. The separate decrypt parameter would become simd_data(). Same for the other two patches in this series. r~
Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX
On 4/23/20 9:42 AM, Stephen Long wrote: > Signed-off-by: Stephen Long > > These insns don't show up under any SVE2 categories in the manual. But > if you lookup each insn, you'll find they have SVE2 variants. > --- > target/arm/helper-sve.h| 10 +++ > target/arm/sve.decode | 5 > target/arm/sve_helper.c| 53 ++ > target/arm/translate-sve.c | 20 ++ > 4 files changed, 88 insertions(+) > > diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h > index f6ae814021..54d20575e8 100644 > --- a/target/arm/helper-sve.h > +++ b/target/arm/helper-sve.h > @@ -2687,3 +2687,13 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, > TCG_CALL_NO_RWG, > void, ptr, ptr, ptr, ptr, i32) > DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG, > void, ptr, ptr, ptr, ptr, i32) > + > +DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, > i32) > +DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, > i32) > +DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, > i32) > +DEF_HELPER_FLAGS_5(sve2_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, > i32) > + > +DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) > +DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) > +DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) > +DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) > diff --git a/target/arm/sve.decode b/target/arm/sve.decode > index 3a2a4a7f1c..483fbf0dcc 100644 > --- a/target/arm/sve.decode > +++ b/target/arm/sve.decode > @@ -1387,3 +1387,8 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . > @rda_rn_rm > > CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx > SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx > + > +### SVE2 Table Lookup (three sources) > + > +TBL_zzz 0101 .. 1 . 00101 0 . . @rd_rn_rm > +TBX_zzz 0101 .. 1 . 00101 1 . . @rd_rn_rm > diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c > index 55e2c32f03..d1e91da02a 100644 > --- a/target/arm/sve_helper.c > +++ b/target/arm/sve_helper.c > @@ -2968,6 +2968,59 @@ DO_TBL(sve_tbl_d, uint64_t, ) > > #undef TBL > > +#define DO_SVE2_TBL(NAME, TYPE, H) \ > +void HELPER(NAME)(void *vd, void *vn1, void *vm, void *vn2, uint32_t desc) \ > +{ \ > +intptr_t i, opr_sz = simd_oprsz(desc); \ > +uintptr_t elem = opr_sz / sizeof(TYPE); \ > +TYPE *d = vd, *n1 = vn1, *n2 = vn2, *m = vm;\ > +ARMVectorReg tmp1, tmp2;\ Only one temp needed. > +if (unlikely(vd == vn1)) { \ > +n1 = memcpy(&tmp1, vn1, opr_sz);\ > +} \ > +if (unlikely(vd == vn2)) { \ > +n2 = memcpy(&tmp2, vn2, opr_sz);\ > +} Better with else if here. Because vd cannot overlap both vn1 or vn2, only one of them. \ > +for (i = 0; i < elem; i++) {\ > +TYPE j = m[H(i)]; \ > +d[H(i)] = j < (elem * 2) ? n1[H(j)] : 0;\ > +\ > +TYPE k = m[H(elem + i)];\ > +d[H(elem + i)] = k < (elem * 2) ? n2[H(k)] : 0; \ > +} First, the indexing is wrong. Note that you're range checking vs elem * 2, but only indexing a single vector. Thus you must be indexing into the next vector. This should look more like TYPE j = m[H(i)]; TYPE r = 0; if (j < elem) { r = n1[H(j)]; } else if (j < 2 * elem) { r = n2[H(j - elem)]; } d[H(i)] = r; Second, this is one case where I'd prefer to share code with AArch64. It would be worthwhile to rearrange both sve1 and advsimd to use a common set of helpers. > +static bool trans_TBL_zzz(DisasContext *s, arg_rrr_esz *a) _zzz is not helpful here. The SVE1 insn also operates on 3 registers, and thus could logically be _zzz too. Better might be _double, after double_table = TRUE, or maybe just _2 just in case SVE3 adds a variant with more table registers. r~
Re: [PATCH] linux-user: Drop unnecessary check in dup3 syscall
On 4/24/20 3:57 PM, Helge Deller wrote: Drop the extra check in dup3() if anything other than FD_CLOEXEC (aka O_CLOEXEC) was given. Instead simply rely on any error codes returned by the host dup3() syscall. Signed-off-by: Helge Deller diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 05f03919ff..ebf0d38321 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8301,12 +8310,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3) case TARGET_NR_dup3: { -int host_flags; - -if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { -return -EINVAL; -} -host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); +int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); I don't think this is quite correct. target_to_host_bitmask() silently ignores unknown bits, and a user that was relying on bit 0x4000 to cause an EINVAL will not fail with this change (unless bit 0x4000 happens to be one of the bits translated by fcntl_flags_tbl). The open() syscall is notorious for ignoring unknown bits rather than failing with EINVAL, and it is has come back to haunt kernel developers; newer syscalls like dup3() learned from the mistake, and we really do want to catch unsupported bits up to make it easier for future kernels to define meanings to those bits without them being silently swallowed when run on older systems that did not know what those bits meant. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [RFC patch v1 2/3] qemu-file: add buffered mode
On 4/13/20 6:12 AM, Denis Plotnikov wrote: The patch adds ability to qemu-file to write the data asynchronously to improve the performance on writing. Before, only synchronous writing was supported. Enabling of the asyncronous mode is managed by new asynchronous "enabled_buffered" callback. The term "enabled_buffered" does not appear in the patch. Did you mean... Signed-off-by: Denis Plotnikov --- include/qemu/typedefs.h | 1 + migration/qemu-file.c | 351 +--- migration/qemu-file.h | 9 ++ 3 files changed, 339 insertions(+), 22 deletions(-) @@ -60,6 +66,22 @@ struct QEMUFile { bool shutdown; /* currently used buffer */ QEMUFileBuffer *current_buf; +/* + * with buffered_mode enabled all the data copied to 512 byte + * aligned buffer, including iov data. Then the buffer is passed + * to writev_buffer callback. + */ +bool buffered_mode; ..."Asynchronous mode is managed by setting the new buffered_mode flag"? ... +/* for async buffer writing */ +AioTaskPool *pool; +/* the list of free buffers, currently used on is NOT there */ s/on/one/ +QLIST_HEAD(, QEMUFileBuffer) free_buffers; +}; + +struct QEMUFileAioTask { +AioTask task; +QEMUFile *f; +QEMUFileBuffer *fb; }; /* @@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops) f->opaque = opaque; f->ops = ops; -f->current_buf = g_new0(QEMUFileBuffer, 1); -f->current_buf->buf = g_malloc(IO_BUF_SIZE); -f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE); -f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE); +if (f->ops->enable_buffered) { +f->buffered_mode = f->ops->enable_buffered(f->opaque); ...ah, you meant 'enable_buffered'. But still, why do we need a callback function? Is it not sufficient to just have a bool flag? +static size_t get_buf_free_size(QEMUFile *f) +{ +QEMUFileBuffer *fb = f->current_buf; +/* buf_index can't be greated than buf_size */ greater +assert(fb->buf_size >= fb->buf_index); +return fb->buf_size - fb->buf_index; +} + +static int write_task_fn(AioTask *task) +{ +/* + * Increment file position. + * This needs to be here before calling writev_buffer, because + * writev_buffer is asynchronous and there could be more than one + * writev_buffer started simultaniously. Each writev_buffer should simultaneously + * use its own file pos to write to. writev_buffer may write less + * than buf_index bytes but we treat this situation as an error. + * If error appeared, further file using is meaningless. s/using/use/ + * We expect that, the most of the time the full buffer is written, + * (when buf_size == buf_index). The only case when the non-full + * buffer is written (buf_size != buf_index) is file close, + * when we need to flush the rest of the buffer content. We expect that most of the time, the full buffer will be written (buf_size == buf_index), with the exception at file close where we need to flush the final partial buffer. + */ +f->pos += fb->buf_index; + +ret = f->ops->writev_buffer(f->opaque, &v, 1, pos, &local_error); + +/* return the just written buffer to the free list */ +QLIST_INSERT_HEAD(&f->free_buffers, fb, link); + +/* check that we have written everything */ +if (ret != fb->buf_index) { +qemu_file_set_error_obj(f, ret < 0 ? ret : -EIO, local_error); +} + +/* + * always return 0 - don't use task error handling, relay on rely + * qemu file error handling + */ +return 0; +} + +static void qemu_file_switch_current_buf(QEMUFile *f) +{ +/* + * if the list is empty, wait until some task returns a buffer + * to the list of free buffers. + */ +if (QLIST_EMPTY(&f->free_buffers)) { +aio_task_pool_wait_slot(f->pool); +} + +/* + * sanity check that the list isn't empty + * if the free list was empty, we waited for a task complition, completion + * and the pompleted task must return a buffer to a list of free buffers completed + */ +assert(!QLIST_EMPTY(&f->free_buffers)); + +/* set the current buffer for using from the free list */ +f->current_buf = QLIST_FIRST(&f->free_buffers); +reset_buf(f); + +QLIST_REMOVE(f->current_buf, link); +} + /* + * Copy an external buffer to the intenal current buffer. internal + */ +static void copy_buf(QEMUFile *f, const uint8_t *buf, size_t size, + bool may_free) +{ +++ b/migration/qemu-file.h @@ -103,6 +103,14 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque); typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr, Error **errp); +/* + * Enables or disables the buffered mode + * Existing blocking reads/writes must be woken + * Returns true if the
Re: [RFC patch v1 1/3] qemu-file: introduce current buffer
On 4/13/20 6:12 AM, Denis Plotnikov wrote: To approach async wrtiting in the further commits, the buffer writing allocated in QEMUFile struct is replaced with the link to the current buffer. We're going to use many buffers to write the qemu file stream to the unerlying storage asynchronously. The underlying current buffer points out to the buffer is currently filled with data. This sentence is confusing. I'd suggest: The current_buf pointer tracks which buffer is currently filled with data. This patch doesn't add any features to qemu-file and doesn't change any qemu-file behavior. Signed-off-by: Denis Plotnikov --- include/qemu/typedefs.h | 1 + migration/qemu-file.c | 156 +--- 2 files changed, 95 insertions(+), 62 deletions(-) diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h index 375770a..88dce54 100644 --- a/include/qemu/typedefs.h +++ b/include/qemu/typedefs.h @@ -97,6 +97,7 @@ typedef struct QDict QDict; typedef struct QEMUBH QEMUBH; typedef struct QemuConsole QemuConsole; typedef struct QEMUFile QEMUFile; +typedef struct QEMUFileBuffer QEMUFileBuffer; typedef struct QemuLockable QemuLockable; typedef struct QemuMutex QemuMutex; typedef struct QemuOpt QemuOpt; diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 1c3a358..285c6ef 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -33,6 +33,17 @@ #define IO_BUF_SIZE 32768 #define MAX_IOV_SIZE MIN(IOV_MAX, 64) +QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512)); + +struct QEMUFileBuffer { +int buf_index; +int buf_size; /* 0 when writing */ +uint8_t *buf; +unsigned long *may_free; Do we really want something that compiles differently on 32- vs. 64-bit? /me reads ahead... +struct iovec *iov; +unsigned int iovcnt; +}; + struct QEMUFile { const QEMUFileOps *ops; const QEMUFileHooks *hooks; @@ -43,18 +54,12 @@ struct QEMUFile { int64_t pos; /* start of buffer when writing, end of buffer when reading */ -int buf_index; -int buf_size; /* 0 when writing */ -uint8_t buf[IO_BUF_SIZE]; - -DECLARE_BITMAP(may_free, MAX_IOV_SIZE); ...ah, you're replacing a bitmap, and our bitmap code _does_ use 'long' as its core for optimum speed (which overcomes the fact that it does mean annoying differences on 32- vs. 64-bit). -struct iovec iov[MAX_IOV_SIZE]; -unsigned int iovcnt; - int last_error; Error *last_error_obj; /* has the file has been shutdown */ bool shutdown; +/* currently used buffer */ +QEMUFileBuffer *current_buf; }; Most of the patch is mechanical conversion to the rearranged struct. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
[PATCH] linux-user: Add support for /proc/cpuinfo on hppa platform
Provide an own /proc/cpuinfo file for the hppa (parisc) platform. Signed-off-by: Helge Deller diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 05f03919ff..ebf0d38321 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7438,6 +7435,18 @@ static int open_cpuinfo(void *cpu_env, int fd) } #endif +#if defined(TARGET_HPPA) +static int open_cpuinfo(void *cpu_env, int fd) +{ +dprintf(fd, "cpu family\t: PA-RISC 1.1e\n"); +dprintf(fd, "cpu\t\t: PA7300LC (PCX-L2)\n"); +dprintf(fd, "capabilities\t: os32\n"); +dprintf(fd, "model\t\t: 9000/778/B160L\n"); +dprintf(fd, "model name\t: Merlin L2 160 QEMU (9000/778/B160L)\n"); +return 0; +} +#endif + #if defined(TARGET_M68K) static int open_hardware(void *cpu_env, int fd) { @@ -7462,7 +7471,7 @@ static int do_openat(void *cpu_env, int dirfd, const char *pathname, int flags, #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN) { "/proc/net/route", open_net_route, is_proc }, #endif -#if defined(TARGET_SPARC) +#if defined(TARGET_SPARC) || defined(TARGET_HPPA) { "/proc/cpuinfo", open_cpuinfo, is_proc }, #endif #if defined(TARGET_M68K)
[PATCH] linux-user: Drop unnecessary check in signalfd4 syscall
The signalfd4() syscall takes optional O_NONBLOCK and O_CLOEXEC fcntl flags. If the user gave any other invalid flags, the host syscall will return correct error codes, so simply drop the extra check here. Signed-off-by: Helge Deller diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 05f03919ff..ebf0d38321 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -7176,9 +7176,6 @@ static abi_long do_signalfd4(int fd, abi_long mask, int flags) sigset_t host_mask; abi_long ret; -if (flags & ~(TARGET_O_NONBLOCK | TARGET_O_CLOEXEC)) { -return -TARGET_EINVAL; -} if (!lock_user_struct(VERIFY_READ, target_mask, mask, 1)) { return -TARGET_EFAULT; }
[PATCH] linux-user: Drop unnecessary check in dup3 syscall
Drop the extra check in dup3() if anything other than FD_CLOEXEC (aka O_CLOEXEC) was given. Instead simply rely on any error codes returned by the host dup3() syscall. Signed-off-by: Helge Deller diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 05f03919ff..ebf0d38321 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8301,12 +8310,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3) case TARGET_NR_dup3: { -int host_flags; - -if ((arg3 & ~TARGET_O_CLOEXEC) != 0) { -return -EINVAL; -} -host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); +int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl); ret = get_errno(dup3(arg1, arg2, host_flags)); if (ret >= 0) { fd_trans_dup(arg1, arg2);
[PATCH] linux-user: Drop open-coded fcntl flags conversion in eventfd2 syscall
Drop the open-coded fcntl flags conversion in the eventfd2 syscall and replace it with the built-in conversion with fcntl_flags_tbl. Signed-off-by: Helge Deller diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 05f03919ff..ebf0d38321 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -11938,13 +11942,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, #if defined(TARGET_NR_eventfd2) case TARGET_NR_eventfd2: { -int host_flags = arg2 & (~(TARGET_O_NONBLOCK | TARGET_O_CLOEXEC)); -if (arg2 & TARGET_O_NONBLOCK) { -host_flags |= O_NONBLOCK; -} -if (arg2 & TARGET_O_CLOEXEC) { -host_flags |= O_CLOEXEC; -} +int host_flags = target_to_host_bitmask(arg2, fcntl_flags_tbl); ret = get_errno(eventfd(arg1, host_flags)); if (ret >= 0) { fd_trans_register(ret, &target_eventfd_trans);
[Bug 1874904] [NEW] qemu windows with gtk and french keypad not working as expected
Public bug reported: When I use qemu on Windows 10 with a French AZERTY keypad with the correct options the emulated system keypad still QWERTY. If I use sdl it works fine the emulated keypad is AZERTY. Example of command with ubuntu live cd: -> qemu-system-x86_64.exe -m 4G ubuntu-18.04.4-desktop-amd64.iso -display gtk -k fr NOTES: - Using the same command on Linux works fine. The emulated keypad is AZERTY. Qemu version : 4.2.0 on Windows 10 ** Affects: qemu Importance: Undecided Status: New ** Description changed: When I use qemu on Windows 10 with a French AZERTY keypad with the - correct options the emulated system keypad still QWERTY. If we use sdl - it works fine the emulated keypad is AZERTY. + correct options the emulated system keypad still QWERTY. If I use sdl it + works fine the emulated keypad is AZERTY. Example of command with ubuntu live cd: -> qemu-system-x86_64.exe -m 4G ubuntu-18.04.4-desktop-amd64.iso -display gtk -k fr NOTES: - - Using the same command on Linux works fine. The emulated keypad is AZERTY. + - Using the same command on Linux works fine. The emulated keypad is AZERTY. Qemu version : 4.2.0 on Windows 10 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1874904 Title: qemu windows with gtk and french keypad not working as expected Status in QEMU: New Bug description: When I use qemu on Windows 10 with a French AZERTY keypad with the correct options the emulated system keypad still QWERTY. If I use sdl it works fine the emulated keypad is AZERTY. Example of command with ubuntu live cd: -> qemu-system-x86_64.exe -m 4G ubuntu-18.04.4-desktop-amd64.iso -display gtk -k fr NOTES: - Using the same command on Linux works fine. The emulated keypad is AZERTY. Qemu version : 4.2.0 on Windows 10 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1874904/+subscriptions
Re: [PATCH v3] target/arm: Implement SVE2 FMMLA
On 4/22/20 9:55 AM, Stephen Long wrote: > +intptr_t opr_sz = simd_oprsz(desc) / (sizeof(TYPE) >> 2); \ > +\ > +for (s = 0; s < opr_sz; ++s) { \ > +TYPE *n = vn + s * (sizeof(TYPE) >> 2); \ > +TYPE *m = vm + s * (sizeof(TYPE) >> 2); \ > +TYPE *a = va + s * (sizeof(TYPE) >> 2); \ > +TYPE *d = vd + s * (sizeof(TYPE) >> 2); \ Shifting the wrong way. Need to multiply by 4 not divide. I've fixed this up, and also expanded the macro to two functions; I think it's clearer that way in this case. Applied to my SVE2 branch. Thanks, r~
Re: [PATCH RFC] target/arm: Implement SVE2 gather load insns
On 4/22/20 8:23 AM, Stephen Long wrote: > Add decoding logic for SVE2 64-bit/32-bit gather non-temporal load > insns. > > 64-bit > * LDNT1SB > * LDNT1B (vector plus scalar) > * LDNT1SH > * LDNT1H (vector plus scalar) > * LDNT1SW > * LDNT1W (vector plus scalar) > * LDNT1D (vector plus scalar) > > 32-bit > * LDNT1SB > * LDNT1B (vector plus scalar) > * LDNT1SH > * LDNT1H (vector plus scalar) > * LDNT1W (vector plus scalar) > > Signed-off-by: Stephen Long > > I'm not sure I'm initializing xs correctly. This also goes for the > scatter store insns in the previous patch. You did. xs=0 is 32-bit unsigned offset, xs=1 is 32-bit signed offset (directly from the SVE encoding); I repurpose xs=2 as 64-bit offset. There's a comment in there next to the load/store helper array to that effect. > --- > target/arm/sve.decode | 11 +++ > target/arm/translate-sve.c | 8 > 2 files changed, 19 insertions(+) Applied to my SVE2 branch. Thanks! r~
Re: [RFC PATCH] translate-all: include guest address in out_asm output
Patchew URL: https://patchew.org/QEMU/20200424173914.2957-1-alex.ben...@linaro.org/ Hi, This series failed the docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #! /bin/bash export ARCH=x86_64 make docker-image-fedora V=1 NETWORK=1 time make docker-test-mingw@fedora J=14 NETWORK=1 === TEST SCRIPT END === CC aarch64-softmmu/hw/arm/pxa2xx_pic.o CC aarch64-softmmu/hw/arm/digic.o /tmp/qemu-test/src/accel/tcg/translate-all.c: In function 'tb_gen_code': /tmp/qemu-test/src/accel/tcg/translate-all.c:1806:43: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'long long unsigned int'} [-Werror=format=] qemu_log(" prologue: [size=%ld]\n", insn_start); ~~^ ~~ %lld /tmp/qemu-test/src/accel/tcg/translate-all.c:1827:39: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'long long unsigned int'} [-Werror=format=] qemu_log(" data: [size=%ld]\n", data_size); ~~^ ~ %lld cc1: all warnings being treated as errors make[1]: *** [/tmp/qemu-test/src/rules.mak:69: accel/tcg/translate-all.o] Error 1 make[1]: *** Waiting for unfinished jobs CC aarch64-softmmu/hw/arm/omap1.o CC x86_64-softmmu/gdbstub-xml.o /tmp/qemu-test/src/accel/tcg/translate-all.c: In function 'tb_gen_code': /tmp/qemu-test/src/accel/tcg/translate-all.c:1806:43: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'long long unsigned int'} [-Werror=format=] qemu_log(" prologue: [size=%ld]\n", insn_start); ~~^ ~~ %lld /tmp/qemu-test/src/accel/tcg/translate-all.c:1827:39: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'long long unsigned int'} [-Werror=format=] qemu_log(" data: [size=%ld]\n", data_size); ~~^ ~ %lld cc1: all warnings being treated as errors make[1]: *** [/tmp/qemu-test/src/rules.mak:69: accel/tcg/translate-all.o] Error 1 make[1]: *** Waiting for unfinished jobs make: *** [Makefile:527: x86_64-softmmu/all] Error 2 make: *** Waiting for unfinished jobs make: *** [Makefile:527: aarch64-softmmu/all] Error 2 Traceback (most recent call last): File "./tests/docker/docker.py", line 664, in sys.exit(main()) --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=710671f8720441949df423784981c162', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-frsoowsh/src/docker-src.2020-04-24-16.07.10.20709:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=710671f8720441949df423784981c162 make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-frsoowsh/src' make: *** [docker-run-test-mingw@fedora] Error 2 real2m52.582s user0m8.929s The full log is available at http://patchew.org/logs/20200424173914.2957-1-alex.ben...@linaro.org/testing.docker-mingw@fedora/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH 06/11] error: Use error_reportf_err() where appropriate
On 4/24/20 2:20 PM, Markus Armbruster wrote: Replace error_report("...: %s", ..., error_get_pretty(err)); by error_reportf_err(err, "...: ", ...); Reviewed-by: Eric Blake Signed-off-by: Markus Armbruster --- chardev/char-socket.c | 5 +++-- hw/sd/pxa2xx_mmci.c | 4 ++-- hw/sd/sd.c| 4 ++-- hw/usb/dev-mtp.c | 9 + qemu-nbd.c| 7 +++ scsi/qemu-pr-helper.c | 4 ++-- 6 files changed, 17 insertions(+), 16 deletions(-) Although it touches NBD, I'm happy for this to go through your tree with the larger series. +++ b/qemu-nbd.c @@ -856,8 +856,7 @@ int main(int argc, char **argv) } tlscreds = nbd_get_tls_creds(tlscredsid, list, &local_err); if (local_err) { -error_report("Failed to get TLS creds %s", - error_get_pretty(local_err)); +error_reportf_err(local_err, "Failed to get TLS creds "); Odd one out for not using ':' in the message, but that's independent of this patch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v2] target/arm: Implement SVE2 scatter store insns
On 4/22/20 7:15 AM, Stephen Long wrote: > Add decoding logic for SVE2 64-bit/32-bit scatter non-temporal store > insns. > > 64-bit > * STNT1B (vector plus scalar) > * STNT1H (vector plus scalar) > * STNT1W (vector plus scalar) > * STNT1D (vector plus scalar) > > 32-bit > * STNT1B (vector plus scalar) > * STNT1H (vector plus scalar) > * STNT1W (vector plus scalar) > > Signed-off-by: Stephen Long > > Cool, it seemed to typedef correctly. > --- > target/arm/sve.decode | 10 ++ > target/arm/translate-sve.c | 8 > 2 files changed, 18 insertions(+) Applied to my SVE2 branch. Thanks! r~
Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type
On Fri, Apr 24, 2020 at 12:12 PM Corey Wharton wrote: > > > > > -Original Message- > > From: Alistair Francis > > Sent: Friday, April 24, 2020 9:04 AM > > To: Corey Wharton > > Cc: qemu-devel@nongnu.org Developers ; open > > list:RISC-V ; Sagar Karandikar > > ; Bastian Koppelmann > paderborn.de>; Alistair Francis ; Palmer Dabbelt > > ; Bin Meng > > Subject: Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type > > > > On Fri, Mar 13, 2020 at 12:35 PM Corey Wharton wrote: > > > > > > Allows the CPU to be changed from the default via the -cpu command > > > line option. > > > > > > Signed-off-by: Corey Wharton > > > Reviewed-by: Bin Meng > > > Reviewed-by: Alistair Francis > > > > Thanks for the patch. > > > > Unfortunately this fails `make check`. > > > > The problem is that the machine has the `default_cpu_type` set but then you > > set "cpu-type" from the SoC. > > > > This diff fixes the make check failure for me: > > > > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index > > 1fd08f325c..b53109521e 100644 > > --- a/hw/riscv/sifive_e.c > > +++ b/hw/riscv/sifive_e.c > > @@ -123,8 +123,6 @@ static void riscv_sifive_e_soc_init(Object *obj) > > object_initialize_child(obj, "cpus", &s->cpus, > > sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, > > &error_abort, NULL); > > -object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", > > -&error_abort); > > object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", > > &error_abort); > > sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0", @@ -141,6 +139,8 @@ > > static void riscv_sifive_e_soc_realize(DeviceState > > *dev, Error **errp) > > SiFiveESoCState *s = RISCV_E_SOC(dev); > > MemoryRegion *sys_mem = get_system_memory(); > > > > +object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", > > +&error_abort); > > object_property_set_bool(OBJECT(&s->cpus), true, "realized", > > &error_abort); > > > > > > I'm happy to just squash that into the patch. Let me know if you don't want > > me to do that and I'll drop these patches and let you re-send them. > > > > Alistair > > > > Thanks for fixing this issue. I tested your patch and it seems to work as > Intended and I'm fine with you squashing it into the patch. Great! I'll send this patch as part of the PR after 5.0 then. I also realised that my justification was wrong. It's not because of the machine/SoC split, but because of the order between init/realise. Alistair > > Corey > > > > --- > > > hw/riscv/sifive_e.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index > > > a254cad489..b0a611adb9 100644 > > > --- a/hw/riscv/sifive_e.c > > > +++ b/hw/riscv/sifive_e.c > > > @@ -123,7 +123,7 @@ static void riscv_sifive_e_soc_init(Object *obj) > > > object_initialize_child(obj, "cpus", &s->cpus, > > > sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, > > > &error_abort, NULL); > > > -object_property_set_str(OBJECT(&s->cpus), SIFIVE_E_CPU, "cpu-type", > > > +object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, > > > + "cpu-type", > > > &error_abort); > > > object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num- > > harts", > > > &error_abort); @@ -220,6 +220,7 @@ static > > > void riscv_sifive_e_machine_init(MachineClass *mc) > > > mc->desc = "RISC-V Board compatible with SiFive E SDK"; > > > mc->init = riscv_sifive_e_init; > > > mc->max_cpus = 1; > > > +mc->default_cpu_type = SIFIVE_E_CPU; > > > } > > > > > > DEFINE_MACHINE("sifive_e", riscv_sifive_e_machine_init) > > > -- > > > 2.21.1 > > > > > >
Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions
On 4/21/20 9:39 PM, Richard Henderson wrote: > On 4/20/20 3:29 AM, Szabolcs Nagy wrote: >> i'm using the branch at >> >> https://github.com/rth7680/qemu/tree/tgt-arm-mte >> >> to test armv8.5-a mte and hope this is ok to report bugs here. >> >> i'm doing tests in qemu-system-aarch64 with linux userspace >> code and it seems TCO bit gets cleared after syscalls or other >> kernel entry, but PSTATE is expected to be restored, so i >> suspect it is a qemu bug. >> >> i think the architecture saves/restores PSTATE using SPSR_ELx >> on exceptions. > > Yep. I failed to update aarch64_pstate_valid_mask for TCO. > Will fix. Thanks, Fixed on the branch. I still need to work out how best to plumb the arm,armv8.5-memtag property so the devel/mte-v3 kernel branch isn't usable as-is for the moment. For myself, I've just commented that test out for now. r~
Re: [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
On 3/17/20 1:16 PM, Alberto Garcia wrote: The L2 bitmap needs to be updated after each write to indicate what new subclusters are now allocated. This needs to happen even if the cluster was already allocated and the L2 entry was otherwise valid. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz --- block/qcow2-cluster.c | 17 + 1 file changed, 17 insertions(+) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index ceacd91ea3..dfd8b66958 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1006,6 +1006,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m) assert((offset & L2E_OFFSET_MASK) == offset); set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED); + +/* Update bitmap with the subclusters that were just written */ +if (has_subclusters(s)) { +unsigned written_from = m->cow_start.offset; +unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?: +m->nb_clusters << s->cluster_bits; +uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i); +int sc; +for (sc = 0; sc < s->subclusters_per_cluster; sc++) { +int sc_off = i * s->cluster_size + sc * s->subcluster_size; +if (sc_off >= written_from && sc_off < written_to) { +l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc); +l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc); +} +} Are there more efficient ways to set this series of bits than iterating one bit at a time, while still remaining legible? For example, what if we had something like: l2_bitmap = get_l2_bitmap(...); int sc_from = OFFSET_TO_SC(written_from); int sc_to = OFFSET_TO_SC(written_to - 1); l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(sc_from, sc_to); l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(sc_from, sc_to); which would require macros: #define OFFSET_TO_SC(offset) (offset >> (s->cluster_bits - 6)) #define QCOW_OFLAG_SUB_ALLOC_RANGE(from, to) \ deposit64(0, (from), (len) - (from), -1) #define QCOW_OFLAG_SUB_ZERO_RANGE(from, to) \ deposit64(0, (from) + 32, (len) - (from) + 32, -1) +set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap); I'm hoping this function doesn't cause redundant I/O if the L2 entry didn't actually change. But that's not the concern for this patch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [RFC PATCH] translate-all: include guest address in out_asm output
On 4/24/20 10:39 AM, Alex Bennée wrote: > +/* first dump prologue */ > +insn_start = tcg_ctx->gen_insn_end_off[0]; > +if (insn_start > 0) { > +qemu_log(" prologue: [size=%ld]\n", insn_start); > +log_disas(tb->tc.ptr, insn_start); > +} > + > +do { > +size_t insn_end; > +if (insn < (tb->icount - 1)) { > +insn_end = tcg_ctx->gen_insn_end_off[insn + 1]; > +} else { > +insn_end = code_size; > +} > +qemu_log(" for guest addr: " TARGET_FMT_lx ":\n", > + tcg_ctx->gen_insn_data[insn][0]); The one thing you're missing here is when a given guest insn emits no host insns. E.g. an actual guest nop, or if two guest insns are optimized together. So you need to search forward through empty insns til you find one that has contents. E.g. the very first TB that alpha-softmmu executes in its bios: OP after optimization and liveness analysis: ld_i32 tmp0,env,$0xfff0 dead: 1 pref=0x movi_i32 tmp1,$0x0 pref=0x brcond_i32 tmp0,tmp1,lt,$L0 dead: 0 1 fc00 fc04 fc08 movi_i64 gp,$0xfc012f50 sync: 0 pref=0x OUT: [size=280] prologue: [size=11] 0x7fffa100: 8b 5d f0 movl -0x10(%rbp), %ebx 0x7fffa103: 85 dbtestl%ebx, %ebx 0x7fffa105: 0f 8c d6 00 00 00jl 0x7fffa1e1 for guest addr: fc00: for guest addr: fc04: 0x7fffa10b: 48 bb 50 2f 01 00 00 fc movabsq $0xfc012f50, %rbx 0x7fffa113: ff ff 0x7fffa115: 48 89 9d e8 00 00 00 movq %rbx, 0xe8(%rbp) for guest addr: fc08: So you've attributed to ...04 what actually belongs to ...08. But it's a good idea. r~
[PATCH 02/11] xen: Fix and improve handling of device_add usb-host errors
usbback_portid_add() leaks the error when qdev_device_add() fails. Fix that. While there, use the error to improve the error message. The qemu_opts_from_qdict() similarly leaks on failure. But any failure there is a programming error. Pass &error_abort. Fixes: 816ac92ef769f9ffc534e49a1bb6177bddce7aa2 Cc: Stefano Stabellini Cc: Anthony Perard Cc: Paul Durrant Cc: Gerd Hoffmann Cc: xen-de...@lists.xenproject.org Signed-off-by: Markus Armbruster --- hw/usb/xen-usb.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c index 961190d0f7..42643c3390 100644 --- a/hw/usb/xen-usb.c +++ b/hw/usb/xen-usb.c @@ -30,6 +30,7 @@ #include "hw/usb.h" #include "hw/xen/xen-legacy-backend.h" #include "monitor/qdev.h" +#include "qapi/error.h" #include "qapi/qmp/qdict.h" #include "qapi/qmp/qstring.h" @@ -755,13 +756,15 @@ static void usbback_portid_add(struct usbback_info *usbif, unsigned port, qdict_put_int(qdict, "port", port); qdict_put_int(qdict, "hostbus", atoi(busid)); qdict_put_str(qdict, "hostport", portname); -opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &local_err); -if (local_err) { -goto err; -} +opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, +&error_abort); usbif->ports[port - 1].dev = USB_DEVICE(qdev_device_add(opts, &local_err)); if (!usbif->ports[port - 1].dev) { -goto err; +qobject_unref(qdict); +xen_pv_printf(&usbif->xendev, 0, + "device %s could not be opened: %s\n", + busid, error_get_pretty(local_err)); +error_free(local_err); } qobject_unref(qdict); speed = usbif->ports[port - 1].dev->speed; @@ -793,11 +796,6 @@ static void usbback_portid_add(struct usbback_info *usbif, unsigned port, usbback_hotplug_enq(usbif, port); TR_BUS(&usbif->xendev, "port %d attached\n", port); -return; - -err: -qobject_unref(qdict); -xen_pv_printf(&usbif->xendev, 0, "device %s could not be opened\n", busid); } static void usbback_process_port(struct usbback_info *usbif, unsigned port) -- 2.21.1
[PATCH 10/11] arm/sabrelite: Consistently use &error_fatal in sabrelite_init()
Cc: Peter Maydell Cc: Jean-Christophe Dubois Signed-off-by: Markus Armbruster --- hw/arm/sabrelite.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c index e31694bb92..04f4b96591 100644 --- a/hw/arm/sabrelite.c +++ b/hw/arm/sabrelite.c @@ -41,7 +41,6 @@ static void sabrelite_reset_secondary(ARMCPU *cpu, static void sabrelite_init(MachineState *machine) { FslIMX6State *s; -Error *err = NULL; /* Check the amount of memory is compatible with the SOC */ if (machine->ram_size > FSL_IMX6_MMDC_SIZE) { @@ -52,11 +51,7 @@ static void sabrelite_init(MachineState *machine) s = FSL_IMX6(object_new(TYPE_FSL_IMX6)); object_property_add_child(OBJECT(machine), "soc", OBJECT(s), &error_fatal); -object_property_set_bool(OBJECT(s), true, "realized", &err); -if (err != NULL) { -error_report("%s", error_get_pretty(err)); -exit(1); -} +object_property_set_bool(OBJECT(s), true, "realized", &error_fatal); memory_region_add_subregion(get_system_memory(), FSL_IMX6_MMDC_ADDR, machine->ram); -- 2.21.1
[PATCH 09/11] mips/boston: Plug memory leak in boston_mach_init()
Fixes: df1d8a1f29f567567b9d20be685a4241282e7005 Cc: Paul Burton Cc: Aleksandar Rikalo Signed-off-by: Markus Armbruster --- hw/mips/boston.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/mips/boston.c b/hw/mips/boston.c index 2832dfa6ae..a896056be1 100644 --- a/hw/mips/boston.c +++ b/hw/mips/boston.c @@ -426,7 +426,6 @@ static void boston_mach_init(MachineState *machine) { DeviceState *dev; BostonState *s; -Error *err = NULL; MemoryRegion *flash, *ddr_low_alias, *lcd, *platreg; MemoryRegion *sys_mem = get_system_memory(); XilinxPCIEHost *pcie2; @@ -467,7 +466,8 @@ static void boston_mach_init(MachineState *machine) sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->cps), 0, 0, 1); flash = g_new(MemoryRegion, 1); -memory_region_init_rom(flash, NULL, "boston.flash", 128 * MiB, &err); +memory_region_init_rom(flash, NULL, "boston.flash", 128 * MiB, + &error_fatal); memory_region_add_subregion_overlap(sys_mem, 0x1800, flash, 0); memory_region_add_subregion_overlap(sys_mem, 0x8000, machine->ram, 0); -- 2.21.1
[PATCH 03/11] s390x/cpumodel: Fix harmless misuse of visit_check_struct()
Commit e47970f51d "s390x/cpumodel: Fix query-cpu-model-FOO error API violations" neglected to change visit_end_struct()'s Error ** argument along with the others. If visit_end_struct() failed, we'd take the success path. Fortunately, it can't fail here: qobject_input_check_struct() checks we consumed the whole dictionary, and to get here, we did. Fix it anyway. Cc: David Hildenbrand Cc: Cornelia Huck Signed-off-by: Markus Armbruster --- target/s390x/cpu_models.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c index 7c32180269..87a8028ad3 100644 --- a/target/s390x/cpu_models.c +++ b/target/s390x/cpu_models.c @@ -524,7 +524,7 @@ static void cpu_model_from_info(S390CPUModel *model, const CpuModelInfo *info, } } if (!err) { -visit_check_struct(visitor, errp); +visit_check_struct(visitor, &err); } visit_end_struct(visitor, NULL); visit_free(visitor); -- 2.21.1
[PATCH 08/11] mips/boston: Fix boston_mach_init() error handling
The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second boston_mach_init() is wrong that way. The last calls treats an error as fatal. Do that for the prior ones, too. Fixes: df1d8a1f29f567567b9d20be685a4241282e7005 Cc: Paul Burton Cc: Aleksandar Rikalo Signed-off-by: Markus Armbruster --- hw/mips/boston.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/hw/mips/boston.c b/hw/mips/boston.c index 98ecd25e8e..2832dfa6ae 100644 --- a/hw/mips/boston.c +++ b/hw/mips/boston.c @@ -458,14 +458,11 @@ static void boston_mach_init(MachineState *machine) sysbus_init_child_obj(OBJECT(machine), "cps", OBJECT(&s->cps), sizeof(s->cps), TYPE_MIPS_CPS); object_property_set_str(OBJECT(&s->cps), machine->cpu_type, "cpu-type", -&err); -object_property_set_int(OBJECT(&s->cps), machine->smp.cpus, "num-vp", &err); -object_property_set_bool(OBJECT(&s->cps), true, "realized", &err); - -if (err != NULL) { -error_report("%s", error_get_pretty(err)); -exit(1); -} +&error_fatal); +object_property_set_int(OBJECT(&s->cps), machine->smp.cpus, "num-vp", +&error_fatal); +object_property_set_bool(OBJECT(&s->cps), true, "realized", + &error_fatal); sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->cps), 0, 0, 1); -- 2.21.1
[PATCH 06/11] error: Use error_reportf_err() where appropriate
Replace error_report("...: %s", ..., error_get_pretty(err)); by error_reportf_err(err, "...: ", ...); Signed-off-by: Markus Armbruster --- chardev/char-socket.c | 5 +++-- hw/sd/pxa2xx_mmci.c | 4 ++-- hw/sd/sd.c| 4 ++-- hw/usb/dev-mtp.c | 9 + qemu-nbd.c| 7 +++ scsi/qemu-pr-helper.c | 4 ++-- 6 files changed, 17 insertions(+), 16 deletions(-) diff --git a/chardev/char-socket.c b/chardev/char-socket.c index 185fe38dda..e5ee685f8c 100644 --- a/chardev/char-socket.c +++ b/chardev/char-socket.c @@ -138,8 +138,9 @@ static void check_report_connect_error(Chardev *chr, SocketChardev *s = SOCKET_CHARDEV(chr); if (!s->connect_err_reported) { -error_report("Unable to connect character device %s: %s", - chr->label, error_get_pretty(err)); +error_reportf_err(err, + "Unable to connect character device %s: ", + chr->label); s->connect_err_reported = true; } qemu_chr_socket_restart_timer(chr); diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c index 8f9ab0ec16..f9c50ddda5 100644 --- a/hw/sd/pxa2xx_mmci.c +++ b/hw/sd/pxa2xx_mmci.c @@ -497,12 +497,12 @@ PXA2xxMMCIState *pxa2xx_mmci_init(MemoryRegion *sysmem, carddev = qdev_create(qdev_get_child_bus(dev, "sd-bus"), TYPE_SD_CARD); qdev_prop_set_drive(carddev, "drive", blk, &err); if (err) { -error_report("failed to init SD card: %s", error_get_pretty(err)); +error_reportf_err(err, "failed to init SD card: "); return NULL; } object_property_set_bool(OBJECT(carddev), true, "realized", &err); if (err) { -error_report("failed to init SD card: %s", error_get_pretty(err)); +error_reportf_err(err, "failed to init SD card: "); return NULL; } diff --git a/hw/sd/sd.c b/hw/sd/sd.c index 71a9af09ab..3c06a0ac6d 100644 --- a/hw/sd/sd.c +++ b/hw/sd/sd.c @@ -703,13 +703,13 @@ SDState *sd_init(BlockBackend *blk, bool is_spi) dev = DEVICE(obj); qdev_prop_set_drive(dev, "drive", blk, &err); if (err) { -error_report("sd_init failed: %s", error_get_pretty(err)); +error_reportf_err(err, "sd_init failed: "); return NULL; } qdev_prop_set_bit(dev, "spi", is_spi); object_property_set_bool(obj, true, "realized", &err); if (err) { -error_report("sd_init failed: %s", error_get_pretty(err)); +error_reportf_err(err, "sd_init failed: "); return NULL; } diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c index 20717f026b..168428156b 100644 --- a/hw/usb/dev-mtp.c +++ b/hw/usb/dev-mtp.c @@ -631,8 +631,9 @@ static void usb_mtp_object_readdir(MTPState *s, MTPObject *o) int64_t id = qemu_file_monitor_add_watch(s->file_monitor, o->path, NULL, file_monitor_event, s, &err); if (id == -1) { -error_report("usb-mtp: failed to add watch for %s: %s", o->path, - error_get_pretty(err)); +error_reportf_err(err, + "usb-mtp: failed to add watch for %s: ", + o->path); error_free(err); } else { trace_usb_mtp_file_monitor_event(s->dev.addr, o->path, @@ -1276,8 +1277,8 @@ static void usb_mtp_command(MTPState *s, MTPControl *c) s->file_monitor = qemu_file_monitor_new(&err); if (err) { -error_report("usb-mtp: file monitoring init failed: %s", - error_get_pretty(err)); +error_reportf_err(err, + "usb-mtp: file monitoring init failed: "); error_free(err); } else { QTAILQ_INIT(&s->events); diff --git a/qemu-nbd.c b/qemu-nbd.c index 4aa005004e..30deb5d9e6 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -856,8 +856,7 @@ int main(int argc, char **argv) } tlscreds = nbd_get_tls_creds(tlscredsid, list, &local_err); if (local_err) { -error_report("Failed to get TLS creds %s", - error_get_pretty(local_err)); +error_reportf_err(local_err, "Failed to get TLS creds "); exit(EXIT_FAILURE); } } else { @@ -979,8 +978,8 @@ int main(int argc, char **argv) &local_err); if (sioc == NULL) { object_unref(OBJECT(server)); -error_report("Failed to use socket activation: %s", - error_get_pretty(local_err)); +error_reportf_err(local_err, + "Failed to use socket activation: "); exit(EXIT_FAILURE); } qio_net_listener_add(server, sioc); diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c index 181ed4a186..57ad830d54 100644 --- a/scsi/qemu-pr-helper.c +++ b/scsi/
[Bug 1874674] Re: [Feature request] acceptance test class to run user-mode binaries
What user-mode testing do you think might be improved by using avocado? IMO at present we have a fairly comprehensive testing infrastructure for user-mode that is simply underused. With docker, we have a set of cross-compilers for most guest architectures, and we are able to build statically linked binaries that are copied out of the container for testing by the just-built qemu binaries on the host. This infrastructure is used by check-tcg. It's fairly easy to add new test cases to be run on one or all guests. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1874674 Title: [Feature request] acceptance test class to run user-mode binaries Status in QEMU: New Bug description: Currently the acceptance test framework only target system-mode emulation. It would be useful to test user-mode too. Ref: https://www.mail-archive.com/qemu-devel@nongnu.org/msg626610.html To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1874674/+subscriptions
[PATCH 07/11] mips/malta: Fix create_cps() error handling
The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second create_cps() is wrong that way. The last calls treats an error as fatal. Do that for the prior ones, too. Fixes: bff384a4fbd5d0e86939092e74e766ef0f5f592c Cc: Aleksandar Markovic Cc: "Philippe Mathieu-Daudé" Cc: Aurelien Jarno Signed-off-by: Markus Armbruster --- hw/mips/mips_malta.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index e4c4de1b4e..17bf41616b 100644 --- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -1185,17 +1185,14 @@ static void create_cpu_without_cps(MachineState *ms, static void create_cps(MachineState *ms, MaltaState *s, qemu_irq *cbus_irq, qemu_irq *i8259_irq) { -Error *err = NULL; - sysbus_init_child_obj(OBJECT(s), "cps", OBJECT(&s->cps), sizeof(s->cps), TYPE_MIPS_CPS); -object_property_set_str(OBJECT(&s->cps), ms->cpu_type, "cpu-type", &err); -object_property_set_int(OBJECT(&s->cps), ms->smp.cpus, "num-vp", &err); -object_property_set_bool(OBJECT(&s->cps), true, "realized", &err); -if (err != NULL) { -error_report("%s", error_get_pretty(err)); -exit(1); -} +object_property_set_str(OBJECT(&s->cps), ms->cpu_type, "cpu-type", +&error_fatal); +object_property_set_int(OBJECT(&s->cps), ms->smp.cpus, "num-vp", +&error_fatal); +object_property_set_bool(OBJECT(&s->cps), true, "realized", + &error_fatal); sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->cps), 0, 0, 1); -- 2.21.1
[PATCH 05/11] tests/migration: Tighten error checking
migrate_get_socket_address() neglects to check visit_type_SocketAddressList() failure. This smells like a leak, but it actually will crash dereferencing @addrs. Pass &error_abort to remove the code smell. Signed-off-by: Markus Armbruster --- tests/qtest/migration-test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 2568c9529c..dc3490c9fa 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -13,6 +13,7 @@ #include "qemu/osdep.h" #include "libqtest.h" +#include "qapi/error.h" #include "qapi/qmp/qdict.h" #include "qemu/module.h" #include "qemu/option.h" @@ -301,7 +302,6 @@ static char *migrate_get_socket_address(QTestState *who, const char *parameter) { QDict *rsp; char *result; -Error *local_err = NULL; SocketAddressList *addrs; Visitor *iv = NULL; QObject *object; @@ -310,7 +310,7 @@ static char *migrate_get_socket_address(QTestState *who, const char *parameter) object = qdict_get(rsp, parameter); iv = qobject_input_visitor_new(object); -visit_type_SocketAddressList(iv, NULL, &addrs, &local_err); +visit_type_SocketAddressList(iv, NULL, &addrs, &error_abort); visit_free(iv); /* we are only using a single address */ -- 2.21.1
[PATCH 01/11] nvdimm: Plug memory leak in uuid property setter
nvdimm_set_uuid() leaks memory on qemu_uuid_parse() failure. Fix that. Fixes: 6c5627bb24dcd68c997857a8b671617333b1289f Cc: Xiao Guangrong Cc: Shivaprasad G Bhat Signed-off-by: Markus Armbruster --- hw/mem/nvdimm.c | 1 - 1 file changed, 1 deletion(-) diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c index 8e426d24bb..d5752f7bf6 100644 --- a/hw/mem/nvdimm.c +++ b/hw/mem/nvdimm.c @@ -97,7 +97,6 @@ static void nvdimm_set_uuid(Object *obj, Visitor *v, const char *name, if (qemu_uuid_parse(value, &nvdimm->uuid) != 0) { error_setg(errp, "Property '%s.%s' has invalid value", object_get_typename(obj), name); -goto out; } g_free(value); -- 2.21.1
[PATCH 00/11] More miscellaneous error handling fixes
Markus Armbruster (11): nvdimm: Plug memory leak in uuid property setter xen: Fix and improve handling of device_add usb-host errors s390x/cpumodel: Fix harmless misuse of visit_check_struct() s390x/pci: Fix harmless mistake in zpci's property fid's setter tests/migration: Tighten error checking error: Use error_reportf_err() where appropriate mips/malta: Fix create_cps() error handling mips/boston: Fix boston_mach_init() error handling mips/boston: Plug memory leak in boston_mach_init() arm/sabrelite: Consistently use &error_fatal in sabrelite_init() i386: Fix x86_cpu_load_model() error API violation chardev/char-socket.c| 5 +++-- hw/arm/sabrelite.c | 7 +-- hw/mem/nvdimm.c | 1 - hw/mips/boston.c | 17 +++-- hw/mips/mips_malta.c | 15 ++- hw/s390x/s390-pci-bus.c | 4 +++- hw/sd/pxa2xx_mmci.c | 4 ++-- hw/sd/sd.c | 4 ++-- hw/usb/dev-mtp.c | 9 + hw/usb/xen-usb.c | 18 -- qemu-nbd.c | 7 +++ scsi/qemu-pr-helper.c| 4 ++-- target/i386/cpu.c| 25 - target/s390x/cpu_models.c| 2 +- tests/qtest/migration-test.c | 4 ++-- 15 files changed, 61 insertions(+), 65 deletions(-) -- 2.21.1
[PATCH 04/11] s390x/pci: Fix harmless mistake in zpci's property fid's setter
s390_pci_set_fid() sets zpci->fid_defined to true even when visit_type_uint32() failed. Reproducer: "-device zpci,fid=junk". Harmless in practice, because qdev_device_add() then fails, throwing away @zpci. Fix it anyway. Cc: Matthew Rosato Cc: Cornelia Huck Signed-off-by: Markus Armbruster --- hw/s390x/s390-pci-bus.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c index ed8be124da..19ee1f02bb 100644 --- a/hw/s390x/s390-pci-bus.c +++ b/hw/s390x/s390-pci-bus.c @@ -1276,7 +1276,9 @@ static void s390_pci_set_fid(Object *obj, Visitor *v, const char *name, return; } -visit_type_uint32(v, name, ptr, errp); +if (!visit_type_uint32(v, name, ptr, errp)) { +return; +} zpci->fid_defined = true; } -- 2.21.1
[PATCH 11/11] i386: Fix x86_cpu_load_model() error API violation
The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. x86_cpu_load_model() is wrong that way. Harmless, because its @errp is always &error_abort. To fix, cut out the @errp middleman. Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Markus Armbruster --- target/i386/cpu.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 90ffc5f3b1..3f09fd2321 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -5078,7 +5078,7 @@ static void x86_cpu_apply_version_props(X86CPU *cpu, X86CPUModel *model) /* Load data from X86CPUDefinition into a X86CPU object */ -static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model, Error **errp) +static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model) { X86CPUDefinition *def = model->cpudef; CPUX86State *env = &cpu->env; @@ -5092,13 +5092,19 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model, Error **errp) */ /* CPU models only set _minimum_ values for level/xlevel: */ -object_property_set_uint(OBJECT(cpu), def->level, "min-level", errp); -object_property_set_uint(OBJECT(cpu), def->xlevel, "min-xlevel", errp); +object_property_set_uint(OBJECT(cpu), def->level, "min-level", + &error_abort); +object_property_set_uint(OBJECT(cpu), def->xlevel, "min-xlevel", + &error_abort); -object_property_set_int(OBJECT(cpu), def->family, "family", errp); -object_property_set_int(OBJECT(cpu), def->model, "model", errp); -object_property_set_int(OBJECT(cpu), def->stepping, "stepping", errp); -object_property_set_str(OBJECT(cpu), def->model_id, "model-id", errp); +object_property_set_int(OBJECT(cpu), def->family, "family", +&error_abort); +object_property_set_int(OBJECT(cpu), def->model, "model", +&error_abort); +object_property_set_int(OBJECT(cpu), def->stepping, "stepping", +&error_abort); +object_property_set_str(OBJECT(cpu), def->model_id, "model-id", +&error_abort); for (w = 0; w < FEATURE_WORDS; w++) { env->features[w] = def->features[w]; } @@ -5135,7 +5141,8 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model, Error **errp) vendor = host_vendor; } -object_property_set_str(OBJECT(cpu), vendor, "vendor", errp); +object_property_set_str(OBJECT(cpu), vendor, "vendor", +&error_abort); x86_cpu_apply_version_props(cpu, model); } @@ -6981,7 +6988,7 @@ static void x86_cpu_initfn(Object *obj) object_property_add_alias(obj, "sse4_2", obj, "sse4.2", &error_abort); if (xcc->model) { -x86_cpu_load_model(cpu, xcc->model, &error_abort); +x86_cpu_load_model(cpu, xcc->model); } } -- 2.21.1
RE: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type
> -Original Message- > From: Alistair Francis > Sent: Friday, April 24, 2020 9:04 AM > To: Corey Wharton > Cc: qemu-devel@nongnu.org Developers ; open > list:RISC-V ; Sagar Karandikar > ; Bastian Koppelmann paderborn.de>; Alistair Francis ; Palmer Dabbelt > ; Bin Meng > Subject: Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type > > On Fri, Mar 13, 2020 at 12:35 PM Corey Wharton wrote: > > > > Allows the CPU to be changed from the default via the -cpu command > > line option. > > > > Signed-off-by: Corey Wharton > > Reviewed-by: Bin Meng > > Reviewed-by: Alistair Francis > > Thanks for the patch. > > Unfortunately this fails `make check`. > > The problem is that the machine has the `default_cpu_type` set but then you > set "cpu-type" from the SoC. > > This diff fixes the make check failure for me: > > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index > 1fd08f325c..b53109521e 100644 > --- a/hw/riscv/sifive_e.c > +++ b/hw/riscv/sifive_e.c > @@ -123,8 +123,6 @@ static void riscv_sifive_e_soc_init(Object *obj) > object_initialize_child(obj, "cpus", &s->cpus, > sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, > &error_abort, NULL); > -object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", > -&error_abort); > object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", > &error_abort); > sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0", @@ -141,6 +139,8 @@ > static void riscv_sifive_e_soc_realize(DeviceState > *dev, Error **errp) > SiFiveESoCState *s = RISCV_E_SOC(dev); > MemoryRegion *sys_mem = get_system_memory(); > > +object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", > +&error_abort); > object_property_set_bool(OBJECT(&s->cpus), true, "realized", > &error_abort); > > > I'm happy to just squash that into the patch. Let me know if you don't want > me to do that and I'll drop these patches and let you re-send them. > > Alistair > Thanks for fixing this issue. I tested your patch and it seems to work as Intended and I'm fine with you squashing it into the patch. Corey > > --- > > hw/riscv/sifive_e.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index > > a254cad489..b0a611adb9 100644 > > --- a/hw/riscv/sifive_e.c > > +++ b/hw/riscv/sifive_e.c > > @@ -123,7 +123,7 @@ static void riscv_sifive_e_soc_init(Object *obj) > > object_initialize_child(obj, "cpus", &s->cpus, > > sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, > > &error_abort, NULL); > > -object_property_set_str(OBJECT(&s->cpus), SIFIVE_E_CPU, "cpu-type", > > +object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, > > + "cpu-type", > > &error_abort); > > object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num- > harts", > > &error_abort); @@ -220,6 +220,7 @@ static > > void riscv_sifive_e_machine_init(MachineClass *mc) > > mc->desc = "RISC-V Board compatible with SiFive E SDK"; > > mc->init = riscv_sifive_e_init; > > mc->max_cpus = 1; > > +mc->default_cpu_type = SIFIVE_E_CPU; > > } > > > > DEFINE_MACHINE("sifive_e", riscv_sifive_e_machine_init) > > -- > > 2.21.1 > > > >
Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
* Yan Zhao (yan.y.z...@intel.com) wrote: > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote: > > > From: Yan Zhao > > > Sent: Tuesday, April 21, 2020 10:37 AM > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote: > > > > On Sun, 19 Apr 2020 21:24:57 -0400 > > > > Yan Zhao wrote: > > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote: > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400 > > > > > > Yan Zhao wrote: > > > > > > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote: > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400 > > > > > > > > Yan Zhao wrote: > > > > > > > > > > > > > > > > > This patchset introduces a migration_version attribute under > > > > > > > > > sysfs > > > of VFIO > > > > > > > > > Mediated devices. > > > > > > > > > > > > > > > > > > This migration_version attribute is used to check migration > > > compatibility > > > > > > > > > between two mdev devices. > > > > > > > > > > > > > > > > > > Currently, it has two locations: > > > > > > > > > (1) under mdev_type node, > > > > > > > > > which can be used even before device creation, but only > > > > > > > > > for > > > mdev > > > > > > > > > devices of the same mdev type. > > > > > > > > > (2) under mdev device node, > > > > > > > > > which can only be used after the mdev devices are > > > > > > > > > created, but > > > the src > > > > > > > > > and target mdev devices are not necessarily be of the same > > > mdev type > > > > > > > > > (The second location is newly added in v5, in order to keep > > > consistent > > > > > > > > > with the migration_version node for migratable pass-though > > > devices) > > > > > > > > > > > > > > > > What is the relationship between those two attributes? > > > > > > > > > > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep > > > > > > > the > > > same > > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev > > > devices and > > > > > > > non-mdev devices. > > > > > > > > > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev > > > > > > > device > > > > > > > is binding to vfio-pci, but is able to register migration region > > > > > > > and do > > > > > > > migration transactions from a vendor provided affiliate driver), > > > > > > > the vendor driver would export (2) directly, under device node. > > > > > > > It is not able to provide (1) as there're no mdev devices > > > > > > > involved. > > > > > > > > > > > > Ok, creating an alternate attribute for non-mdev devices makes > > > > > > sense. > > > > > > However, wouldn't that rather be a case (3)? The change here only > > > > > > refers to mdev devices. > > > > > > > > > > > as you pointed below, (3) and (2) serve the same purpose. > > > > > and I think a possible usage is to migrate between a non-mdev device > > > > > and > > > > > an mdev device. so I think it's better for them both to use (2) rather > > > > > than creating (3). > > > > > > > > An mdev type is meant to define a software compatible interface, so in > > > > the case of mdev->mdev migration, doesn't migrating to a different type > > > > fail the most basic of compatibility tests that we expect userspace to > > > > perform? IOW, if two mdev types are migration compatible, it seems a > > > > prerequisite to that is that they provide the same software interface, > > > > which means they should be the same mdev type. > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a > > > management > > > > tool begin to even guess what might be compatible? Are we expecting > > > > libvirt to probe ever device with this attribute in the system? Is > > > > there going to be a new class hierarchy created to enumerate all > > > > possible migrate-able devices? > > > > > > > yes, management tool needs to guess and test migration compatible > > > between two devices. But I think it's not the problem only for > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs > > > to > > > first assume that the two mdevs have the same type of parent devices > > > (e.g.their pciids are equal). otherwise, it's still enumerating > > > possibilities. > > > > > > on the other hand, for two mdevs, > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1; > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2; > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between > > > mdev1 <-> mdev2. > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent > > to 1/4 of pdev2? If we really want to allow such thing happen, the best > > choice is to report the same mdev type on both pdev1 and pdev2. > I think that's exactly the value of this migration_version interface. > the management tool can take advantage of this interface to know if two > devices are migration compatible, no matter they are mdevs, non-mdevs, > or mix. > > as I know, (please correct me if not right), current lib
[PATCH v3 2/3] qcow2: Allow resize of images with internal snapshots
We originally refused to allow resize of images with internal snapshots because the v2 image format did not require the tracking of snapshot size, making it impossible to safely revert to a snapshot with a different size than the current view of the image. But the snapshot size tracking was rectified in v3, and our recent fixes to qemu-img amend (see 0a85af35) guarantee that we always have a valid snapshot size. Thus, we no longer need to artificially limit image resizes, but it does become one more thing that would prevent a downgrade back to v2. And now that we support different-sized snapshots, it's also easy to fix reverting to a snapshot to apply the new size. Upgrade iotest 61 to cover this (we previously had NO coverage of refusal to resize while snapshots exist). Note that the amend process can fail but still have effects: in particular, since we break things into upgrade, resize, downgrade, a failure during resize does not roll back changes made during upgrade, nor does failure in downgrade roll back a resize. But this situation is pre-existing even without this patch; and without journaling, the best we could do is minimize the chance of partial failure by collecting all changes prior to doing any writes - which adds a lot of complexity but could still fail with EIO. On the other hand, we are careful that even if we have partial modification but then fail, the image is left viable (that is, we are careful to sequence things so that after each successful cluster write, there may be transient leaked clusters but no corrupt metadata). And complicating the code to make it more transaction-like is not worth the effort: a user can always request multiple 'qemu-img amend' changing one thing each, if they need finer-grained control over detecting the first failure than what they get by letting qemu decide how to sequence multiple changes. Signed-off-by: Eric Blake Reviewed-by: Max Reitz --- block/qcow2-snapshot.c | 20 block/qcow2.c | 25 ++--- tests/qemu-iotests/061 | 35 +++ tests/qemu-iotests/061.out | 28 4 files changed, 101 insertions(+), 7 deletions(-) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index 82c32d4c9b08..df16424fd952 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -23,6 +23,7 @@ */ #include "qemu/osdep.h" +#include "sysemu/block-backend.h" #include "qapi/error.h" #include "qcow2.h" #include "qemu/bswap.h" @@ -775,10 +776,21 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id) } if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) { -error_report("qcow2: Loading snapshots with different disk " -"size is not implemented"); -ret = -ENOTSUP; -goto fail; +BlockBackend *blk = blk_new_with_bs(bs, BLK_PERM_RESIZE, BLK_PERM_ALL, +&local_err); +if (!blk) { +error_report_err(local_err); +ret = -ENOTSUP; +goto fail; +} + +ret = blk_truncate(blk, sn->disk_size, true, PREALLOC_MODE_OFF, + &local_err); +blk_unref(blk); +if (ret < 0) { +error_report_err(local_err); +goto fail; +} } /* diff --git a/block/qcow2.c b/block/qcow2.c index f56745c56924..bf739cd06be0 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3988,9 +3988,12 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, qemu_co_mutex_lock(&s->lock); -/* cannot proceed if image has snapshots */ -if (s->nb_snapshots) { -error_setg(errp, "Can't resize an image which has snapshots"); +/* + * Even though we store snapshot size for all images, it was not + * required until v3, so it is not safe to proceed for v2. + */ +if (s->nb_snapshots && s->qcow_version < 3) { +error_setg(errp, "Can't resize a v2 image which has snapshots"); ret = -ENOTSUP; goto fail; } @@ -4952,6 +4955,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version, BDRVQcow2State *s = bs->opaque; int current_version = s->qcow_version; int ret; +int i; /* This is qcow2_downgrade(), not qcow2_upgrade() */ assert(target_version < current_version); @@ -4969,6 +4973,21 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version, return -ENOTSUP; } +/* + * If any internal snapshot has a different size than the current + * image size, or VM state size that exceeds 32 bits, downgrading + * is unsafe. Even though we would still use v3-compliant output + * to preserve that data, other v2 programs might not realize + * those optional fields are important. + */ +for (i = 0; i < s->nb_snapshots; i++) { +if (s->snapshots[i].vm_state_size > U
[PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots
In v3: - patch 1: fix error returns [patchew, Max], R-b dropped - patch 2,3: unchanged, so add R-b Eric Blake (3): block: Add blk_new_with_bs() helper qcow2: Allow resize of images with internal snapshots qcow2: Tweak comment about bitmaps vs. resize include/sysemu/block-backend.h | 2 ++ block/block-backend.c | 23 + block/crypto.c | 10 +++- block/parallels.c | 8 +++--- block/qcow.c | 8 +++--- block/qcow2-snapshot.c | 20 --- block/qcow2.c | 45 +++--- block/qed.c| 9 +++ block/sheepdog.c | 11 - block/vdi.c| 8 +++--- block/vhdx.c | 9 +++ block/vmdk.c | 9 +++ block/vpc.c| 8 +++--- blockdev.c | 8 +++--- blockjob.c | 7 ++ tests/qemu-iotests/061 | 35 ++ tests/qemu-iotests/061.out | 28 + 17 files changed, 177 insertions(+), 71 deletions(-) -- 2.26.2
[PATCH v3 1/3] block: Add blk_new_with_bs() helper
There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by: Max Reitz Signed-off-by: Eric Blake --- include/sysemu/block-backend.h | 2 ++ block/block-backend.c | 23 +++ block/crypto.c | 10 -- block/parallels.c | 8 block/qcow.c | 8 block/qcow2.c | 18 -- block/qed.c| 9 - block/sheepdog.c | 11 +-- block/vdi.c| 8 block/vhdx.c | 9 - block/vmdk.c | 9 - block/vpc.c| 8 blockdev.c | 8 +++- blockjob.c | 7 ++- 14 files changed, 75 insertions(+), 63 deletions(-) diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 9bbdbd63d743..d37c1244dd9c 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -77,6 +77,8 @@ typedef struct BlockBackendPublic { } BlockBackendPublic; BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm); +BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm, + uint64_t shared_perm, Error **errp); BlockBackend *blk_new_open(const char *filename, const char *reference, QDict *options, int flags, Error **errp); int blk_get_refcnt(BlockBackend *blk); diff --git a/block/block-backend.c b/block/block-backend.c index 38ae41382652..3592066b4259 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -355,6 +355,29 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) return blk; } +/* + * Create a new BlockBackend connected to an existing BlockDriverState. + * + * @perm is a bitmasks of BLK_PERM_* constants which describes the + * permissions to request for @bs that is attached to this + * BlockBackend. @shared_perm is a bitmask which describes which + * permissions may be granted to other users of the attached node. + * Both sets of permissions can be changed later using blk_set_perm(). + * + * Return the new BlockBackend on success, null on failure. + */ +BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm, + uint64_t shared_perm, Error **errp) +{ +BlockBackend *blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm); + +if (blk_insert_bs(blk, bs, errp) < 0) { +blk_unref(blk); +return NULL; +} +return blk; +} + /* * Creates a new BlockBackend, opens a new BlockDriverState, and connects both. * The new BlockBackend is in the main AioContext. diff --git a/block/crypto.c b/block/crypto.c index d577f89659fa..a4d77f07fe8c 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -256,16 +256,14 @@ static int block_crypto_co_create_generic(BlockDriverState *bs, PreallocMode prealloc, Error **errp) { -int ret; +int ret = -EPERM; BlockBackend *blk; QCryptoBlock *crypto = NULL; struct BlockCryptoCreateData data; -blk = blk_new(bdrv_get_aio_context(bs), - BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL); - -ret = blk_insert_bs(blk, bs, errp); -if (ret < 0) { +blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL, + errp); +if (!blk) { goto cleanup; } diff --git a/block/parallels.c b/block/parallels.c index 6d4ed77f165f..aad07b89c818 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -559,10 +559,10 @@ static int coroutine_fn parallels_co_create(BlockdevCreateOptions* opts, return -EIO; } -blk = blk_new(bdrv_get_aio_context(bs), - BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL); -ret = blk_insert_bs(blk, bs, errp); -if (ret < 0) { +blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL, + errp); +if (!blk) { +ret = -EPERM; goto out; } blk_set_allow_write_beyond_eof(blk, true); diff --git a/block/qcow.c b/block/qcow.c index 8973e4e565a1..3f0c60051933 100644 --- a/block/qcow.c +++ b/block/qcow.c @@ -849,10 +849,10 @@ static int coroutine_fn qcow_co_create(BlockdevCreateOptions *opts, return -EIO; } -qcow_blk = blk_new(bdrv_get_aio_context(bs), - BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL); -ret = blk_insert_bs(qcow_blk, bs, errp); -if (ret < 0) { +qcow_blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, + BLK_PERM_ALL, errp); +if (!qcow_blk) { +ret = -EPERM; goto exit; } blk_set_allow_write_beyond_eof(qcow_blk, true); diff --git
[PATCH v3 3/3] qcow2: Tweak comment about bitmaps vs. resize
Our comment did not actually match the code. Rewrite the comment to be less sensitive to any future changes to qcow2-bitmap.c that might implement scenarios that we currently reject. Signed-off-by: Eric Blake Reviewed-by: Max Reitz --- block/qcow2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/qcow2.c b/block/qcow2.c index bf739cd06be0..81b75401813c 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3998,7 +3998,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, goto fail; } -/* cannot proceed if image has bitmaps */ +/* See qcow2-bitmap.c for which bitmap scenarios prevent a resize. */ if (qcow2_truncate_bitmaps_check(bs, errp)) { ret = -ENOTSUP; goto fail; -- 2.26.2
[Bug 1874888] [NEW] certain programs make QEMU crash with "tcg fatal error"
Public bug reported: The following code snippet crashes qemu with .../tcg/tcg.c:3279: tcg fatal error qemu-x86_64: /usr/local/google/home/kostik/qemu-5.0.0-rc4/accel/tcg/cpu-exec.c:701: int cpu_exec(CPUState *): Assertion `!have_mmap_lock()' failed. int main() { /* <.data>: 0: 2e 45 71 ff cs rex.RB jno 0x3 4: e9 00 00 f0 00 jmp0xf9 9: c4 42 7d 31 3e vpmovzxbd ymm15,QWORD PTR [r14] e: c4 a3 7d 08 64 82 44vroundps ymm4,YMMWORD PTR [rdx+r8*4+0x44],0x0 15: 00 16: 0f 1e 0anopDWORD PTR [rdx] 19: 43 0f ec 20 rex.XB paddsb mm4,QWORD PTR [r8] 1d: 66 47 0f 3a 0c 3d 00rex.RXB blendps xmm15,XMMWORD PTR [rip+0x8000],0x0# 0x8028 24: 80 00 00 00 28: c4 e3 f9 df 5f 86 0dvaeskeygenassist xmm3,XMMWORD PTR [rdi-0x7a],0xd 2f: c4 e2 55 92 74 fc 0avgatherdps ymm6,DWORD PTR [rsp+ymm7*8+0xa],ymm5 36: c4 e2 f9 17 9a 01 00vptest xmm3,XMMWORD PTR [rdx+0x1] 3d: 00 00 */ char buf[] = { 0x2E, 0x45, 0x71, 0xFF, 0xE9, 0x00, 0x00, 0xF0, 0x00, 0xC4, 0x42, 0x7D, 0x31, 0x3E, 0xC4, 0xA3, 0x7D, 0x08, 0x64, 0x82, 0x44, 0x00, 0x0F, 0x1E, 0x0A, 0x43, 0x0F, 0xEC, 0x20, 0x66, 0x47, 0x0F, 0x3A, 0x0C, 0x3D, 0x00, 0x80, 0x00, 0x00, 0x00, 0xC4, 0xE3, 0xF9, 0xDF, 0x5F, 0x86, 0x0D, 0xC4, 0xE2, 0x55, 0x92, 0x74, 0xFC, 0x0A, 0xC4, 0xE2, 0xF9, 0x17, 0x9A, 0x01, 0x00, 0x00, 0x00 }; void (*f)(void) = (void (*) (void))buf; f(); return 0; } Steps to reproduce: 1) clang -static repro.c -o repro 2) qemu-x86_64-static repro Tested with 4.2.0 and 5.0.0-rc4. Both -user and -system variants are affected. A few more snippets that cause the same sort of behavior: 1) 0x64, 0x46, 0x7D, 0xFF, 0xDF, 0x27, 0x46, 0x0F, 0xD4, 0x83, 0x5E, 0x00, 0x00, 0x00, 0x3E, 0x0F, 0x6A, 0xEF, 0x0F, 0x05, 0xC4, 0x42, 0xFD, 0x1E, 0xCF, 0x46, 0x18, 0xE3, 0x47, 0xCD, 0x4E, 0x6E, 0x0F, 0x0F, 0x16, 0x8A 2) 0x67, 0x45, 0xDB, 0xD0, 0xAA, 0xC4, 0xE2, 0xB1, 0x01, 0x57, 0x00, 0xF3, 0x6F, 0xF3, 0x42, 0x0F, 0x1E, 0xFD, 0x64, 0x2E, 0xF2, 0x45, 0xD9, 0xC4, 0x3E, 0xF3, 0x0F, 0xAE, 0xF4, 0x3E, 0x47, 0x0F, 0x1C, 0x22, 0x42, 0x73, 0xFF, 0xD9, 0xFD ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1874888 Title: certain programs make QEMU crash with "tcg fatal error" Status in QEMU: New Bug description: The following code snippet crashes qemu with .../tcg/tcg.c:3279: tcg fatal error qemu-x86_64: /usr/local/google/home/kostik/qemu-5.0.0-rc4/accel/tcg/cpu-exec.c:701: int cpu_exec(CPUState *): Assertion `!have_mmap_lock()' failed. int main() { /* <.data>: 0: 2e 45 71 ff cs rex.RB jno 0x3 4: e9 00 00 f0 00 jmp0xf9 9: c4 42 7d 31 3e vpmovzxbd ymm15,QWORD PTR [r14] e: c4 a3 7d 08 64 82 44vroundps ymm4,YMMWORD PTR [rdx+r8*4+0x44],0x0 15: 00 16: 0f 1e 0anopDWORD PTR [rdx] 19: 43 0f ec 20 rex.XB paddsb mm4,QWORD PTR [r8] 1d: 66 47 0f 3a 0c 3d 00rex.RXB blendps xmm15,XMMWORD PTR [rip+0x8000],0x0# 0x8028 24: 80 00 00 00 28: c4 e3 f9 df 5f 86 0dvaeskeygenassist xmm3,XMMWORD PTR [rdi-0x7a],0xd 2f: c4 e2 55 92 74 fc 0avgatherdps ymm6,DWORD PTR [rsp+ymm7*8+0xa],ymm5 36: c4 e2 f9 17 9a 01 00vptest xmm3,XMMWORD PTR [rdx+0x1] 3d: 00 00 */ char buf[] = { 0x2E, 0x45, 0x71, 0xFF, 0xE9, 0x00, 0x00, 0xF0, 0x00, 0xC4, 0x42, 0x7D, 0x31, 0x3E, 0xC4, 0xA3, 0x7D, 0x08, 0x64, 0x82, 0x44, 0x00, 0x0F, 0x1E, 0x0A, 0x43, 0x0F, 0xEC, 0x20, 0x66, 0x47, 0x0F, 0x3A, 0x0C, 0x3D, 0x00, 0x80, 0x00, 0x00, 0x00, 0xC4, 0xE3, 0xF9, 0xDF, 0x5F, 0x86, 0x0D, 0xC4, 0xE2, 0x55, 0x92, 0x74, 0xFC, 0x0A, 0xC4, 0xE2, 0xF9, 0x17, 0x9A, 0x01, 0x00, 0x00, 0x00 }; void (*f)(void) = (void (*) (void))buf; f(); return 0; } Steps to reproduce: 1) clang -static repro.c -o repro 2) qemu-x86_64-static repro Tested with 4.2.0 and 5.0.0-rc4. Both -user and -system variants are affected. A few more snippets that cause the same sort of behavior: 1) 0x64, 0x46, 0x7D, 0xFF, 0xDF, 0x27, 0x46, 0x0F, 0xD4, 0x83, 0x5E, 0x00, 0x00, 0x00, 0x3E, 0x0F, 0x6A, 0xEF, 0x0F, 0x05, 0xC4, 0x42, 0xFD, 0x1E, 0xCF, 0x46, 0x18, 0xE3, 0x47, 0xCD, 0x4E, 0x6E, 0x0F, 0x0F, 0x16, 0x8A 2) 0x67, 0x45, 0xDB, 0xD0, 0xAA, 0xC4, 0xE2, 0xB1, 0x01, 0x57, 0x00, 0xF3, 0x6F, 0xF3, 0x42, 0x0F, 0x1E, 0xFD, 0x64, 0x2E, 0xF2, 0x45, 0xD9, 0xC4, 0x3E, 0xF3, 0x0F, 0xAE, 0xF4, 0x3E, 0x47, 0x0F, 0x1C, 0x22, 0x42, 0x73, 0xFF, 0xD9, 0xFD To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1874888/+subscriptions
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On 4/24/20 1:37 PM, Alberto Garcia wrote: On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy wrote: Reading the entire cluster will be interesting - you'll have to decompress the entire memory, then overwrite the zeroed portions. I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters). But if at least one of subclusters to read is not _ZERO, you'll have to decompress the whole cluster, and after decompression rewrite zero-subclusters by zeroes, as Eric says.. Or I lost the thread:) I don't see why you would need to rewrite anything... you do have to decompress the whole cluster, and the uncompressed cluster in memory would have stale data, but you never need to use that data for anything, let alone to return it to the guest. Even if there's a COW, the new cluster would inherit the compressed cluster's bitmap so the zeroized subclusters still read as zeroes. It's the same with normal clusters, 'write -P 0xff 0 64k' followed by 'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the L2 entry indicates that part of it is just zeroes. The point is this: Consider 'write -P 0xff 0 64k', then 'write -z 16k 16k', then 'read 0 64k'. For normal clusters, we can just do a scatter-gather iov read of read 0-16k and 32-64k, plus a memset of 16-32k. But for compressed clusters, we have to read and decompress the entire 64k, AND also memset 16k-32k. But if zeroing after reading is not that expensive, then the same technique for normal clusters is fine (instead of a scatter-gather read of 48k, just read the whole 64k cluster before doing the memset). So the question at hand is not what happens in writing, but in reading, and whether we are penalizing reads from a compressed cluster or even from regular clusters, when reading from a cluster where subclusters have different status. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses
On Fri, Apr 24, 2020 at 03:23:56PM +, Ani Sinha wrote: > > > > On Apr 22, 2020, at 4:15 PM, Ani Sinha wrote: > > > > > > > >> On Apr 21, 2020, at 8:32 PM, Daniel P. Berrangé > >> wrote: > >> > >> On Tue, Apr 21, 2020 at 02:45:04PM +, Ani Sinha wrote: > >>> > >>> > On Apr 20, 2020, at 8:32 PM, Michael S. Tsirkin wrote: > > But I for one would like to focus on keeping PIIX stable > and focus development on q35. Not bloating PIIX with lots of new > features is IMHO a good way to do that. > >>> > >>> Does this mean this patch is a no-go then? :( > >> > >> I'd support this patch, as I don't think it can really be described as > >> bloat or destabalizing. It is just adding a simple property to > >> conditionalize existing functionality. Telling people to switch to Q35 > >> is unreasonable as it is not a simple 1-1 conversion from existing use > >> of PIIX. Q35 has much higher complexity in its configuration, has higher > >> memory overhead per VM too, and lacks certain features of PIIX too. > > > > Cool. How do we go forward from here? > > > > We would really appreciate if we can add this extra knob in > Qemu. Maybe someone else also in the community will find this > useful. We don’t want to maintain this patch internally forever > but rather prefer we maintain this as a Qemu community. Michael, I agree with Daniel here and I don't think we should start refusing PIIX features if they are useful for a portion of the QEMU community. Would you reconsider and merge this patch? -- Eduardo
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy wrote: > AFAIK, now compressed clusters can't be used in scenarios with guest, > as qcow2 driver doesn't support rewriting them. You can write to those images just fine, it's just not efficient because you have to COW the compressed clusters. > Or am I wrong? And we normally don't combine normal and compressed > clusters together in one image. As soon as you start writing to an image with compressed clusters you'll have a combination of both. But it's true that you don't have an image with compressed clusters if what you're looking for is performance. So I wouldn't add support for this if it complicates things too much. Berto
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy wrote: >>> Reading the entire cluster will be interesting - you'll have to >>> decompress the entire memory, then overwrite the zeroed portions. >> >> I don't think so, qcow2_get_host_offset() would detect the number of >> contiguous subclusters of the same type at the given offset. In this >> case they would be _ZERO subclusters so there's no need to decompress >> anything, or even read it (it works the same with uncompressed >> clusters). > > But if at least one of subclusters to read is not _ZERO, you'll have > to decompress the whole cluster, and after decompression rewrite > zero-subclusters by zeroes, as Eric says.. Or I lost the thread:) I don't see why you would need to rewrite anything... you do have to decompress the whole cluster, and the uncompressed cluster in memory would have stale data, but you never need to use that data for anything, let alone to return it to the guest. Even if there's a COW, the new cluster would inherit the compressed cluster's bitmap so the zeroized subclusters still read as zeroes. It's the same with normal clusters, 'write -P 0xff 0 64k' followed by 'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the L2 entry indicates that part of it is just zeroes. Berto
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
24.04.2020 20:56, Alberto Garcia wrote: On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake wrote: at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it. I'm not sure if it makes it easier. Some operations are definitely going to be easier but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that. Reviewing your series it already came in my mind, that we are doing too much with the conversion from l2e flags to "type". Does it worth it? All these ZERO_PLAIN and UNALLOCATED_ALLOC, and "case :" lines combined by three-four into one case, do they help, or is it an extra work? We just have to maintain two views of one model.. But I don't suggest to refactor it in these series :) Reading the entire cluster will be interesting - you'll have to decompress the entire memory, then overwrite the zeroed portions. I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters). But if at least one of subclusters to read is not _ZERO, you'll have to decompress the whole cluster, and after decompression rewrite zero-subclusters by zeroes, as Eric says.. Or I lost the thread:) -- Best regards, Vladimir
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
24.04.2020 20:44, Eric Blake wrote: On 4/24/20 12:21 PM, Alberto Garcia wrote: On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake wrote: 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters. It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten. That's right but you can still write zeroes without having to rewrite anything, and read back the zeroes without having to decompress the data. at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it. I'm not sure if it makes it easier. Some operations are definitely going to be easier but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that. Reading the entire cluster will be interesting - you'll have to decompress the entire memory, then overwrite the zeroed portions. The savings in reading occur only when your read is limited to just the subclusters that are zeroed. But then again, even on a regular cluster, read has to pay attention to which subclusters are zeroed, so you already have the workhorse in read for detecting whether a normal read is sufficient or if you have to follow up with piecing together zeroed sections. AFAIK, now compressed clusters can't be used in scenarios with guest, as qcow2 driver doesn't support rewriting them. Or am I wrong? And we normally don't combine normal and compressed clusters together in one image. So, currently, the usual use-case of compressed clusters is a fully compressed image, written once. It means, that with current specification, subclusters adds nothing to this case, and no reason to create compressed image with subclusters. And even if we allow zero/unallocated subclusters, seems it adds nothing to this use-case. So, I don't see real benefits of it for now, but neither any problems with it, so agree that it's mostly about which way is simpler.. -- Best regards, Vladimir
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake wrote: >>> at the same time, I can see where you're coming from in stating that >>> if it makes management of extended L2 easier to allow zero subclusters >>> on top of a compressed cluster, then there's no reason to forbid it. >> >> I'm not sure if it makes it easier. Some operations are definitely going >> to be easier but maybe we have to add and handle _ZERO_COMPRESSED in >> addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated >> subclusters). Or maybe replace QCow2SubclusterType with something >> else. I need to evaluate that. > > Reading the entire cluster will be interesting - you'll have to > decompress the entire memory, then overwrite the zeroed portions. I don't think so, qcow2_get_host_offset() would detect the number of contiguous subclusters of the same type at the given offset. In this case they would be _ZERO subclusters so there's no need to decompress anything, or even read it (it works the same with uncompressed clusters). Berto
Re: [RFC patch v1 1/3] qemu-file: introduce current buffer
13.04.2020 14:12, Denis Plotnikov wrote: To approach async wrtiting in the further commits, the buffer allocated in QEMUFile struct is replaced with the link to the current buffer. We're going to use many buffers to write the qemu file stream to the unerlying storage asynchronously. The current buffer points out to the buffer is currently filled with data. This patch doesn't add any features to qemu-file and doesn't change any qemu-file behavior. Signed-off-by: Denis Plotnikov Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On 4/24/20 12:21 PM, Alberto Garcia wrote: On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake wrote: 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters. It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten. That's right but you can still write zeroes without having to rewrite anything, and read back the zeroes without having to decompress the data. at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it. I'm not sure if it makes it easier. Some operations are definitely going to be easier but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that. Reading the entire cluster will be interesting - you'll have to decompress the entire memory, then overwrite the zeroed portions. The savings in reading occur only when your read is limited to just the subclusters that are zeroed. But then again, even on a regular cluster, read has to pay attention to which subclusters are zeroed, so you already have the workhorse in read for detecting whether a normal read is sufficient or if you have to follow up with piecing together zeroed sections. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
[RFC PATCH] translate-all: include guest address in out_asm output
This is a slightly hackish Friday afternoon attempt to include the guest address in our out_asm output in an effort to make it a little easier to see what is generating what final assembly. Signed-off-by: Alex Bennée --- accel/tcg/translate-all.c | 38 -- 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 9924e66d1f..31711de938 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -1789,14 +1789,42 @@ TranslationBlock *tb_gen_code(CPUState *cpu, if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) && qemu_log_in_addr_range(tb->pc)) { FILE *logfile = qemu_log_lock(); +size_t code_size, data_size = 0; +size_t insn_start; +int insn = 0; qemu_log("OUT: [size=%d]\n", gen_code_size); if (tcg_ctx->data_gen_ptr) { -size_t code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr; -size_t data_size = gen_code_size - code_size; -size_t i; +code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr; +data_size = gen_code_size - code_size; +} else { +code_size = gen_code_size; +} + +/* first dump prologue */ +insn_start = tcg_ctx->gen_insn_end_off[0]; +if (insn_start > 0) { +qemu_log(" prologue: [size=%ld]\n", insn_start); +log_disas(tb->tc.ptr, insn_start); +} + +do { +size_t insn_end; +if (insn < (tb->icount - 1)) { +insn_end = tcg_ctx->gen_insn_end_off[insn + 1]; +} else { +insn_end = code_size; +} +qemu_log(" for guest addr: " TARGET_FMT_lx ":\n", + tcg_ctx->gen_insn_data[insn][0]); + +log_disas(tb->tc.ptr + insn_start, insn_end - insn_start); -log_disas(tb->tc.ptr, code_size); +insn_start = insn_end; +} while (++insn < tb->icount && insn_start < code_size); +if (data_size) { +int i; +qemu_log(" data: [size=%ld]\n", data_size); for (i = 0; i < data_size; i += sizeof(tcg_target_ulong)) { if (sizeof(tcg_target_ulong) == 8) { qemu_log("0x%08" PRIxPTR ": .quad 0x%016" PRIx64 "\n", @@ -1808,8 +1836,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu, *(uint32_t *)(tcg_ctx->data_gen_ptr + i)); } } -} else { -log_disas(tb->tc.ptr, gen_code_size); } qemu_log("\n"); qemu_log_flush(); -- 2.20.1
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake wrote: >> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any >> copy on write. The compressed data would remain untouched on disk but >> some of the subclusters would have the 'all zeroes' bit set, exactly >> like what happens with normal clusters. > > It's a special case that avoids COW for write zeroes, but not for > anything else. The moment you write any data (whether to the > zero-above-compressed or the regular compressed portion), the entire > cluster has to be rewritten. That's right but you can still write zeroes without having to rewrite anything, and read back the zeroes without having to decompress the data. > at the same time, I can see where you're coming from in stating that > if it makes management of extended L2 easier to allow zero subclusters > on top of a compressed cluster, then there's no reason to forbid it. I'm not sure if it makes it easier. Some operations are definitely going to be easier but maybe we have to add and handle _ZERO_COMPRESSED in addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated subclusters). Or maybe replace QCow2SubclusterType with something else. I need to evaluate that. Berto
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On 4/24/20 12:02 PM, Alberto Garcia wrote: On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia wrote: Compressed clusters always have the bitmap part of the extended L2 entry set to 0. I was just finishing some improvements to the new code that allows BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to entertain the idea of using the L2 bitmap for compressed clusters as well. I will make some tests next week, but I would like to know your opinion in case I'm missing something. A compressed cluster cannot be divided into subclusters on the image: you would not be able to allocate or overwrite them separately, therefore any write request necessarily has to write (or do COW of) the whole cluster. However if you consider the uncompressed guest data I don't see any reason why you wouldn't be able to zeroize or even deallocate individual subclusters. These operations don't touch the cluster data on disk anyway, they only touch the L2 metadata in order to change what the guest sees. 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters. It's a special case that avoids COW for write zeroes, but not for anything else. The moment you write any data (whether to the zero-above-compressed or the regular compressed portion), the entire cluster has to be rewritten. I'm not sure how frequently guests will actually have the scenario of doing a zero request on a sub-cluster, but at the same time, I can see where you're coming from in stating that if it makes management of extended L2 easier to allow zero subclusters on top of a compressed cluster, then there's no reason to forbid it. I think that this would make the on-disk format a bit simpler in general (no need to treat compressed clusters differently in some cases) and it would add a new optimization to compressed images. I just need to make sure that it doesn't complicate the code (my feeling is that it would actually simplify it, but I have to see). Opinions? Berto -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia wrote: > Compressed clusters always have the bitmap part of the extended L2 > entry set to 0. I was just finishing some improvements to the new code that allows BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to entertain the idea of using the L2 bitmap for compressed clusters as well. I will make some tests next week, but I would like to know your opinion in case I'm missing something. A compressed cluster cannot be divided into subclusters on the image: you would not be able to allocate or overwrite them separately, therefore any write request necessarily has to write (or do COW of) the whole cluster. However if you consider the uncompressed guest data I don't see any reason why you wouldn't be able to zeroize or even deallocate individual subclusters. These operations don't touch the cluster data on disk anyway, they only touch the L2 metadata in order to change what the guest sees. 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any copy on write. The compressed data would remain untouched on disk but some of the subclusters would have the 'all zeroes' bit set, exactly like what happens with normal clusters. I think that this would make the on-disk format a bit simpler in general (no need to treat compressed clusters differently in some cases) and it would add a new optimization to compressed images. I just need to make sure that it doesn't complicate the code (my feeling is that it would actually simplify it, but I have to see). Opinions? Berto
[PATCH v22 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting
From: Alexander Duyck Add support for free page reporting. The idea is to function very similar to how the balloon works in that we basically end up madvising the page as not being used. However we don't really need to bother with any deflate type logic since the page will be faulted back into the guest when it is read or written to. This provides a new way of letting the guest proactively report free pages to the hypervisor, so the hypervisor can reuse them. In contrast to inflate/deflate that is triggered via the hypervisor explicitly. Signed-off-by: Alexander Duyck --- hw/virtio/virtio-balloon.c | 67 include/hw/virtio/virtio-balloon.h |2 + 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index c1c76ec09c95..2ce56c6c0794 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v, balloon_stats_change_timer(s, 0); } +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq) +{ +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev); +VirtQueueElement *elem; + +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement { +unsigned int i; + +/* + * When we discard the page it has the effect of removing the page + * from the hypervisor itself and causing it to be zeroed when it + * is returned to us. So we must not discard the page if it is + * accessible by another device or process, or if the guest is + * expecting it to retain a non-zero value. + */ +if (qemu_balloon_is_inhibited() || dev->poison_val) { +goto skip_element; +} + +for (i = 0; i < elem->in_num; i++) { +void *addr = elem->in_sg[i].iov_base; +size_t size = elem->in_sg[i].iov_len; +ram_addr_t ram_offset; +RAMBlock *rb; + +/* + * There is no need to check the memory section to see if + * it is ram/readonly/romd like there is for handle_output + * below. If the region is not meant to be written to then + * address_space_map will have allocated a bounce buffer + * and it will be freed in address_space_unmap and trigger + * and unassigned_mem_write before failing to copy over the + * buffer. If more than one bad descriptor is provided it + * will return NULL after the first bounce buffer and fail + * to map any resources. + */ +rb = qemu_ram_block_from_host(addr, false, &ram_offset); +if (!rb) { +trace_virtio_balloon_bad_addr(elem->in_addr[i]); +continue; +} + +/* + * For now we will simply ignore unaligned memory regions, or + * regions that overrun the end of the RAMBlock. + */ +if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) || +(ram_offset + size) > qemu_ram_get_used_length(rb)) { +continue; +} + +ram_block_discard_range(rb, ram_offset, size); +} + +skip_element: +virtqueue_push(vq, elem, 0); +virtio_notify(vdev, vq); +g_free(elem); +} +} + static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) { VirtIOBalloon *s = VIRTIO_BALLOON(vdev); @@ -818,6 +879,10 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp) s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output); s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats); +if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) { +s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report); +} + if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, @@ -945,6 +1010,8 @@ static Property virtio_balloon_properties[] = { VIRTIO_BALLOON_F_FREE_PAGE_HINT, false), DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features, VIRTIO_BALLOON_F_PAGE_POISON, true), +DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features, +VIRTIO_BALLOON_F_REPORTING, false), /* QEMU 4.0 accidentally changed the config size even when free-page-hint * is disabled, resulting in QEMU 3.1 migration incompatibility. This * property retains this quirk for QEMU 4.1 machine types. diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h index 3ca2a78e1aca..ac4013d51010 100644 --- a/include/hw/virtio/virtio-balloon.h +++ b/include/hw/virtio/virtio-balloon.h @@ -42,7 +42,7 @@ enum virtio_balloon_free_page_h
[PATCH v22 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'
From: Alexander Duyck In an upcoming patch a feature named Free Page Reporting is about to be added. In order to avoid any confusion we should drop the use of the word 'report' when referring to Free Page Hinting. So what this patch does is go through and replace all instances of 'report' with 'hint" when we are referring to free page hinting. Acked-by: David Hildenbrand Signed-off-by: Alexander Duyck --- hw/virtio/virtio-balloon.c | 74 ++-- include/hw/virtio/virtio-balloon.h | 20 +- 2 files changed, 47 insertions(+), 47 deletions(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index a4729f7fc930..a1d6fb52c876 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev) ret = false; goto out; } -if (id == dev->free_page_report_cmd_id) { -dev->free_page_report_status = FREE_PAGE_REPORT_S_START; +if (id == dev->free_page_hint_cmd_id) { +dev->free_page_hint_status = FREE_PAGE_HINT_S_START; } else { /* * Stop the optimization only when it has started. This * avoids a stale stop sign for the previous command. */ -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { -dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP; +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { +dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP; } } } if (elem->in_num) { -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { qemu_guest_free_page_hint(elem->in_sg[0].iov_base, elem->in_sg[0].iov_len); } @@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void *opaque) qemu_mutex_unlock(&dev->free_page_lock); virtio_notify(vdev, vq); /* - * Start to poll the vq once the reporting started. Otherwise, continue + * Start to poll the vq once the hinting started. Otherwise, continue * only when there are entries on the vq, which need to be given back. */ } while (continue_to_get_hints || - dev->free_page_report_status == FREE_PAGE_REPORT_S_START); + dev->free_page_hint_status == FREE_PAGE_HINT_S_START); virtio_queue_set_notification(vq, 1); } @@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s) return; } -if (s->free_page_report_cmd_id == UINT_MAX) { -s->free_page_report_cmd_id = - VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN; +if (s->free_page_hint_cmd_id == UINT_MAX) { +s->free_page_hint_cmd_id = + VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; } else { -s->free_page_report_cmd_id++; +s->free_page_hint_cmd_id++; } -s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED; +s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED; virtio_notify_config(vdev); } @@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon *s) { VirtIODevice *vdev = VIRTIO_DEVICE(s); -if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { +if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) { /* * The lock also guarantees us that the * virtio_ballloon_get_free_page_hints exits after the - * free_page_report_status is set to S_STOP. + * free_page_hint_status is set to S_STOP. */ qemu_mutex_lock(&s->free_page_lock); /* * The guest hasn't done the reporting, so host sends a notification * to the guest to actively stop the reporting. */ -s->free_page_report_status = FREE_PAGE_REPORT_S_STOP; +s->free_page_hint_status = FREE_PAGE_HINT_S_STOP; qemu_mutex_unlock(&s->free_page_lock); virtio_notify_config(vdev); } @@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon *s) { VirtIODevice *vdev = VIRTIO_DEVICE(s); -s->free_page_report_status = FREE_PAGE_REPORT_S_DONE; +s->free_page_hint_status = FREE_PAGE_HINT_S_DONE; virtio_notify_config(vdev); } static int -virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data) +virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data) { VirtIOBalloon *dev = container_of(n, VirtIOBalloon, - free_page_report_notify); + free_page_hint_notify); VirtIODevice *vdev = VIRTIO_DEVICE(dev); PrecopyNotifyData *pnd = data; @@ -624,7 +624,7 @@ static size_t virtio_balloon_config_size(
[PATCH v22 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id
From: Alexander Duyck Sync to the latest upstream changes for free page hinting. To be replaced by a full linux header sync. Signed-off-by: Alexander Duyck --- include/standard-headers/linux/virtio_balloon.h | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h index 9375ca2a70de..af0a6b59dab2 100644 --- a/include/standard-headers/linux/virtio_balloon.h +++ b/include/standard-headers/linux/virtio_balloon.h @@ -47,8 +47,15 @@ struct virtio_balloon_config { uint32_t num_pages; /* Number of pages we've actually got in balloon. */ uint32_t actual; - /* Free page report command id, readonly by guest */ - uint32_t free_page_report_cmd_id; + /* +* Free page hint command id, readonly by guest. +* Was previously name free_page_report_cmd_id so we +* need to carry that name for legacy support. +*/ + union { + uint32_t free_page_hint_cmd_id; + uint32_t free_page_report_cmd_id; /* deprecated */ + }; /* Stores PAGE_POISON if page poisoning is in use */ uint32_t poison_val; };
[PATCH v22 QEMU 4/5] virtio-balloon: Implement support for page poison reporting feature
From: Alexander Duyck We need to make certain to advertise support for page poison reporting if we want to actually get data on if the guest will be poisoning pages. Add a value for reporting the poison value being used if page poisoning is enabled in the guest. With this we can determine if we will need to skip free page reporting when it is enabled in the future. The value currently has no impact on existing balloon interfaces. In the case of existing balloon interfaces the onus is on the guest driver to reapply whatever poison is in place. When we add free page reporting the poison value is used to determine if we can perform in-place page reporting. The expectation is that a reported page will already contain the value specified by the poison, and the reporting of the page should not change that value. Signed-off-by: Alexander Duyck --- hw/virtio/virtio-balloon.c | 29 + include/hw/virtio/virtio-balloon.h |1 + 2 files changed, 30 insertions(+) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index a1d6fb52c876..c1c76ec09c95 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data) config.num_pages = cpu_to_le32(dev->num_pages); config.actual = cpu_to_le32(dev->actual); +config.poison_val = cpu_to_le32(dev->poison_val); if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) { config.free_page_hint_cmd_id = @@ -683,6 +684,14 @@ static ram_addr_t get_current_ram_size(void) return size; } +static bool virtio_balloon_page_poison_support(void *opaque) +{ +VirtIOBalloon *s = opaque; +VirtIODevice *vdev = VIRTIO_DEVICE(s); + +return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON); +} + static void virtio_balloon_set_config(VirtIODevice *vdev, const uint8_t *config_data) { @@ -697,6 +706,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev, qapi_event_send_balloon_change(vm_ram_size - ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT)); } +dev->poison_val = 0; +if (virtio_balloon_page_poison_support(dev)) { +dev->poison_val = le32_to_cpu(config.poison_val); +} trace_virtio_balloon_set_config(dev->actual, oldactual); } @@ -755,6 +768,17 @@ static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { } }; +static const VMStateDescription vmstate_virtio_balloon_page_poison = { +.name = "vitio-balloon-device/page-poison", +.version_id = 1, +.minimum_version_id = 1, +.needed = virtio_balloon_page_poison_support, +.fields = (VMStateField[]) { +VMSTATE_UINT32(poison_val, VirtIOBalloon), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_virtio_balloon_device = { .name = "virtio-balloon-device", .version_id = 1, @@ -767,6 +791,7 @@ static const VMStateDescription vmstate_virtio_balloon_device = { }, .subsections = (const VMStateDescription * []) { &vmstate_virtio_balloon_free_page_hint, +&vmstate_virtio_balloon_page_poison, NULL } }; @@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev) g_free(s->stats_vq_elem); s->stats_vq_elem = NULL; } + +s->poison_val = 0; } static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status) @@ -916,6 +943,8 @@ static Property virtio_balloon_properties[] = { VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false), DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features, VIRTIO_BALLOON_F_FREE_PAGE_HINT, false), +DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features, +VIRTIO_BALLOON_F_PAGE_POISON, true), /* QEMU 4.0 accidentally changed the config size even when free-page-hint * is disabled, resulting in QEMU 3.1 migration incompatibility. This * property retains this quirk for QEMU 4.1 machine types. diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h index 108cff97e71a..3ca2a78e1aca 100644 --- a/include/hw/virtio/virtio-balloon.h +++ b/include/hw/virtio/virtio-balloon.h @@ -70,6 +70,7 @@ typedef struct VirtIOBalloon { uint32_t host_features; bool qemu_4_0_config_size; +uint32_t poison_val; } VirtIOBalloon; #endif
[PATCH v22 QEMU 2/5] linux-headers: update to contain virito-balloon free page reporting
From: Alexander Duyck Sync the latest upstream changes for free page reporting. To be replaced by a full linux header sync. Signed-off-by: Alexander Duyck --- include/standard-headers/linux/virtio_balloon.h |1 + 1 file changed, 1 insertion(+) diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h index af0a6b59dab2..af3b7a1fa263 100644 --- a/include/standard-headers/linux/virtio_balloon.h +++ b/include/standard-headers/linux/virtio_balloon.h @@ -36,6 +36,7 @@ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */ #define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */ +#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */ /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12
[PATCH v22 QEMU 0/5] virtio-balloon: add support for page poison reporting and free page reporting
This series provides an asynchronous means of reporting free guest pages to QEMU through virtio-balloon so that the memory associated with those pages can be discarded and reused by other processes and/or guests on the host. Using this it is possible to avoid unnecessary I/O to disk and greatly improve performance in the case of memory overcommit on the host. As a part of enabling this feature it was necessary to implement the page poison reporting feature which had been added to the kernel, but was not available in QEMU. This patch set adds it as a new device property "page-poison" which is enabled by default, and adds support for migrating the reported value. I originally submitted this patch series back on February 11th 2020[1], but at that time I was focused primarily on the kernel portion of this patch set. However as of April 7th those patches are now included in Linus's kernel tree[2] and so I am submitting the QEMU pieces for inclusion. [1]: https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/ [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc Changes from v17: Fixed typo in patch 1 title Addressed white-space issues reported via checkpatch Added braces {} for two if statements to match expected coding style Changes from v18: Updated patches 2 and 3 based on input from dhildenb Added comment to patch 2 describing what keeps us from reporting a bad page Added patch to address issue with ROM devices being directly writable Changes from v19: Added std-headers change to match changes pushed for linux kernel headers Added patch to remove "report" from page hinting code paths Updated comment to better explain why we disable hints w/ page poisoning Removed code that was modifying config size for poison vs hinting Dropped x-page-poison property Added code to bounds check the reported region vs the RAM block Dropped patch for ROM devices as that was already pulled in by Paolo Changes from v20: Rearranged patches to push Linux header sync patches to front Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true Changes from v21: Added ack for patch 3 Rewrote patch description for page poison reporting feature Made page-poison independent property and set to enabled by default Added logic to migrate poison_val Added several comments in code to better explain features Switched free-page-reporting property to disabled by default --- Alexander Duyck (5): linux-headers: Update to allow renaming of free_page_report_cmd_id linux-headers: update to contain virito-balloon free page reporting virtio-balloon: Replace free page hinting references to 'report' with 'hint' virtio-balloon: Implement support for page poison reporting feature virtio-balloon: Provide an interface for free page reporting hw/virtio/virtio-balloon.c | 170 ++- include/hw/virtio/virtio-balloon.h | 23 ++- include/standard-headers/linux/virtio_balloon.h | 12 +- 3 files changed, 155 insertions(+), 50 deletions(-) --
Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type
On Fri, Mar 13, 2020 at 12:35 PM Corey Wharton wrote: > > Allows the CPU to be changed from the default via the -cpu command > line option. > > Signed-off-by: Corey Wharton > Reviewed-by: Bin Meng > Reviewed-by: Alistair Francis Thanks for the patch. Unfortunately this fails `make check`. The problem is that the machine has the `default_cpu_type` set but then you set "cpu-type" from the SoC. This diff fixes the make check failure for me: diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index 1fd08f325c..b53109521e 100644 --- a/hw/riscv/sifive_e.c +++ b/hw/riscv/sifive_e.c @@ -123,8 +123,6 @@ static void riscv_sifive_e_soc_init(Object *obj) object_initialize_child(obj, "cpus", &s->cpus, sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, &error_abort, NULL); -object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", -&error_abort); object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", &error_abort); sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0", @@ -141,6 +139,8 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp) SiFiveESoCState *s = RISCV_E_SOC(dev); MemoryRegion *sys_mem = get_system_memory(); +object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", +&error_abort); object_property_set_bool(OBJECT(&s->cpus), true, "realized", &error_abort); I'm happy to just squash that into the patch. Let me know if you don't want me to do that and I'll drop these patches and let you re-send them. Alistair > --- > hw/riscv/sifive_e.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c > index a254cad489..b0a611adb9 100644 > --- a/hw/riscv/sifive_e.c > +++ b/hw/riscv/sifive_e.c > @@ -123,7 +123,7 @@ static void riscv_sifive_e_soc_init(Object *obj) > object_initialize_child(obj, "cpus", &s->cpus, > sizeof(s->cpus), TYPE_RISCV_HART_ARRAY, > &error_abort, NULL); > -object_property_set_str(OBJECT(&s->cpus), SIFIVE_E_CPU, "cpu-type", > +object_property_set_str(OBJECT(&s->cpus), ms->cpu_type, "cpu-type", > &error_abort); > object_property_set_int(OBJECT(&s->cpus), ms->smp.cpus, "num-harts", > &error_abort); > @@ -220,6 +220,7 @@ static void riscv_sifive_e_machine_init(MachineClass *mc) > mc->desc = "RISC-V Board compatible with SiFive E SDK"; > mc->init = riscv_sifive_e_init; > mc->max_cpus = 1; > +mc->default_cpu_type = SIFIVE_E_CPU; > } > > DEFINE_MACHINE("sifive_e", riscv_sifive_e_machine_init) > -- > 2.21.1 > >
Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting
On Fri, Apr 24, 2020 at 8:34 AM David Hildenbrand wrote: > > > >> Also, I do wonder if we want to default-enable it. It can still have a > >> negative performance impact and some people might not want that. > > > > The negative performance impact should be minimal. At this point the > > hinting process ends up being very light since it only processes a > > small chunk of the memory before it shuts down for a couple seconds. > > In my original data the only time any significant performance > > regression was seen was with page shuffling enabled, and that was > > before I put limits on how many pages we could process per pass and > > how frequently we would make a pass through memory. My thought is that > > we are much better having it enabled by default rather than having it > > disabled by default. In the worst case scenario we have a little bit > > of extra thread noise from it popping up running for a few > > milliseconds and then sleeping for about two seconds, but the benefit > > side is that the VMs will do us a favor and restrict themselves to a > > much smaller idle footprint until such time as the memory is actually > > needed in the guest. > > While I agree that the impact is small, there *is* a noticeable > performance impact. One of the main users is memory overcommit. > > Also, imagine the following scenarios: > - RT workload in the guest. Just because you have a virtio-balloon > device defined would mean something is suddenly active and > discarding/trying to discard pages. > - vfio: free page reporting is completely useless right now and > *only* overhead. > - prealloc does not expect that pages get suddenly discarded > - s390x, which has CMM and might not benefit at all (except when using > huge pages as backing storage) > > No, I don't think it's acceptable to enable this as default. I remember > that it is quite common to just define a balloon device but never use it. > > E.g., "A virtual memory balloon device is added to all Xen and KVM/QEMU > guest virtual machines. It will be seen as element" [1]. > > I think we should let upper layers decide (just as we do for free page > hinting, for example). Okay. I guess there are a number of other cases I hand't thought of. I will switch the default to false. Thanks. - Alex
Re: [PATCH 2/3] fuzz: Simplify how we compute available machines and types
On 200424 0911, Markus Armbruster wrote: > apply_to_qlist(), apply_to_node() work with QObjects. This is > designed for use by tests/qtest/qos-test.c, which gets the data in > that form via QMP. Goes back to commit fc281c8020 "tests: qgraph API > for the qtest driver framework". > > Commit 275ab39d86 "fuzz: add support for qos-assisted fuzz targets" > added another user: qtest/fuzz/qos_fuzz.c. To get the data as > QObjects, it uses qmp_marshal_query_machines() and > qmp_marshal_qom_list_types(). > > All this code is rather cumbersome. Switch to working with generated > QAPI types instead: > > * Replace apply_to_qlist() & friends by machines_apply_to_node() and > types_apply_to_node(). > > * Have qos_fuzz.c use qmp_query_machines() and qmp_qom_list_types() > instead. > > * Have qos_test.c convert from QObject to the QAPI types. > > Signed-off-by: Markus Armbruster Thank you for looking at this! Reviewed-by: Alexander Bulekov > --- > tests/qtest/libqos/qos_external.h | 8 +++- > tests/qtest/fuzz/qos_fuzz.c | 34 --- > tests/qtest/libqos/qos_external.c | 70 +++ > tests/qtest/qos-test.c| 29 + > 4 files changed, 59 insertions(+), 82 deletions(-) > > diff --git a/tests/qtest/libqos/qos_external.h > b/tests/qtest/libqos/qos_external.h > index 7b44930c55..f63388cb30 100644 > --- a/tests/qtest/libqos/qos_external.h > +++ b/tests/qtest/libqos/qos_external.h > @@ -20,8 +20,12 @@ > #define QOS_EXTERNAL_H > #include "libqos/qgraph.h" > > -void apply_to_node(const char *name, bool is_machine, bool is_abstract); > -void apply_to_qlist(QList *list, bool is_machine); > +#include "libqos/malloc.h" > +#include "qapi/qapi-types-machine.h" > +#include "qapi/qapi-types-qom.h" > + > +void machines_apply_to_node(MachineInfoList *mach_info); > +void types_apply_to_node(ObjectTypeInfoList *type_info); > QGuestAllocator *get_machine_allocator(QOSGraphObject *obj); > void *allocate_objects(QTestState *qts, char **path, QGuestAllocator > **p_alloc); > > diff --git a/tests/qtest/fuzz/qos_fuzz.c b/tests/qtest/fuzz/qos_fuzz.c > index af28c92866..87eadb0889 100644 > --- a/tests/qtest/fuzz/qos_fuzz.c > +++ b/tests/qtest/fuzz/qos_fuzz.c > @@ -36,7 +36,6 @@ > > #include "qapi/qapi-commands-machine.h" > #include "qapi/qapi-commands-qom.h" > -#include "qapi/qmp/qlist.h" > > > void *fuzz_qos_obj; > @@ -45,34 +44,19 @@ QGuestAllocator *fuzz_qos_alloc; > static const char *fuzz_target_name; > static char **fuzz_path_vec; > > -/* > - * Replaced the qmp commands with direct qmp_marshal calls. > - * Probably there is a better way to do this > - */ > static void qos_set_machines_devices_available(void) > { > -QDict *req = qdict_new(); > -QObject *response; > -QDict *args = qdict_new(); > -QList *lst; > +MachineInfoList *mach_info; > +ObjectTypeInfoList *type_info; > > -qmp_marshal_query_machines(NULL, &response, &error_abort); > -lst = qobject_to(QList, response); > -apply_to_qlist(lst, true); > +mach_info = qmp_query_machines(&error_abort); > +machines_apply_to_node(mach_info); > +qapi_free_MachineInfoList(mach_info); > > -qobject_unref(response); > - > - > -qdict_put_str(req, "execute", "qom-list-types"); > -qdict_put_str(args, "implements", "device"); > -qdict_put_bool(args, "abstract", true); > -qdict_put_obj(req, "arguments", (QObject *) args); > - > -qmp_marshal_qom_list_types(args, &response, &error_abort); > -lst = qobject_to(QList, response); > -apply_to_qlist(lst, false); > -qobject_unref(response); > -qobject_unref(req); > +type_info = qmp_qom_list_types(true, "device", true, true, > + &error_abort); > +types_apply_to_node(type_info); > +qapi_free_ObjectTypeInfoList(type_info); > } > > static char **current_path; > diff --git a/tests/qtest/libqos/qos_external.c > b/tests/qtest/libqos/qos_external.c > index 398556dde0..c707dac3b9 100644 > --- a/tests/qtest/libqos/qos_external.c > +++ b/tests/qtest/libqos/qos_external.c > @@ -29,62 +29,40 @@ > #include "libqos/qgraph_internal.h" > #include "libqos/qos_external.h" > > - > - > -void apply_to_node(const char *name, bool is_machine, bool is_abstract) > +static void machine_apply_to_node(const char *name) > { > -char *machine_name = NULL; > -if (is_machine) { > -const char *arch = qtest_get_arch(); > -machine_name = g_strconcat(arch, "/", name, NULL); > -name = machine_name; > +char *machine_name = g_strconcat(qtest_get_arch(), "/", name, NULL); > + > +qos_graph_node_set_availability(machine_name, true); > +g_free(machine_name); > +} > + > +void machines_apply_to_node(MachineInfoList *mach_info) > +{ > +MachineInfoList *tail; > + > +for (tail = mach_info; tail; tail = tail->next) { > +machine_apply_to_node(tail->value->name); > +if (tail->value->alias) { > +
Re: [PATCH 1/3] Makefile: Drop unused, broken target recurse-fuzz
On 200424 0911, Markus Armbruster wrote: > Target recurse-fuzz depends on pc-bios/optionrom/fuzz, which can't be > made. It's not used anywhere. Added in commit c621dc3e01c, looks > like cargo cult. Delete. > > Signed-off-by: Markus Armbruster Reviewed-by: Alexander Bulekov > --- > Makefile | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/Makefile b/Makefile > index 8a9113e666..34275f57c9 100644 > --- a/Makefile > +++ b/Makefile > @@ -582,7 +582,6 @@ $(ROM_DIRS_RULES): > > .PHONY: recurse-all recurse-clean recurse-install > recurse-all: $(addsuffix /all, $(TARGET_DIRS) $(ROM_DIRS)) > -recurse-fuzz: $(addsuffix /fuzz, $(TARGET_DIRS) $(ROM_DIRS)) > recurse-clean: $(addsuffix /clean, $(TARGET_DIRS) $(ROM_DIRS)) > recurse-install: $(addsuffix /install, $(TARGET_DIRS)) > $(addsuffix /install, $(TARGET_DIRS)): all > -- > 2.21.1 >
[Bug 1874539] Re: tulip driver broken in v5.0.0-rc4
Attached trivial patch fixes it. ** Attachment added: "patch to fix tulip rx hangs" https://bugs.launchpad.net/qemu/+bug/1874539/+attachment/5359564/+files/p2 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1874539 Title: tulip driver broken in v5.0.0-rc4 Status in QEMU: New Bug description: In a qemu-system-hppa system, qemu release v5.0.0-rc, the tulip nic driver is broken. The tulip nic is detected, even getting DHCP info does work. But when trying to download bigger files via network, the tulip driver gets stuck. For example when trying to download a 100MB file: root@debian:~# wget https://speed.hetzner.de/100MB.bin --2020-04-23 20:26:43-- https://speed.hetzner.de/100MB.bin Resolving speed.hetzner.de (speed.hetzner.de)... 88.198.248.254, 2a01:4f8:0:59ed::2 Connecting to speed.hetzner.de (speed.hetzner.de)|88.198.248.254|:443... connected. When reverting this commit, everything works again: commit 8ffb7265af64ec81748335ec8f20e7ab542c3850 Author: Prasad J Pandit Date: Tue Mar 24 22:57:22 2020 +0530 PATCH: net: tulip: check frame size and r/w data length To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1874539/+subscriptions
Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting
>> Also, I do wonder if we want to default-enable it. It can still have a >> negative performance impact and some people might not want that. > > The negative performance impact should be minimal. At this point the > hinting process ends up being very light since it only processes a > small chunk of the memory before it shuts down for a couple seconds. > In my original data the only time any significant performance > regression was seen was with page shuffling enabled, and that was > before I put limits on how many pages we could process per pass and > how frequently we would make a pass through memory. My thought is that > we are much better having it enabled by default rather than having it > disabled by default. In the worst case scenario we have a little bit > of extra thread noise from it popping up running for a few > milliseconds and then sleeping for about two seconds, but the benefit > side is that the VMs will do us a favor and restrict themselves to a > much smaller idle footprint until such time as the memory is actually > needed in the guest. While I agree that the impact is small, there *is* a noticeable performance impact. One of the main users is memory overcommit. Also, imagine the following scenarios: - RT workload in the guest. Just because you have a virtio-balloon device defined would mean something is suddenly active and discarding/trying to discard pages. - vfio: free page reporting is completely useless right now and *only* overhead. - prealloc does not expect that pages get suddenly discarded - s390x, which has CMM and might not benefit at all (except when using huge pages as backing storage) No, I don't think it's acceptable to enable this as default. I remember that it is quite common to just define a balloon device but never use it. E.g., "A virtual memory balloon device is added to all Xen and KVM/QEMU guest virtual machines. It will be seen as element" [1]. I think we should let upper layers decide (just as we do for free page hinting, for example). [1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_administration_guide/section-libvirt-dom-xml-memory-baloon-device -- Thanks, David / dhildenb
Re: [RFC PATCH 0/3] hw/net/tulip: Fix LP#1874539
* Philippe Mathieu-Daudé : > Attempt to fix the launchpad bug filled by Helge: > > In a qemu-system-hppa system, qemu release v5.0.0-rc, > the tulip nic driver is broken. The tulip nic is detected, > even getting DHCP info does work. But when trying to > download bigger files via network, the tulip driver gets > stuck. > > Philippe Mathieu-Daudé (3): > hw/net/tulip: Fix 'Descriptor Error' definition > hw/net/tulip: Log descriptor overflows > hw/net/tulip: Set descriptor error bit when lenght is incorrect > > hw/net/tulip.h | 2 +- > hw/net/tulip.c | 32 > 2 files changed, 29 insertions(+), 5 deletions(-) Philippe, thanks for your efforts. Sadly your patch did not fixed the bug itself, but it had some nice cleanups which should be included at some point. Regarding the tulip hang reported by me, the patch below does fix the issue. [PATCH] Fix tulip rx hang Cc: Prasad J Pandit Fixes: 8ffb7265af ("check frame size and r/w data length") Buglink: https://bugs.launchpad.net/bugs/1874539 Signed-off-by: Helge Deller Commit 8ffb7265af ("check frame size and r/w data length") introduced checks to prevent accesses outside of the rx/tx buffers. But the new checks were plain wrong. rx_frame_len does count backwards, and the surrounding code ensures that rx_frame_len will not be bigger than rx_frame_size. Remove those checks again. diff --git a/hw/net/tulip.c b/hw/net/tulip.c index 1295f51d07..59d21defcc 100644 --- a/hw/net/tulip.c +++ b/hw/net/tulip.c @@ -171,9 +171,6 @@ static void tulip_copy_rx_bytes(TULIPState *s, struct tulip_descriptor *desc) len = s->rx_frame_len; } -if (s->rx_frame_len + len > sizeof(s->rx_frame)) { -return; -} pci_dma_write(&s->dev, desc->buf_addr1, s->rx_frame + (s->rx_frame_size - s->rx_frame_len), len); s->rx_frame_len -= len; @@ -186,9 +183,6 @@ static void tulip_copy_rx_bytes(TULIPState *s, struct tulip_descriptor *desc) len = s->rx_frame_len; } -if (s->rx_frame_len + len > sizeof(s->rx_frame)) { -return; -} pci_dma_write(&s->dev, desc->buf_addr2, s->rx_frame + (s->rx_frame_size - s->rx_frame_len), len); s->rx_frame_len -= len;
Re: [PATCH v7 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
24.04.2020 17:27, Kevin Wolf wrote: The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the image is possibly preallocated and then the zero flag is added to all clusters. This means that a copy-on-write operation may be needed when writing to these clusters, despite having used preallocation, negating one of the major benefits of preallocation. Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver, and if the protocol driver can ensure that the new area reads as zeros, we can skip setting the zero flag in the qcow2 layer. Unfortunately, the same approach doesn't work for metadata preallocation, so we'll still set the zero flag there. Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses
> On Apr 22, 2020, at 4:15 PM, Ani Sinha wrote: > > > >> On Apr 21, 2020, at 8:32 PM, Daniel P. Berrangé wrote: >> >> On Tue, Apr 21, 2020 at 02:45:04PM +, Ani Sinha wrote: >>> >>> On Apr 20, 2020, at 8:32 PM, Michael S. Tsirkin wrote: But I for one would like to focus on keeping PIIX stable and focus development on q35. Not bloating PIIX with lots of new features is IMHO a good way to do that. >>> >>> Does this mean this patch is a no-go then? :( >> >> I'd support this patch, as I don't think it can really be described as >> bloat or destabalizing. It is just adding a simple property to >> conditionalize existing functionality. Telling people to switch to Q35 >> is unreasonable as it is not a simple 1-1 conversion from existing use >> of PIIX. Q35 has much higher complexity in its configuration, has higher >> memory overhead per VM too, and lacks certain features of PIIX too. > > Cool. How do we go forward from here? > We would really appreciate if we can add this extra knob in Qemu. Maybe someone else also in the community will find this useful. We don’t want to maintain this patch internally forever but rather prefer we maintain this as a Qemu community. ani
Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting
On Fri, Apr 24, 2020 at 4:20 AM David Hildenbrand wrote: > > On 22.04.20 20:21, Alexander Duyck wrote: > > From: Alexander Duyck > > > > Add support for free page reporting. The idea is to function very similar > > to how the balloon works in that we basically end up madvising the page as > > not being used. However we don't really need to bother with any deflate > > type logic since the page will be faulted back into the guest when it is > > read or written to. > > > > This provides a new way of letting the guest proactively report free > > pages to the hypervisor, so the hypervisor can reuse them. In contrast to > > inflate/deflate that is triggered via the hypervisor explicitly. > > > > Signed-off-by: Alexander Duyck > > --- > > hw/virtio/virtio-balloon.c | 70 > > > > include/hw/virtio/virtio-balloon.h |2 + > > 2 files changed, 71 insertions(+), 1 deletion(-) > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > index 5effc8b4653b..b473ff7f4b88 100644 > > --- a/hw/virtio/virtio-balloon.c > > +++ b/hw/virtio/virtio-balloon.c > > @@ -321,6 +321,60 @@ static void balloon_stats_set_poll_interval(Object > > *obj, Visitor *v, > > balloon_stats_change_timer(s, 0); > > } > > > > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq) > > +{ > > +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev); > > +VirtQueueElement *elem; > > + > > +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement { > > +unsigned int i; > > + > > Maybe add a comment like > > /* > * As discarded pages will be zero when re-accessed, all pages either > * have the old value, or were zeroed out. In case the guest expects > * another value, make sure to never discard. > */ > > Whatever you think is best. Okay I will add the following comment: /* * When we discard the page it has the effect of removing the page * from the hypervisor itself and causing it to be zeroed when it * is returned to us. So we must not discard the page if it is * accessible by another device or process, or if the guest is * expecting it to retain a non-zero value. */ > > +if (qemu_balloon_is_inhibited() || dev->poison_val) { > > +goto skip_element; > > +} > > + > > +for (i = 0; i < elem->in_num; i++) { > > +void *addr = elem->in_sg[i].iov_base; > > +size_t size = elem->in_sg[i].iov_len; > > +ram_addr_t ram_offset; > > +RAMBlock *rb; > > + > > +/* > > + * There is no need to check the memory section to see if > > + * it is ram/readonly/romd like there is for handle_output > > + * below. If the region is not meant to be written to then > > + * address_space_map will have allocated a bounce buffer > > + * and it will be freed in address_space_unmap and trigger > > + * and unassigned_mem_write before failing to copy over the > > + * buffer. If more than one bad descriptor is provided it > > + * will return NULL after the first bounce buffer and fail > > + * to map any resources. > > + */ > > +rb = qemu_ram_block_from_host(addr, false, &ram_offset); > > +if (!rb) { > > +trace_virtio_balloon_bad_addr(elem->in_addr[i]); > > +continue; > > +} > > + > > +/* > > + * For now we will simply ignore unaligned memory regions, or > > + * regions that overrun the end of the RAMBlock. > > + */ > > +if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) > > || > > +(ram_offset + size) > qemu_ram_get_used_length(rb)) { > > +continue; > > +} > > + > > +ram_block_discard_range(rb, ram_offset, size); > > +} > > + > > +skip_element: > > +virtqueue_push(vq, elem, 0); > > +virtio_notify(vdev, vq); > > +g_free(elem); > > +} > > +} > > + > > static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) > > { > > VirtIOBalloon *s = VIRTIO_BALLOON(vdev); > > @@ -782,6 +836,16 @@ static void virtio_balloon_device_realize(DeviceState > > *dev, Error **errp) > > VirtIOBalloon *s = VIRTIO_BALLOON(dev); > > int ret; > > > > +/* > > + * Page reporting is dependant on page poison to make sure we can > > + * report a page without changing the state of the internal data. > > + * We need to set the flag before we call virtio_init as it will > > + * affect the config size of the vdev. > > + */ > > +if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) { > > +s->host_features |= 1 << VIRTIO_BALLOON_F_PAGE_POISON; > > +} > > + > > As discussed, this hunk would go away. With that, this patch i
[PATCH] travis: explicitly include gnutls to ensure it is updated
Travis includes gnutls in the default package set, but it is an outdated version linkng to an incompatible libffi version. The 'update: true' stanza causes the brew toolchain to be updated but not the installed formula. It is possible to run 'brew upgrade' to update installed formula, but this is very slow adding more than 5 minutes to the build time. Listing the gnutls package explicitly causes it to be updated without extending the build time. Signed-off-by: Daniel P. Berrangé --- Note in testing this I got past the libffi.6.dylib error, but eventually hit the Travis 50 minute timeout. So we need to do more to minimize what we build on macOS, splitting the job into two I guess. .travis.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.travis.yml b/.travis.yml index 2fd63eceaa..afbb070082 100644 --- a/.travis.yml +++ b/.travis.yml @@ -287,6 +287,7 @@ jobs: - pixman - gnu-sed - python +- gnutls update: true before_script: - brew link --overwrite python -- 2.24.1
Re: [PATCH v21 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'
On 24.04.20 16:56, Alexander Duyck wrote: > On Fri, Apr 24, 2020 at 4:23 AM David Hildenbrand wrote: >> >> On 22.04.20 20:21, Alexander Duyck wrote: >>> From: Alexander Duyck >>> >>> In an upcoming patch a feature named Free Page Reporting is about to be >>> added. In order to avoid any confusion we should drop the use of the word >>> 'report' when referring to Free Page Hinting. So what this patch does is go >>> through and replace all instances of 'report' with 'hint" when we are >>> referring to free page hinting. >>> >>> Signed-off-by: Alexander Duyck >>> --- >>> hw/virtio/virtio-balloon.c | 74 >>> ++-- >>> include/hw/virtio/virtio-balloon.h | 20 +- >>> 2 files changed, 47 insertions(+), 47 deletions(-) >>> >>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c >>> index a4729f7fc930..a1d6fb52c876 100644 >>> --- a/hw/virtio/virtio-balloon.c >>> +++ b/hw/virtio/virtio-balloon.c >>> @@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev) >>> ret = false; >>> goto out; >>> } >>> -if (id == dev->free_page_report_cmd_id) { >>> -dev->free_page_report_status = FREE_PAGE_REPORT_S_START; >>> +if (id == dev->free_page_hint_cmd_id) { >>> +dev->free_page_hint_status = FREE_PAGE_HINT_S_START; >>> } else { >>> /* >>> * Stop the optimization only when it has started. This >>> * avoids a stale stop sign for the previous command. >>> */ >>> -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { >>> -dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP; >>> +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { >>> +dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP; >>> } >>> } >>> } >>> >>> if (elem->in_num) { >>> -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { >>> +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { >>> qemu_guest_free_page_hint(elem->in_sg[0].iov_base, >>>elem->in_sg[0].iov_len); >>> } >>> @@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void >>> *opaque) >>> qemu_mutex_unlock(&dev->free_page_lock); >>> virtio_notify(vdev, vq); >>>/* >>> - * Start to poll the vq once the reporting started. Otherwise, >>> continue >>> + * Start to poll the vq once the hinting started. Otherwise, continue >>> * only when there are entries on the vq, which need to be given >>> back. >>> */ >>> } while (continue_to_get_hints || >>> - dev->free_page_report_status == FREE_PAGE_REPORT_S_START); >>> + dev->free_page_hint_status == FREE_PAGE_HINT_S_START); >>> virtio_queue_set_notification(vq, 1); >>> } >>> >>> @@ -531,14 +531,14 @@ static void >>> virtio_balloon_free_page_start(VirtIOBalloon *s) >>> return; >>> } >>> >>> -if (s->free_page_report_cmd_id == UINT_MAX) { >>> -s->free_page_report_cmd_id = >>> - VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN; >>> +if (s->free_page_hint_cmd_id == UINT_MAX) { >>> +s->free_page_hint_cmd_id = >>> + VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; >>> } else { >>> -s->free_page_report_cmd_id++; >>> +s->free_page_hint_cmd_id++; >>> } >>> >>> -s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED; >>> +s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED; >>> virtio_notify_config(vdev); >>> } >>> >>> @@ -546,18 +546,18 @@ static void >>> virtio_balloon_free_page_stop(VirtIOBalloon *s) >>> { >>> VirtIODevice *vdev = VIRTIO_DEVICE(s); >>> >>> -if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { >>> +if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) { >>> /* >>> * The lock also guarantees us that the >>> * virtio_ballloon_get_free_page_hints exits after the >>> - * free_page_report_status is set to S_STOP. >>> + * free_page_hint_status is set to S_STOP. >>> */ >>> qemu_mutex_lock(&s->free_page_lock); >>> /* >>> * The guest hasn't done the reporting, so host sends a >>> notification >>> * to the guest to actively stop the reporting. >>> */ >>> -s->free_page_report_status = FREE_PAGE_REPORT_S_STOP; >>> +s->free_page_hint_status = FREE_PAGE_HINT_S_STOP; >>> qemu_mutex_unlock(&s->free_page_lock); >>> virtio_notify_config(vdev); >>> } >>> @@ -567,15 +567,15 @@ static void >>> virtio_balloon_free_page_done(VirtIOBalloon *s) >>> { >>> VirtIODevice *vdev = VIRTIO_DEVICE(s); >>> >>> -s->free_page_report_status = FREE_PAGE_REPORT_S_DONE; >>> +s->free_page_
[PATCH] target/riscv: fix check of guest pa top bits
The spec states that on sv39x4 guest physical "address bits 63:41 must all be zeros, or else a guest-page-fault exception occurs.". However, the check performed for these top bits of the virtual address on the second stage is the same as the one performed for virtual addresses on the first stage except with the 2-bit extension, effectively creating the same kind of "hole" in the guest's physical address space. I believe the following patch fixes this issue: Signed-off-by: Jose Martins --- target/riscv/cpu_helper.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c index d3ba9efb02..da879f5656 100644 --- a/target/riscv/cpu_helper.c +++ b/target/riscv/cpu_helper.c @@ -421,15 +421,21 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical, int va_bits = PGSHIFT + levels * ptidxbits + widened; target_ulong mask, masked_msbs; -if (TARGET_LONG_BITS > (va_bits - 1)) { -mask = (1L << (TARGET_LONG_BITS - (va_bits - 1))) - 1; +if(!first_stage){ +if ((addr >> va_bits) != 0) { +return TRANSLATE_FAIL; +} } else { -mask = 0; -} -masked_msbs = (addr >> (va_bits - 1)) & mask; +if (TARGET_LONG_BITS > (va_bits - 1)) { +mask = (1L << (TARGET_LONG_BITS - (va_bits - 1))) - 1; +} else { +mask = 0; +} +masked_msbs = (addr >> (va_bits - 1)) & mask; -if (masked_msbs != 0 && masked_msbs != mask) { -return TRANSLATE_FAIL; +if (masked_msbs != 0 && masked_msbs != mask) { +return TRANSLATE_FAIL; +} } int ptshift = (levels - 1) * ptidxbits; -- 2.17.1 Jose
Re: [PATCH RESEND v6 10/36] multi-process: build system for remote device process
On Wed, Apr 22, 2020 at 09:13:45PM -0700, elena.ufimts...@oracle.com wrote: > From: Jagannathan Raman > > Modify Makefile to support the building of the remote > device process. Implements main() function of remote > device process. > > Signed-off-by: John G Johnson > Signed-off-by: Jagannathan Raman > Signed-off-by: Elena Ufimtseva > --- > MAINTAINERS | 8 ++ > Makefile| 2 ++ > Makefile.objs | 27 ++ > Makefile.target | 61 - > accel/Makefile.objs | 2 ++ > backends/Makefile.objs | 2 ++ > block/Makefile.objs | 2 ++ > hw/Makefile.objs| 7 + > hw/block/Makefile.objs | 2 ++ > hw/core/Makefile.objs | 18 > hw/nvram/Makefile.objs | 2 ++ > hw/pci/Makefile.objs| 4 +++ > hw/scsi/Makefile.objs | 2 ++ > migration/Makefile.objs | 2 ++ > qom/Makefile.objs | 3 ++ > remote/Makefile.objs| 1 + > remote/remote-main.c| 23 > stubs/replay.c | 4 +++ > 18 files changed, 171 insertions(+), 1 deletion(-) > create mode 100644 remote/Makefile.objs > create mode 100644 remote/remote-main.c This approach is okay for now but will result in a lot of Makefile duplication in the long run. Each hw .o file should specify its dependencies so that qemu-system-* and the remote executable can link in the needed files. The Kconfig system can also help with this by enabling/disabling features. Then the Makefiles don't need to duplicate *-obj-y and remote-pci-*. > diff --git a/Makefile.objs b/Makefile.objs > index f29c60c59d..f6654633b4 100644 > --- a/Makefile.objs > +++ b/Makefile.objs > @@ -21,6 +21,33 @@ block-obj-$(CONFIG_REPLICATION) += replication.o > > block-obj-m = block/ > > +# > +# remote-pci-obj-y is common code used by remote devices > + > +remote-pci-obj-$(CONFIG_MPQEMU) += hw/ > +remote-pci-obj-$(CONFIG_MPQEMU) += qom/ > +remote-pci-obj-$(CONFIG_MPQEMU) += backends/ > +remote-pci-obj-$(CONFIG_MPQEMU) += block/ > +remote-pci-obj-$(CONFIG_MPQEMU) += migration/ In the future migration can be split into the QEMU and remote parts. The remote executable doesn't need all the live migration code. > +remote-pci-obj-$(CONFIG_MPQEMU) += remote/ > +remote-pci-obj-$(CONFIG_MPQEMU) += accel/ Devices do not execute guest code so they should not need accel/. kvm and tcg functions were stubbed out earlier in this patch series, so I'm surprised to see thing being built into the remote executable. > @@ -121,6 +131,20 @@ LIBS := $(libs_cpu) $(LIBS) > > obj-$(CONFIG_PLUGIN) += plugins/ > > +ifeq ($(TARGET_NAME)-$(CONFIG_MPQEMU)-$(CONFIG_USER_ONLY), x86_64-y-) > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/kvm-stub.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/tcg-stub.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/hax-stub.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/whpx-stub.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/vl-stub.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/net-stub.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/replay.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/xen-mapcache.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/audio.o > +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o > +endif Stubs don't need to be explicitly included, they should be linked in via libqemustub.a. > diff --git a/remote/remote-main.c b/remote/remote-main.c > new file mode 100644 > index 00..7c0764ad01 > --- /dev/null > +++ b/remote/remote-main.c > @@ -0,0 +1,23 @@ > +/* > + * Remote device initialization > + * > + * Copyright © 2018, 2020 Oracle and/or its affiliates. > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > + > +#include "qemu/osdep.h" > +#include "qemu-common.h" > + > +#include This is already included by "qemu/osdep.h" > + > +#include "qemu/module.h" > + > +int main(int argc, char *argv[]) > +{ > +module_call_init(MODULE_INIT_QOM); > + > +return 0; > +} > diff --git a/stubs/replay.c b/stubs/replay.c > index 2e3feee6a9..9b53c0cb37 100644 > --- a/stubs/replay.c > +++ b/stubs/replay.c > @@ -102,3 +102,7 @@ int replay_get_instructions(void) > void replay_account_executed_instructions(void) > { > } > + > +void replay_add_blocker(Error *reason) > +{ > +} This can be moved to the stubs patch. signature.asc Description: PGP signature
Re: [PATCH v21 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'
On Fri, Apr 24, 2020 at 4:23 AM David Hildenbrand wrote: > > On 22.04.20 20:21, Alexander Duyck wrote: > > From: Alexander Duyck > > > > In an upcoming patch a feature named Free Page Reporting is about to be > > added. In order to avoid any confusion we should drop the use of the word > > 'report' when referring to Free Page Hinting. So what this patch does is go > > through and replace all instances of 'report' with 'hint" when we are > > referring to free page hinting. > > > > Signed-off-by: Alexander Duyck > > --- > > hw/virtio/virtio-balloon.c | 74 > > ++-- > > include/hw/virtio/virtio-balloon.h | 20 +- > > 2 files changed, 47 insertions(+), 47 deletions(-) > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > index a4729f7fc930..a1d6fb52c876 100644 > > --- a/hw/virtio/virtio-balloon.c > > +++ b/hw/virtio/virtio-balloon.c > > @@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev) > > ret = false; > > goto out; > > } > > -if (id == dev->free_page_report_cmd_id) { > > -dev->free_page_report_status = FREE_PAGE_REPORT_S_START; > > +if (id == dev->free_page_hint_cmd_id) { > > +dev->free_page_hint_status = FREE_PAGE_HINT_S_START; > > } else { > > /* > > * Stop the optimization only when it has started. This > > * avoids a stale stop sign for the previous command. > > */ > > -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { > > -dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP; > > +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { > > +dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP; > > } > > } > > } > > > > if (elem->in_num) { > > -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { > > +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { > > qemu_guest_free_page_hint(elem->in_sg[0].iov_base, > >elem->in_sg[0].iov_len); > > } > > @@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void > > *opaque) > > qemu_mutex_unlock(&dev->free_page_lock); > > virtio_notify(vdev, vq); > >/* > > - * Start to poll the vq once the reporting started. Otherwise, > > continue > > + * Start to poll the vq once the hinting started. Otherwise, continue > > * only when there are entries on the vq, which need to be given > > back. > > */ > > } while (continue_to_get_hints || > > - dev->free_page_report_status == FREE_PAGE_REPORT_S_START); > > + dev->free_page_hint_status == FREE_PAGE_HINT_S_START); > > virtio_queue_set_notification(vq, 1); > > } > > > > @@ -531,14 +531,14 @@ static void > > virtio_balloon_free_page_start(VirtIOBalloon *s) > > return; > > } > > > > -if (s->free_page_report_cmd_id == UINT_MAX) { > > -s->free_page_report_cmd_id = > > - VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN; > > +if (s->free_page_hint_cmd_id == UINT_MAX) { > > +s->free_page_hint_cmd_id = > > + VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; > > } else { > > -s->free_page_report_cmd_id++; > > +s->free_page_hint_cmd_id++; > > } > > > > -s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED; > > +s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED; > > virtio_notify_config(vdev); > > } > > > > @@ -546,18 +546,18 @@ static void > > virtio_balloon_free_page_stop(VirtIOBalloon *s) > > { > > VirtIODevice *vdev = VIRTIO_DEVICE(s); > > > > -if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { > > +if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) { > > /* > > * The lock also guarantees us that the > > * virtio_ballloon_get_free_page_hints exits after the > > - * free_page_report_status is set to S_STOP. > > + * free_page_hint_status is set to S_STOP. > > */ > > qemu_mutex_lock(&s->free_page_lock); > > /* > > * The guest hasn't done the reporting, so host sends a > > notification > > * to the guest to actively stop the reporting. > > */ > > -s->free_page_report_status = FREE_PAGE_REPORT_S_STOP; > > +s->free_page_hint_status = FREE_PAGE_HINT_S_STOP; > > qemu_mutex_unlock(&s->free_page_lock); > > virtio_notify_config(vdev); > > } > > @@ -567,15 +567,15 @@ static void > > virtio_balloon_free_page_done(VirtIOBalloon *s) > > { > > VirtIODevice *vdev = VIRTIO_DEVICE(s); > > > > -s->free_page_report_status = FREE_PAGE_REPORT_S_DONE; > > +s->free_page_hint_status = FREE_PAGE_HINT_S_DONE; > >
Re: [PATCH v7 09/10] iotests: Test committing to short backing file
24.04.2020 15:54, Kevin Wolf wrote: Signed-off-by: Kevin Wolf Reviewed-by: Vladimir Sementsov-Ogievskiy --- [..] + +# After this, 0 to base_size should be allocated/zeroed. Actually, top_size_old to base_size, yes? (sorry, nitpicking, missed on previous review). +# +# In theory, leaving base_size to top_size_new unallocated would be +# correct, but in practice, if we zero out anything, we zero out +# everything up to top_size_new. +iotests.qemu_img_log('resize', '-f', iotests.imgfmt, + '--preallocation', prealloc, top, top_size_new) +iotests.qemu_io_log('-c', 'read -P 0 %s 64k' % off, top) +iotests.qemu_io_log('-c', 'map', top) +iotests.qemu_img_log('map', '--output=json', top) -- Best regards, Vladimir
Re: [PATCH 4/5] ramfb: add sanity checks to ramfb_create_display_surface
On 04/23/20 13:41, Gerd Hoffmann wrote: > Hi, > >> - if "linesize" is nonzero, make sure it is a whole multiple of the >> required word size (?) >> >> - if "linesize" is nonzero, make sure it is not bogus with relation to >> "width". I'm thinking something like: > > Yep, the whole linesize is a bit bogus, noticed after sending out this > series, I have a followup patch for that (see below). > > take care, > Gerd > > From 154dcff73dc533fc95c88bd960ed65108af6c734 Mon Sep 17 00:00:00 2001 > From: Gerd Hoffmann > Date: Wed, 22 Apr 2020 12:07:59 +0200 > Subject: [PATCH] ramfb: fix size calculation > > size calculation isn't correct with guest-supplied stride, the last > display line isn't accounted for correctly. > > For the typical case of stride > linesize (add padding) we error on the > safe side (calculated size is larger than actual size). > > With stride < linesize (scanlines overlap) the calculated size is > smaller than the actual size though so our guest memory mapping might > end up being too small. > > While being at it also fix ramfb_create_display_surface to use hwaddr > for the parameters. That way all calculation are done with hwaddr type > and we can't get funny effects from type castings. > > Signed-off-by: Gerd Hoffmann > --- > hw/display/ramfb.c | 19 ++- > 1 file changed, 10 insertions(+), 9 deletions(-) > > diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c > index be884c9ea837..928d74d10bc7 100644 > --- a/hw/display/ramfb.c > +++ b/hw/display/ramfb.c > @@ -44,10 +44,10 @@ static void ramfb_unmap_display_surface(pixman_image_t > *image, void *unused) > > static DisplaySurface *ramfb_create_display_surface(int width, int height, > pixman_format_code_t > format, > -int linesize, uint64_t > addr) > +hwaddr stride, hwaddr > addr) > { > DisplaySurface *surface; > -hwaddr size; > +hwaddr size, mapsize, linesize; > void *data; > > if (width < 16 || width > VBE_DISPI_MAX_XRES || > @@ -55,19 +55,20 @@ static DisplaySurface *ramfb_create_display_surface(int > width, int height, > format == 0 /* unknown format */) > return NULL; > > -if (linesize == 0) { > -linesize = width * PIXMAN_FORMAT_BPP(format) / 8; > +linesize = width * PIXMAN_FORMAT_BPP(format) / 8; > +if (stride == 0) { > +stride = linesize; > } > > -size = (hwaddr)linesize * height; > -data = cpu_physical_memory_map(addr, &size, false); > -if (size != (hwaddr)linesize * height) { > -cpu_physical_memory_unmap(data, size, 0, 0); > +mapsize = size = stride * (height - 1) + linesize; > +data = cpu_physical_memory_map(addr, &mapsize, false); > +if (size != mapsize) { > +cpu_physical_memory_unmap(data, mapsize, 0, 0); > return NULL; > } > > surface = qemu_create_displaysurface_from(width, height, > - format, linesize, data); > + format, stride, data); > pixman_image_set_destroy_function(surface->image, >ramfb_unmap_display_surface, NULL); > > I don't understand two things about this patch: - "stride" can still be smaller than "linesize" (scanlines can still overlap). Why is that not a problem? - assuming "stride > linesize" (i.e., strictly larger), we don't seem to map the last complete stride, and that seems to be intentional. Is that safe with regard to qemu_create_displaysurface_from()? Ultimately the stride is passed to pixman_image_create_bits(), and the underlying "data" may not be large enough for that. What am I missing? Hm... bonus question: qemu_create_displaysurface_from() still accepts "linesize" as a signed int. (Not sure about pixman_image_create_bits().) Should we do something specific to prevent that value from being negative? The guest gives us a uint32_t. Thanks Laszlo
Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
On 4/24/20 7:54 AM, Kevin Wolf wrote: If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't undo any previous preallocation, but just adds the zero flag to all relevant L2 entries. If an external data file is in use, a write_zeroes request to the data file is made instead. Signed-off-by: Kevin Wolf --- block/qcow2-cluster.c | 2 +- block/qcow2.c | 34 ++ 2 files changed, 35 insertions(+), 1 deletion(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v7 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
On 24.04.20 16:27, Kevin Wolf wrote: > The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the > image is possibly preallocated and then the zero flag is added to all > clusters. This means that a copy-on-write operation may be needed when > writing to these clusters, despite having used preallocation, negating > one of the major benefits of preallocation. > > Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver, > and if the protocol driver can ensure that the new area reads as zeros, > we can skip setting the zero flag in the qcow2 layer. > > Unfortunately, the same approach doesn't work for metadata > preallocation, so we'll still set the zero flag there. > > Signed-off-by: Kevin Wolf > Reviewed-by: Max Reitz > --- > block/qcow2.c | 22 +++--- > tests/qemu-iotests/274.out | 4 ++-- > 2 files changed, 21 insertions(+), 5 deletions(-) Thanks for the resend! :) Max signature.asc Description: OpenPGP digital signature
Re: [PATCH] iotests/041: Fix NBD socket path
On 4/24/20 8:46 AM, Max Reitz wrote: We should put all UNIX socket files into the sock_dir, not test_dir. Reported-by: Elena Ufimtseva Signed-off-by: Max Reitz --- tests/qemu-iotests/041 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Eric Blake I'm happy to queue this through my NBD tree, if you don't beat me to it through an iotest pull request. diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041 index 5d67bf14bf..46bf1f6c81 100755 --- a/tests/qemu-iotests/041 +++ b/tests/qemu-iotests/041 @@ -35,7 +35,7 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img') quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img') quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img') -nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock') +nbd_sock_path = os.path.join(iotests.sock_dir, 'nbd.sock') class TestSingleDrive(iotests.QMPTestCase): image_len = 1 * 1024 * 1024 # MB -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
24.04.2020 15:17, Kevin Wolf wrote: Am 24.04.2020 um 08:16 hat Vladimir Sementsov-Ogievskiy geschrieben: 23.04.2020 18:01, Kevin Wolf wrote: If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't undo any previous preallocation, but just adds the zero flag to all relevant L2 entries. If an external data file is in use, a write_zeroes request to the data file is made instead. Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz --- block/qcow2-cluster.c | 2 +- block/qcow2.c | 33 + 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 17f1363279..4b5fc8c4a7 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset, /* Caller must pass aligned values, except at image end */ assert(QEMU_IS_ALIGNED(offset, s->cluster_size)); assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) || - end_offset == bs->total_sectors << BDRV_SECTOR_BITS); + end_offset >= bs->total_sectors << BDRV_SECTOR_BITS); /* The zero flag is only supported by version 3 and newer */ if (s->qcow_version < 3) { diff --git a/block/qcow2.c b/block/qcow2.c index 9cfbdfc939..ad621fe404 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, bs->supported_zero_flags = header.version >= 3 ? BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0; +bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE; /* Repair image if dirty */ if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only && @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, g_assert_not_reached(); } +if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) { +uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size); + +/* + * Use zero clusters as much as we can. qcow2_cluster_zeroize() + * requires a cluster-aligned start. The end may be unaligned if it is + * at the end of the image (which it is here). + */ +ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0); +if (ret < 0) { +error_setg_errno(errp, -ret, "Failed to zero out new clusters"); +goto fail; +} + +/* Write explicit zeros for the unaligned head */ +if (zero_start > old_length) { +uint8_t *buf = qemu_blockalign0(bs, s->cluster_size); Why not s/s->cluster_size/zero_start - old_length/? We may save a lot of pages if cluster_size is large. Ok. +QEMUIOVector qiov; +qemu_iovec_init_buf(&qiov, buf, zero_start - old_length); + +qemu_co_mutex_unlock(&s->lock); We are in intermediate state here. Is it safe to unlock? Anything may happen, up to another truncate.. I don't think it's worse than unlocking during normal writes, which we have been doing for a long time. I don't see anything that would corrupt any internal state. Of course, doing truncate in parallel with other operations around EOF has somewhat undefined semantics because the order could be anything. But if you don't want to get undefined results, you just shouldn't make such requests. It's similar to reading and writing the same area at the same time. If there would be two truncate operations in parallel, we may finish up with s->l1_vm_state_index bs->total_sectors not corresponding to other metadata. Not sure how much is it bad.. +ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0); Better not do it if this cluster is already ZERO.. But it may be a later patch to improve that. I doubt it's worth writing code to optimise a corner case inside a corner case. +qemu_co_mutex_lock(&s->lock); + +qemu_vfree(buf); +if (ret < 0) { +error_setg_errno(errp, -ret, "Failed to zero out the new area"); +goto fail; +} +} +} + if (prealloc != PREALLOC_MODE_OFF) { /* Flush metadata before actually changing the image size */ ret = qcow2_write_caches(bs); Kevin -- Best regards, Vladimir
[PATCH v7 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the image is possibly preallocated and then the zero flag is added to all clusters. This means that a copy-on-write operation may be needed when writing to these clusters, despite having used preallocation, negating one of the major benefits of preallocation. Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver, and if the protocol driver can ensure that the new area reads as zeros, we can skip setting the zero flag in the qcow2 layer. Unfortunately, the same approach doesn't work for metadata preallocation, so we'll still set the zero flag there. Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz --- block/qcow2.c | 22 +++--- tests/qemu-iotests/274.out | 4 ++-- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 98065d7808..2ba0b17c39 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -4170,9 +4170,25 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, /* Allocate the data area */ new_file_size = allocation_start + nb_new_data_clusters * s->cluster_size; -/* Image file grows, so @exact does not matter */ -ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0, - errp); +/* + * Image file grows, so @exact does not matter. + * + * If we need to zero out the new area, try first whether the protocol + * driver can already take care of this. + */ +if (flags & BDRV_REQ_ZERO_WRITE) { +ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, + BDRV_REQ_ZERO_WRITE, NULL); +if (ret >= 0) { +flags &= ~BDRV_REQ_ZERO_WRITE; +} +} else { +ret = -1; +} +if (ret < 0) { +ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0, + errp); +} if (ret < 0) { error_prepend(errp, "Failed to resize underlying file: "); qcow2_free_clusters(bs, allocation_start, diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out index 1a796fd07c..9d6fdeb1f7 100644 --- a/tests/qemu-iotests/274.out +++ b/tests/qemu-iotests/274.out @@ -187,7 +187,7 @@ read 65536/65536 bytes at offset 9437184 10 MiB (0xa0) bytes allocated at offset 5 MiB (0x50) [{ "start": 0, "length": 5242880, "depth": 1, "zero": true, "data": false}, -{ "start": 5242880, "length": 10485760, "depth": 0, "zero": true, "data": false, "offset": 327680}] +{ "start": 5242880, "length": 10485760, "depth": 0, "zero": false, "data": true, "offset": 327680}] === preallocation=full === Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=16777216 cluster_size=65536 lazy_refcounts=off refcount_bits=16 @@ -206,7 +206,7 @@ read 65536/65536 bytes at offset 11534336 4 MiB (0x40) bytes allocated at offset 8 MiB (0x80) [{ "start": 0, "length": 8388608, "depth": 1, "zero": true, "data": false}, -{ "start": 8388608, "length": 4194304, "depth": 0, "zero": true, "data": false, "offset": 327680}] +{ "start": 8388608, "length": 4194304, "depth": 0, "zero": false, "data": true, "offset": 327680}] === preallocation=off === Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=393216 cluster_size=65536 lazy_refcounts=off refcount_bits=16 -- 2.25.3
Re: [RFC PATCH 3/3] hw/net/tulip: Set descriptor error bit when lenght is incorrect
On 4/23/20 6:16 PM, Philippe Mathieu-Daudé wrote: When a frame lenght is incorrect, set the RDES0 'Error Summary' length (here, and in the subject) and 'Frame too long' bits. Then stop the receive process and trigger an abnormal interrupt. See [4.3.5 Receive Process]. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v2 1/3] block: Add blk_new_with_bs() helper
On 24.04.20 16:18, Eric Blake wrote: > On 4/24/20 5:02 AM, Max Reitz wrote: > >>> (With the Patchew warning fixed, of course (i.e., we should set ret to >>> -EPERM or something in qcow.c)) >> >> Er, well, maybe I should have looked into more places. The compiler >> only warns about that single one because it’s the only place where @ret >> is really uninitialized, but there are many more where we need to set >> it: crypto.c, parallels.c, qcow.c, qcow2.c (both hunks), qed.c, >> sheepdog.c, vdi.c, vhdx.c, and vpc.c. >> >> (So basically everywhere but vmdk.c, blockdev.c, and blockjob.c.) > > Urgh - so it's even less of a win in terms of reduced lines of code. True, but I still consider it a win in terms of reduced complexity. Before, we have two function calls with two variable assignments from their return values (@blk and @ret). Afterwards, we’ll have a single function all with one variable assignment from its return value (@blk), and one stand-alone variable assignment with a constant value (“ret = -EPERM” or similar). :) Max signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 1/3] block: Add blk_new_with_bs() helper
On 4/24/20 5:02 AM, Max Reitz wrote: (With the Patchew warning fixed, of course (i.e., we should set ret to -EPERM or something in qcow.c)) Er, well, maybe I should have looked into more places. The compiler only warns about that single one because it’s the only place where @ret is really uninitialized, but there are many more where we need to set it: crypto.c, parallels.c, qcow.c, qcow2.c (both hunks), qed.c, sheepdog.c, vdi.c, vhdx.c, and vpc.c. (So basically everywhere but vmdk.c, blockdev.c, and blockjob.c.) Urgh - so it's even less of a win in terms of reduced lines of code. Still, I'll fix it and post v3. And now I’m going to get another coffee... Max -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [EXT] Re: [PATCH v9 1/9] hw/vfio/common: Remove error print on mmio region translation by viommu
Hi Bharat, On 4/2/20 11:01 AM, Bharat Bhushan wrote: > Hi Eric/Alex, > >> -Original Message- >> From: Alex Williamson >> Sent: Thursday, March 26, 2020 11:23 PM >> To: Auger Eric >> Cc: Bharat Bhushan ; peter.mayd...@linaro.org; >> pet...@redhat.com; eric.auger@gmail.com; kevin.t...@intel.com; >> m...@redhat.com; Tomasz Nowicki [C] ; >> drjo...@redhat.com; linuc.dec...@gmail.com; qemu-devel@nongnu.org; qemu- >> a...@nongnu.org; bharatb.li...@gmail.com; jean-phili...@linaro.org; >> yang.zh...@intel.com; David Gibson >> Subject: [EXT] Re: [PATCH v9 1/9] hw/vfio/common: Remove error print on mmio >> region translation by viommu >> >> External Email >> >> -- >> On Thu, 26 Mar 2020 18:35:48 +0100 >> Auger Eric wrote: >> >>> Hi Alex, >>> >>> On 3/24/20 12:08 AM, Alex Williamson wrote: [Cc +dwg who originated this warning] On Mon, 23 Mar 2020 14:16:09 +0530 Bharat Bhushan wrote: > On ARM, the MSI doorbell is translated by the virtual IOMMU. > As such address_space_translate() returns the MSI controller MMIO > region and we get an "iommu map to non memory area" > message. Let's remove this latter. > > Signed-off-by: Eric Auger > Signed-off-by: Bharat Bhushan > --- > hw/vfio/common.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c index > 5ca11488d6..c586edf47a 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -426,8 +426,6 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, >> void **vaddr, > &xlat, &len, writable, > MEMTXATTRS_UNSPECIFIED); > if (!memory_region_is_ram(mr)) { > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > - xlat); > return false; > } > I'm a bit confused here, I think we need more justification beyond "we hit this warning and we don't want to because it's ok in this one special case, therefore remove it". I assume the special case is that the device MSI address is managed via the SET_IRQS ioctl and therefore we won't actually get DMAs to this range. >>> Yes exactly. The guest creates a mapping between one giova and this >>> gpa (corresponding to the MSI controller doorbell) because MSIs are >>> mapped on ARM. But practically the physical device is programmed with >>> an host chosen iova that maps onto the physical MSI controller's >>> doorbell. so the device never performs DMA accesses to this range. >>> >>> But I imagine the case that was in mind when adding this warning was general peer-to-peer between and assigned and emulated device. >>> yes makes sense. >>> >>> Maybe there's an argument to be made that such a p2p mapping might also be used in a non-vIOMMU case. We skip creating those mappings and drivers continue to work, maybe because nobody attempts to do p2p DMA with the types of devices we emulate, maybe because p2p DMA is not absolutely reliable on bare metal and drivers test it before using it. >>> MSI doorbells are mapped using the IOMMU_MMIO flag (dma-iommu.c >>> iommu_dma_get_msi_page). >>> One idea could be to pass that flag through the IOMMU Notifier >>> mechanism into the iotlb->perm. Eventually when we get this in >>> vfio_get_vaddr() we would not print the warning. Could that make sense? >> >> Yeah, if we can identify a valid case that doesn't need a warning, that's >> fine by me. >> Thanks, > > Let me know if I understood the proposal correctly: > > virtio-iommu driver in guest will make map (VIRTIO_IOMMU_T_MAP) with > VIRTIO_IOMMU_MAP_F_MMIO flag for MSI mapping. > In qemu, virtio-iommu device will set a new defined flag (say IOMMU_MMIO) in > iotlb->perm in memory_region_notify_iommu(). vfio_get_vaddr() will check same > flag and will not print the warning.> > Is above correct? Yes that's what I had in mind. Thanks Eric > > Thanks > -Bharat > >> >> Alex > >
Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
On 4/24/20 7:17 AM, Kevin Wolf wrote: +ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0); Better not do it if this cluster is already ZERO.. But it may be a later patch to improve that. I doubt it's worth writing code to optimise a corner case inside a corner case. I spotted the same issue, and my suggestion was to use qcow2_co_pwrite_zero instead of qcow2_co_pwritev_part, but Kevin pointed out that my idea would probably not work the way I thought. Then again, checking if the page is already zero is all the more benefit that qcow2_co_pwrite_zero would have given for me to have raised the question. The following example illustrates why it might be worthwhile: Suppose we have an image with 1M clusters, which is an unaligned 100.5M in length but started life with all clusters sparse. We then resize it to another unaligned 200.5M. The call to qcow2_co_pwritev_part will cause the image to be non-sparse for one cluster out of 201, whereas adding a check to skip the pwritev because the original unaligned tail cluster already read as zero will let us keep all clusters sparse. At any rate, I argued that any such optimization would be a followup patch, and Kevin is right that it is a corner case optimization unlikely to affect many real-life images. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v7 00/10] block: Fix resize (extending) of short overlays
On 24.04.20 14:54, Kevin Wolf wrote: > v7: > - Allocate smaller zero buffer [Vladimir] > - Added missing error_setg_errno() [Max] > - Code cleanup in the iotest, enabled mapping for 'metadata' [Vladimir] > - Don't assign to errp twice [Eric] I would have liked to see that change in patch 10, but the mail seems to have disappeared somewhere. :? Max signature.asc Description: OpenPGP digital signature
Re: [PATCH v7 09/10] iotests: Test committing to short backing file
On 24.04.20 14:54, Kevin Wolf wrote: > Signed-off-by: Kevin Wolf > --- > tests/qemu-iotests/274 | 155 + > tests/qemu-iotests/274.out | 268 + > tests/qemu-iotests/group | 1 + > 3 files changed, 424 insertions(+) > create mode 100755 tests/qemu-iotests/274 > create mode 100644 tests/qemu-iotests/274.out Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [PATCH v7 07/10] block: truncate: Don't make backing file data visible
On 24.04.20 14:54, Kevin Wolf wrote: > When extending the size of an image that has a backing file larger than > its old size, make sure that the backing file data doesn't become > visible in the guest, but the added area is properly zeroed out. > > Consider the following scenario where the overlay is shorter than its > backing file: > > base.qcow2: > overlay.qcow2: > > When resizing (extending) overlay.qcow2, the new blocks should not stay > unallocated and make the additional As from base.qcow2 visible like > before this patch, but zeros should be read. > > A similar case happens with the various variants of a commit job when an > intermediate file is short (- for unallocated): > > base.qcow2: A-A- > mid.qcow2: BB-B > top.qcow2: C--C--C- > > After commit top.qcow2 to mid.qcow2, the following happens: > > mid.qcow2: CB-C00C0 (correct result) > mid.qcow2: CB-C--C- (before this fix) > > Without the fix, blocks that previously read as zeros on top.qcow2 > suddenly turn into A. > > Signed-off-by: Kevin Wolf > Reviewed-by: Vladimir Sementsov-Ogievskiy > --- > block/io.c | 25 + > 1 file changed, 25 insertions(+) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
On 24.04.20 14:54, Kevin Wolf wrote: > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't > undo any previous preallocation, but just adds the zero flag to all > relevant L2 entries. If an external data file is in use, a write_zeroes > request to the data file is made instead. > > Signed-off-by: Kevin Wolf > --- > block/qcow2-cluster.c | 2 +- > block/qcow2.c | 34 ++ > 2 files changed, 35 insertions(+), 1 deletion(-) [...] > diff --git a/block/qcow2.c b/block/qcow2.c > index 9cfbdfc939..98065d7808 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c [...] > @@ -4214,6 +4215,39 @@ static int coroutine_fn > qcow2_co_truncate(BlockDriverState *bs, int64_t offset, [...] > +/* Write explicit zeros for the unaligned head */ > +if (zero_start > old_length) { > +uint64_t len = zero_start - old_length; > +uint8_t *buf = qemu_blockalign0(bs, len); I wonder whether I should raise the question of why this should be block-aligned when we make no effort to align the offset its written to (and we know it isn’t aligned to qcow2 clusters at least). I probably should not. Reviewed-by: Max Reitz > +QEMUIOVector qiov; > +qemu_iovec_init_buf(&qiov, buf, len); > + > +qemu_co_mutex_unlock(&s->lock); > +ret = qcow2_co_pwritev_part(bs, old_length, len, &qiov, 0, 0); > +qemu_co_mutex_lock(&s->lock); > + > +qemu_vfree(buf); > +if (ret < 0) { > +error_setg_errno(errp, -ret, "Failed to zero out the new > area"); > +goto fail; > +} > +} signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 00/15] qapi: Spring cleaning
On 4/24/20 5:06 AM, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/20200424084338.26803-1-arm...@redhat.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === -... +E.. +== +ERROR: test_stream_pause (__main__.TestSingleDrive) +-- +Traceback (most recent call last): + File "030", line 93, in test_stream_pause Not sure how this would be related to your series, or if it is a spurious flakiness in the test that just chose to hit now. iotest 30 is passing fine for me even with your series applied. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [PATCH v2 15/15] qapi: Generate simpler marshalling code when no arguments
On 4/24/20 3:43 AM, Markus Armbruster wrote: When command FOO has no arguments, its generated qmp_marshal_FOO() is a bit confusing. Make it simpler: visit_start_struct(v, NULL, NULL, 0, &err); if (err) { goto out; } - -if (!err) { -visit_check_struct(v, &err); -} +visit_check_struct(v, &err); visit_end_struct(v, NULL); if (err) { goto out; } Signed-off-by: Markus Armbruster --- scripts/qapi/commands.py | 40 1 file changed, 24 insertions(+), 16 deletions(-) A bit more complex in the generator, but the generated code is indeed better. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org