On Tue, May 19, 2026 at 12:48 PM Philippe Mathieu-Daudé <[email protected]> wrote:
>
> On 19/5/26 18:22, James Hilliard wrote:
> > ZCB zeros the 128-byte cache block containing the base address. ZCBT has
> > the same user-mode-visible memory effect for QEMU purposes.
> >
> > Model both forms with a single decodetree wildcard entry, align the
> > address down to a 128-byte line, and store eight zero 128-bit chunks to
> > guest memory.
> >
> > Acked-by: Richard Henderson <[email protected]>
> > Signed-off-by: James Hilliard <[email protected]>
> > ---
> > Changes v8 -> v9:
> > - Use MO_ATOM_NONE for the 128-bit zero stores so TCG does not
> >   require unavailable 128-bit atomic stores on hosts that lack them.
> >
> > Changes v7 -> v8:
> > - Fold the ZCBT wildcard decode into the ZCB patch so the series does not
> >   add a ZCB-only decode and rewrite it in the next patch.
> >
> > Changes v6 -> v7:
> > - Use 128-bit zero stores with MO_128 instead of sixteen 64-bit stores.
> >   (suggested by Philippe Mathieu-Daudé)
> > - Fold ZCB and ZCBT into a single decodetree wildcard entry instead of
> >   using a duplicate entry with a selector comment. (suggested by
> >   Philippe Mathieu-Daudé)
> >
> > Changes v2 -> v3:
> > - Split ZCB/ZCBT out of the combined Octeon arithmetic and memory
> >   instruction patch. (requested by Richard Henderson)
> > ---
> >  target/mips/tcg/octeon.decode      |  3 +++
> >  target/mips/tcg/octeon_translate.c | 27 +++++++++++++++++++++++++++
> >  2 files changed, 30 insertions(+)
> >
> > diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
> > index d77717cd50..01ed3b50be 100644
> > --- a/target/mips/tcg/octeon.decode
> > +++ b/target/mips/tcg/octeon.decode
> > @@ -49,6 +49,9 @@ SNEI 011100 rs:5 rt:5 imm:s10 101111 &cmpi
> >  SAA 011100 ..... ..... 00000 00000 011000 @saa
> >  SAAD 011100 ..... ..... 00000 00000 011001 @saa
> >
> > +&zcb base
> > +ZCB 011100 base:5 00000 00000 1110- 011111 &zcb
> > +
> >  &lx base index rd
> >  @lx ...... base:5 index:5 rd:5 ...... ..... &lx
> >  LWX 011111 ..... ..... ..... 00000 001010 @lx
> > diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
> > index d3dfef2e0c..721a9a8d9d 100644
> > --- a/target/mips/tcg/octeon_translate.c
> > +++ b/target/mips/tcg/octeon_translate.c
> > @@ -176,6 +176,33 @@ static bool trans_saa(DisasContext *ctx, arg_saa *a, MemOp mop)
> >
> >  TRANS(SAA, trans_saa, MO_32);
> >  TRANS(SAAD, trans_saa, MO_64);
> > +
> > +static bool trans_ZCB(DisasContext *ctx, arg_ZCB *a)
> > +{
> > +    TCGv_i64 addr = tcg_temp_new_i64();
> > +    TCGv_i64 line = tcg_temp_new_i64();
> > +    TCGv_i64 zero64 = tcg_constant_i64(0);
> > +    TCGv_i128 zero128 = tcg_temp_new_i128();
>
>        const MemOp mop = mo_endian(ctx) | MO_128 | MO_ATOM_NONE;
>
> Although $zero endianness is irrelevant :) but I prefer to keep
> it explicit for coding style.
>
> I can squash upon applying if you agree or keep your patch as is.

Whichever you prefer is fine with me.

>
> Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
>
> > +    gen_base_offset_addr(ctx, addr, a->base, 0);
> > +    tcg_gen_concat_i64_i128(zero128, zero64, zero64);
> > +
> > +    /*
> > +     * QEMU models ZCB/ZCBT as zeroing the containing 128-byte cache line
> > +     * in guest memory.
> > +     */
> > +    tcg_gen_andi_i64(line, addr, ~0x7fULL);
> > +
> > +    for (int i = 0; i < 8; i++) {
> > +        TCGv_i64 slot = tcg_temp_new_i64();
> > +
> > +        tcg_gen_addi_i64(slot, line, i * 16);
> > +        tcg_gen_qemu_st_i128(zero128, slot, ctx->mem_idx,
> > +                             mo_endian(ctx) | MO_128 | MO_ATOM_NONE);
> > +    }
> > +
> > +    return true;
> > +}
> >  TRANS(LBX, trans_lx, MO_SB);
> >  TRANS(LBUX, trans_lx, MO_UB);
> >  TRANS(LHX, trans_lx, MO_SW);
> >
>
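
For readers following the decode hunk: the "1110-" field in the quoted pattern leaves one opcode bit as a don't-care, which is how a single entry can cover both ZCB and ZCBT. Below is a rough standalone sketch of the equivalent mask-and-compare. It is not what decodetree generates. The names zcb_matches, zcb_base, ZCB_FIXED_MASK and ZCB_FIXED_BITS are hypothetical, and the constants were worked out by hand from the pattern, assuming fields are laid out from bit 31 downwards.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pattern from the patch, most significant bit first:
 *   011100 base:5 00000 00000 1110- 011111
 * Fixed bits are everything except base (bits 25..21) and the
 * wildcard '-' (bit 6). Both constants are hand-derived assumptions.
 */
#define ZCB_FIXED_MASK 0xFC1FFFBFu
#define ZCB_FIXED_BITS 0x7000071Fu

/* True when insn matches the pattern with either value of bit 6. */
static bool zcb_matches(uint32_t insn)
{
    return (insn & ZCB_FIXED_MASK) == ZCB_FIXED_BITS;
}

/* Extract the 5-bit base register field (bits 25..21). */
static unsigned zcb_base(uint32_t insn)
{
    return (insn >> 21) & 0x1f;
}
```

With these constants, both 0x7000071F and 0x7000075F match, and any nonzero bit in the two all-zero 5-bit fields rejects the word.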

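
As a plain-C illustration of the memory effect the commit message describes (align down to a 128-byte line, then eight 16-byte zero stores), here is a minimal sketch over a flat byte buffer. It is not TCG code and does not model endianness, atomicity or MMU faults. The names zcb_line_base, zcb_model, ZCB_LINE_BYTES and ZCB_CHUNK_BYTES are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; these are not QEMU identifiers. */
#define ZCB_LINE_BYTES  128u
#define ZCB_CHUNK_BYTES 16u

/* Round addr down to the start of its 128-byte line (addr & ~0x7f). */
static uint64_t zcb_line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(ZCB_LINE_BYTES - 1);
}

/*
 * Zero the 128-byte line containing addr inside a flat buffer that
 * stands in for guest memory, one 16-byte chunk at a time, mirroring
 * the eight 128-bit stores in the patch.
 */
static void zcb_model(uint8_t *mem, size_t mem_size, uint64_t addr)
{
    uint64_t line = zcb_line_base(addr);

    /* The whole line must lie inside the buffer. */
    assert(line + ZCB_LINE_BYTES <= mem_size);

    for (unsigned i = 0; i < ZCB_LINE_BYTES / ZCB_CHUNK_BYTES; i++) {
        memset(mem + line + (uint64_t)i * ZCB_CHUNK_BYTES, 0, ZCB_CHUNK_BYTES);
    }
}
```

For example, calling zcb_model on address 200 in a 384-byte buffer clears bytes 128 through 255 and leaves the neighbouring lines untouched.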