VMULU multiplies the active Octeon multiplier state by rs, adds rt and queued partial products, returns the low result, and advances P[0]/P[1] with carry limbs.
Expand the two-limb accumulator operation inline with TCG so the result and partial-product state stay visible to the optimizer. Signed-off-by: James Hilliard <[email protected]> Signed-off-by: Richard Henderson <[email protected]> --- Changes v2 -> v3: - Split VMULU out of the combined Octeon arithmetic and memory instruction patch. (requested by Richard Henderson) Changes v5 -> v6: - Rename the translator helper callback typedef for clarity. Changes v7 -> v8: - Use Richard Henderson's v7.5 inline TCG translator with tcg_gen_addN_i64. --- target/mips/tcg/octeon.decode | 2 ++ target/mips/tcg/octeon_translate.c | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode index bb0a9f1d99..36ced0bb33 100644 --- a/target/mips/tcg/octeon.decode +++ b/target/mips/tcg/octeon.decode @@ -55,6 +55,8 @@ MTP0 011100 ..... ..... 00000 00000 001001 @r2 MTP1 011100 ..... ..... 00000 00000 001010 @r2 MTP2 011100 ..... ..... 00000 00000 001011 @r2 +VMULU 011100 ..... ..... ..... 00000 001111 @r3 + &saa base rt @saa ...... base:5 rt:5 ................ &saa SAA 011100 ..... ..... 00000 00000 011000 @saa diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c index f778ca22f0..3216f2b2eb 100644 --- a/target/mips/tcg/octeon_translate.c +++ b/target/mips/tcg/octeon_translate.c @@ -265,3 +265,38 @@ static bool trans_mtp(DisasContext *ctx, arg_r2 *a, unsigned int index) TRANS(MTP0, trans_mtp, 0); TRANS(MTP1, trans_mtp, 1); TRANS(MTP2, trans_mtp, 2); + +static bool trans_VMULU(DisasContext *ctx, arg_VMULU *a) +{ + TCGv_i64 x[3], y[3], z[3]; + TCGv_i64 tmp = tcg_temp_new_i64(); + TCGv_i64 zero = tcg_constant_i64(0); + + z[0] = y[0] = tcg_temp_new_i64(); + z[1] = y[1] = tcg_temp_new_i64(); + z[2] = y[2] = tcg_temp_new_i64(); + x[0] = tcg_temp_new_i64(); + x[1] = tcg_temp_new_i64(); + x[2] = zero; + + /* Z = rs * (mpl1 : mpl0) + rt */ + gen_load_gpr(tmp, a->rs); + gen_load_gpr(y[0], a->rt); + tcg_gen_mulu2_i64(x[0], x[1], tmp, oct_mpl[0]); + tcg_gen_mulu2_i64(y[1], y[2], tmp, oct_mpl[1]); + tcg_gen_addN_i64(3, z, y, x); + + /* X == (0 : p1 : p0) */ + x[0] = oct_p[0]; + x[1] = oct_p[1]; + + /* Y == (p1 : p0 : tmp) */ + y[0] = tmp; + y[1] = oct_p[0]; + y[2] = oct_p[1]; + + /* (p1 : p0 : rd) = Z + (0 : p1 : p0) */ + tcg_gen_addN_i64(3, y, z, x); + gen_store_gpr(tmp, a->rd); + return true; +} -- 2.54.0
