On Fri, Sep 21, 2012 at 10:13:40AM -0700, Richard Henderson wrote:
> When movcond_i32 is available we can further reduce the generated
> op count from 12 to 6, and the generated code size on i686 from
> 88 to 74 bytes.
>
> Signed-off-by: Richard Henderson <r...@twiddle.net>
> ---
>  tcg/tcg-op.h |   22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 3e375ea..0145a09 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -2147,16 +2147,24 @@ static inline void tcg_gen_movcond_i64(TCGCond cond,
> TCGv_i64 ret,
>          tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0,
>                           TCGV_LOW(c1), TCGV_HIGH(c1),
>                           TCGV_LOW(c2), TCGV_HIGH(c2), cond);
> -        tcg_gen_neg_i32(t0, t0);
>
> -        tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
> -        tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
> -        tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
> +        if (TCG_TARGET_HAS_movcond_i32) {
> +            tcg_gen_movi_i32(t1, 0);
> +            tcg_gen_movcond_i32(TCG_COND_NE, TCGV_LOW(ret), t0, t1,
> +                                TCGV_LOW(v1), TCGV_LOW(v2));
> +            tcg_gen_movcond_i32(TCG_COND_NE, TCGV_HIGH(ret), t0, t1,
> +                                TCGV_HIGH(v1), TCGV_HIGH(v2));
> +        } else {
> +            tcg_gen_neg_i32(t0, t0);
>
> -        tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
> -        tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
> -        tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
> +            tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
> +            tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
> +            tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
>
> +            tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
> +            tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
> +            tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
> +        }
>          tcg_temp_free_i32(t0);
>          tcg_temp_free_i32(t1);
>      } else {
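
As a reading aid for the hunk quoted above, here is a minimal sketch in plain C of the two ways the 64-bit result can be selected on a 32-bit host. It is only a model of the semantics, not TCG code: `pair32`, `select_mask` and `select_movcond` are hypothetical names introduced for illustration, and `flag` stands for the 0/1 value that setcond2 leaves in t0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit value held as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair32;

/* Fallback path: negate the 0/1 flag to get an all-zeros or all-ones
 * mask, then blend each half with and / andc / or. */
static pair32 select_mask(uint32_t flag, pair32 v1, pair32 v2)
{
    uint32_t mask = 0u - flag;      /* 1 -> 0xffffffff, 0 -> 0 */
    pair32 r;

    r.lo = (v1.lo & mask) | (v2.lo & ~mask);
    r.hi = (v1.hi & mask) | (v2.hi & ~mask);
    return r;
}

/* movcond path: one conditional move per half, each testing flag != 0
 * (the TCG_COND_NE comparison against a zero temporary). */
static pair32 select_movcond(uint32_t flag, pair32 v1, pair32 v2)
{
    pair32 r;

    r.lo = (flag != 0) ? v1.lo : v2.lo;
    r.hi = (flag != 0) ? v1.hi : v2.hi;
    return r;
}
```

Both functions return v1 when the flag is set and v2 otherwise; the difference is only in how many operations each half needs.
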

At some point I tried to think how to implement movcond_i64 for MIPS
directly in the backend. I just tried your patch, and I got this kind of
code:

| 0x2bb2ae58:  sltu at,zero,s4
| 0x2bb2ae5c:  sltu t0,zero,s3
| 0x2bb2ae60:  or s3,at,t0
| 0x2bb2ae64:  movz s1,s5,s3
| 0x2bb2ae68:  movz s2,s6,s3
|
| (in some cases some constants/globals loading appear in the middle, but
| that's not due to movcond).

It's basically the kind of code I would have written. It's clearly better
to implement it directly in TCG.

Now I wonder if it wouldn't be better to write brcond2 as setcond2 +
brcond. And even setcond2 as a pair of setcond in TCG, which would allow
some optimizations in case both high parts are zero.

Tested-by: Aurelien Jarno <aurel...@aurel32.net>
Reviewed-by: Aurelien Jarno <aurel...@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net