On 5/12/15 00:55, Richard Henderson wrote:
>> > +static void gen_v1cmpeqi(struct DisasContext *dc,
>> > +                         uint8_t rdst, uint8_t rsrc, uint8_t imm8)
>> > +{
>> > +    int count;
>> > +    TCGv vdst = dest_gr(dc, rdst);
>> > +    TCGv tmp = tcg_temp_new_i64();
>> > +
>> > +    qemu_log_mask(CPU_LOG_TB_IN_ASM, "v1cmpeqi r%d, r%d, %d\n",
>> > +                  rdst, rsrc, imm8);
>> > +
>> > +    tcg_gen_movi_i64(vdst, 0);
>> > +
>> > +    for (count = 0; count < 8; count++) {
>> > +        tcg_gen_shri_i64(tmp, load_gr(dc, rsrc), (8 - count - 1) * 8);
>> > +        tcg_gen_andi_i64(tmp, tmp, 0xff);
>> > +        tcg_gen_setcondi_i64(TCG_COND_EQ, tmp, tmp, imm8);
>> > +        tcg_gen_or_i64(vdst, vdst, tmp);
>> > +        tcg_gen_shli_i64(vdst, vdst, 8);
>
> For all of these vector instructions, I would encourage you to use helpers
> to extract and insert values.  Extraction has little choice but to use a
> shift and a mask, as you use here.  But insertion can use
> tcg_gen_deposit_i64.  I think that is a lot easier to reason with than your
> construction here, which sequentially shifts vdst.
>
> E.g.
>
> static inline void
> extract_v1(TCGv out, TCGv in, unsigned byte)
> {
>     tcg_gen_shri_i64(out, in, byte * 8);
>     tcg_gen_ext8u_i64(out, out);
> }
>
> static inline void
> insert_v1(TCGv out, TCGv in, unsigned byte)
> {
>     tcg_gen_deposit_i64(out, out, in, byte * 8, 8);
> }
>
> This loop then becomes
>
>     TCGv vsrc = load_gr(dc, src);
>     for (count = 0; count < 8; ++count) {
>         extract_v1(tmp, vsrc, count);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, tmp, tmp, imm8);
>         insert_v1(vdst, tmp, count);
>     }
>
It also needs "tcg_gen_movi_i64(vdst, 0);", or it will trigger the assertion `ts->val_type == TEMP_VAL_REG' in debug mode. I shall try to send the patch within one day (sorry for being a little late).

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed