On Thu, 17 Dec 2009, Richard Henderson wrote:

> On 12/17/2009 07:32 AM, malc wrote:
> > > These new opcodes are considered "required" by the backend,
> > > because expanding them at the tcg level breaks the basic block.
> > > There might be some way to emulate within tcg internals, but
> > > that doesn't seem worthwhile, as essentially all hosts have
> > > some form of support for these.
> ..
> > c. Historically things like that were made conditional with
> > a generic fallback (bswap, neg, not, rot, etc)
>
> I answered this one above. A generic fallback would break the
> basic block, which would break TCGs simple register allocation.
>
> > b. Documentation for movcond has a typo, t0 is assigned not t1
>
> Oops. Will fix.
>
> > d. Documentation for setcond2 is missing
>
> Ah, I see that brcond2 is missing as well; I'll fix that too.
>
> > It would also be interesting to learn what impact adding those two
> > has on performance, any results?
>
> Hmph, not as much as I would have liked. I suppose Intel is getting pretty
> darned good with its branch prediction. It shaved about 3 minutes off
> 183.equake from what I posted earlier this week; that's something around a 7%
> improvement, assuming it's not just all noise (I havn't run that test enough
> times to see what the variation is).
>
After fixing a bug (crop was done after reading the cr) I ran some
openssl speed benchmarks and, at least here on an MPC7447A, got a
speed degradation, tiny but consistent.

I took a very quick glance at the generated code and the first thing I
saw was this:

----------------
IN:
0x40082295:  movzbl (%eax),%eax
0x40082298:  cmp    $0x3d,%al
0x4008229a:  setne  %dl
0x4008229d:  test   %al,%al
0x4008229f:  je     0x400822d2

OP after liveness analysis:
 mov_i32 tmp2,eax
 qemu_ld8u tmp0,tmp2,$0xffffffff
 mov_i32 eax,tmp0
 movi_i32 tmp1,$0x3d
 mov_i32 tmp0,eax
 nopn $0x2,$0x2
 sub_i32 cc_dst,tmp0,tmp1
 movi_i32 tmp13,$0xff
 and_i32 tmp4,cc_dst,tmp13
 movi_i32 tmp13,$0x0
 setcond_i32 tmp0,tmp4,tmp13,ne
 movi_i32 tmp14,$0xff
 and_i32 tmp13,tmp0,tmp14
 ....

OUT: [size=204]
0x601051b0:  lwz     r14,0(r27)
0x601051b4:  lbzx    r14,0,r14
0x601051b8:  mr      r15,r14
0x601051bc:  addi    r15,r15,-61
0x601051c0:  andi.   r15,r15,255
0x601051c4:  cmpwi   cr6,r15,0
0x601051c8:  crnot   4*cr7+eq,4*cr6+eq
0x601051cc:  mfcr    r0
0x601051d0:  rlwinm  r15,r0,31,31,31
0x601051d4:  andi.   r15,r15,255
 ...

So the fact that setcond produces 0/1 was never communicated to the tcg,
not that I would claim that it's possible at all...

[..snip..]

--
mailto:av1...@comtv.ru
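
To make the remark about the second andi. concrete: the and_i32 with
$0xff that follows setcond_i32 is redundant because the setcond result
can only be 0 or 1. Below is a minimal sketch of how an optimizer could
notice that, by tracking which bits may be set in each temp. Everything
in it (the struct insn layout, the opcode enum, fold_redundant_and,
NTEMPS) is hypothetical and made up for illustration; it is not TCG's
actual data structures or API, and it says nothing about whether such a
pass fits TCG's design.

```c
#include <stdint.h>

/* Hypothetical mini-IR for illustration only; NOT TCG's real types. */
enum op { OP_MOVI, OP_MOV, OP_AND, OP_SETCOND_NE };

struct insn {
    enum op op;
    int dst, a, b;   /* temp indices (a and b are ignored by OP_MOVI) */
    uint32_t imm;    /* constant value, used only by OP_MOVI */
};

#define NTEMPS 8     /* assumed small fixed number of temps */

/*
 * Walk a straight-line block, tracking for every temp a mask of bits
 * that may be nonzero. An "and dst,a,b" where b is a known constant and
 * a has no possibly-set bit outside that constant cannot change a, so
 * it is rewritten into "mov dst,a". Returns the number of rewrites.
 */
int fold_redundant_and(struct insn *code, int n)
{
    uint32_t maybe[NTEMPS];  /* bits that may be set; exact value if constant */
    int is_const[NTEMPS];
    int folded = 0;

    for (int t = 0; t < NTEMPS; t++) {
        maybe[t] = 0xffffffffu;  /* nothing known on entry */
        is_const[t] = 0;
    }

    for (int i = 0; i < n; i++) {
        struct insn *in = &code[i];

        switch (in->op) {
        case OP_MOVI:
            maybe[in->dst] = in->imm;
            is_const[in->dst] = 1;
            break;
        case OP_MOV: {
            uint32_t ma = maybe[in->a];
            int ca = is_const[in->a];
            maybe[in->dst] = ma;
            is_const[in->dst] = ca;
            break;
        }
        case OP_SETCOND_NE:
            /* The key fact: a setcond result is always 0 or 1. */
            maybe[in->dst] = 1;
            is_const[in->dst] = 0;
            break;
        case OP_AND: {
            /* Copy inputs first, since dst may alias a or b. */
            uint32_t ma = maybe[in->a], mb = maybe[in->b];
            int ca = is_const[in->a], cb = is_const[in->b];

            if (cb && (ma & ~mb) == 0) {
                in->op = OP_MOV;         /* the mask cannot clear anything */
                maybe[in->dst] = ma;
                is_const[in->dst] = ca;
                folded++;
            } else {
                maybe[in->dst] = ma & mb;
                is_const[in->dst] = ca && cb;
            }
            break;
        }
        }
    }
    return folded;
}
```

Fed the tail of the op dump above, this sketch would leave the first
and_i32 (on cc_dst, whose bits are unknown) alone and turn the second
one (on the setcond result) into a plain move.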