Re: [Qemu-devel] Expensive emulation of CPU condition flags

2016-07-01 Thread Richard Henderson

On 06/30/2016 11:13 AM, Shuang Zhai wrote:

> We wonder if there exists any optimization, e.g., directly mapping the
> frontend flags to that of the backend? Any suggestions are appreciated.


Directly mapping frontend to backend flags is a non-starter, since not all 
backends have those flags.


There are alternate methods of emulating condition codes.  As an example, 
target-i386 and target-sparc store two values and an "operation code" value. 
The latter indicates how to treat the former.  This allows for the full 
computation of the flags to be delayed, and for the host compare-and-branch to 
be less complicated.
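
A minimal sketch of that idea, for illustration only. The names below (lazy_cc_t, record_sub, cond_eq, and so on) are hypothetical and are not the actual target-i386 or target-sparc definitions, and the handling of logic ops assumes an x86-like convention in which they clear the overflow flag:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical "operation code": says how to interpret the saved values. */
typedef enum { CC_OP_SUB, CC_OP_LOGIC } cc_op_t;

typedef struct {
    uint32_t cc_src1;   /* first operand (or the result, for logic ops) */
    uint32_t cc_src2;   /* second operand (unused for logic ops) */
    cc_op_t  cc_op;     /* which operation produced the values above */
} lazy_cc_t;

/* A flag-setting subtract/compare only records its inputs. */
static void record_sub(lazy_cc_t *cc, uint32_t a, uint32_t b)
{
    cc->cc_src1 = a;
    cc->cc_src2 = b;
    cc->cc_op = CC_OP_SUB;
}

/* A flag-setting logic op only records its result. */
static void record_logic(lazy_cc_t *cc, uint32_t result)
{
    cc->cc_src1 = result;
    cc->cc_src2 = 0;
    cc->cc_op = CC_OP_LOGIC;
}

/* "Equal" condition, evaluated only when a branch needs it. */
static bool cond_eq(const lazy_cc_t *cc)
{
    switch (cc->cc_op) {
    case CC_OP_SUB:   return cc->cc_src1 == cc->cc_src2;
    case CC_OP_LOGIC: return cc->cc_src1 == 0;
    }
    return false;
}

/* "Signed less than" condition, again derived from the saved values. */
static bool cond_lt_signed(const lazy_cc_t *cc)
{
    switch (cc->cc_op) {
    case CC_OP_SUB:   return (int32_t)cc->cc_src1 < (int32_t)cc->cc_src2;
    case CC_OP_LOGIC: return (int32_t)cc->cc_src1 < 0; /* assumes V = 0 */
    }
    return false;
}
```

When the operation code is known at translation time, a conditional branch after a compare can become a single host compare-and-branch on the two saved values, and the individual flag bits never need to be materialized unless the guest reads them.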


See also my design for an improved m68k condition code scheme:

  http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg00501.html
especially
  http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg00524.html

You're welcome to experiment with target-arm.  If you can create a scheme that 
performs better than the current, we'd like to hear about it.



r~



[Qemu-devel] Expensive emulation of CPU condition flags

2016-06-30 Thread Shuang Zhai
Hi everyone.


In running an ARMv7 guest on an x86 host, we observed that a guest instruction 
affecting condition flags is often translated into 10+ host instructions. The 
reason seems to be the way that the frontend emulates the condition flags. For 
instance:


Target ARM instruction:

cmp  r9, 0x21 ;


IR instruction:

movi_i32 tmp5,$0x21
sub_i32 NF,r9,tmp5
mov_i32 ZF,NF
setcond_i32 CF,r9,tmp5,geu
xor_i32 VF,NF,r9
xor_i32 tmp7,r9,tmp5
and_i32 VF,VF,tmp7
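
For reference, a rough C model of what the IR above computes, as we read it. The struct and function names are hypothetical; the field names mirror the IR globals. We assume N and V are taken from bit 31 of NF and VF, Z is set when ZF is zero, and CF holds 0 or 1:

```c
#include <stdint.h>

/* Hypothetical container for the four flag variables seen in the IR. */
typedef struct {
    uint32_t NF;   /* N flag is bit 31 of this value (assumption) */
    uint32_t ZF;   /* Z flag is set when this value is 0 (assumption) */
    uint32_t CF;   /* C flag, stored as 0 or 1 */
    uint32_t VF;   /* V flag is bit 31 of this value (assumption) */
} arm_flags_t;

/* Model of the flag updates performed for "cmp rn, imm". */
static arm_flags_t flags_after_cmp(uint32_t rn, uint32_t imm)
{
    arm_flags_t f;
    f.NF = rn - imm;                    /* sub_i32 NF,r9,tmp5 */
    f.ZF = f.NF;                        /* mov_i32 ZF,NF */
    f.CF = (rn >= imm) ? 1u : 0u;       /* setcond_i32 CF,r9,tmp5,geu */
    f.VF = (f.NF ^ rn) & (rn ^ imm);    /* xor_i32, xor_i32, and_i32 */
    return f;
}
```

Each of the four results is written back to the CPU state, which accounts for the four stores in the host code.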


Host x86 instruction:


sub    $0x21,%ebx
mov    %ebx,0x208(%r14)
mov    %ebx,%r12d
mov    %r12d,0x20c(%r14)
cmp    $0x21,%ebp
setae  %r13b
movzbl %r13b,%r13d
mov    %r13d,0x200(%r14)
xor    %ebp,%ebx
xor    $0x21,%ebp
and    %ebp,%ebx
mov    %ebx,0x204(%r14)


Imagine a tight loop where a cmp instruction is used to compute the 
termination condition: this can be pretty expensive. And lazy evaluation does 
not seem to help here.


We wonder whether any optimization exists, e.g., directly mapping the frontend 
flags to those of the backend? Any suggestions are appreciated.


Shuang