On 11/23/2021 12:42 PM, Navid Rahimi wrote:
In the case of x86_64, this is the generated code:

src_1(bool, bool):
         cmp     dil, sil
         setb    al
         ret

tgt_1(bool, bool):
         xor     edi, 1
         mov     eax, edi
         and     eax, esi
         ret
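
For reference, the two listings presumably come from source along these
lines. This is an assumption: the message shows only the assembly, so the
function bodies and the helper equivalent_for_all_inputs are hypothetical. The
sketch only shows that the two forms agree on all four bool input pairs.

```cpp
// Hypothetical C++ source; the message shows only the generated assembly.
// For bool operands, a < b holds only when a is false and b is true.
bool src_1(bool a, bool b) { return a < b; }

// The rewritten form: negate a, then bitwise-and with b.
bool tgt_1(bool a, bool b) { return (!a) & b; }

// Illustrative helper: exhaustively compare the two forms.
bool equivalent_for_all_inputs() {
    const bool values[] = {false, true};
    for (bool a : values)
        for (bool b : values)
            if (src_1(a, b) != tgt_1(a, b))
                return false;
    return true;
}
```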


Let's look at the latency of src_1:
cmp: latency of 1 (page 663, table C-17).
setb: latency of 2, presumably. The Intel manual doesn't report a latency for
setb, but the closest listed instruction, setbe, has a latency of 2.

But for tgt_1:
xor: latency 1.
mov: latency 1. (But it seems x86_64 processors can eliminate this instruction,
so it is effectively latency 0 in this case. The "Zero-Latency MOV
Instructions" section of [1] explains it.)
and: latency 1.

So even if you consider setb to have a latency of 1, the two sequences are
equal. But if its latency is 2, tgt_1 should be a 1-cycle latency win.
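
The arithmetic above can be sketched as a sum over each dependency chain
(cmp -> setb, and xor -> mov -> and). The cycle counts are the ones quoted in
this message, not measurements, and the zero-latency mov is assumed to apply;
all constant and function names are illustrative only.

```cpp
// Latencies in cycles as quoted in the message; not measured here.
constexpr int kCmp = 1;
constexpr int kXor = 1;
constexpr int kAnd = 1;
constexpr int kMovEliminated = 0;  // assumes the zero-latency mov case applies

// src_1 is a serial chain: cmp feeds setb.
constexpr int src_1_latency(int setb_latency) { return kCmp + setb_latency; }

// tgt_1 is a serial chain: xor feeds mov, which feeds and.
constexpr int tgt_1_latency() { return kXor + kMovEliminated + kAnd; }

// With setb at 1 cycle the two chains tie.
static_assert(src_1_latency(1) == tgt_1_latency(), "tie when setb is 1 cycle");
// With setb at 2 cycles tgt_1 is one cycle shorter.
static_assert(src_1_latency(2) - tgt_1_latency() == 1, "1-cycle win when setb is 2");
```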

[1] https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

But these are target issues you've raised -- those should be handled in the RTL pipeline and are not a significant concern for gimple.

In gimple your primary goal should be to reduce the number of expressions that are evaluated.  This patch does the opposite.
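
To make the expression count concrete, here is a rough C++ sketch with one
statement per evaluated expression. It is not an actual GIMPLE dump; the
function names and temporaries are made up for illustration.

```cpp
// Before the transformation: a single comparison is evaluated.
bool before_patch(bool a, bool b) {
    bool t1 = a < b;     // expression 1: comparison
    return t1;
}

// After the transformation: a negation and then a bitwise and.
bool after_patch(bool a, bool b) {
    bool t1 = !a;        // expression 1: negation
    bool t2 = t1 & b;    // expression 2: bitwise and
    return t2;
}
```

Both functions return the same value for every input, but the second form
evaluates two expressions where the first evaluates one.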

jeff
