On 11/23/2021 12:42 PM, Navid Rahimi wrote:
For x86_64, this is the generated code:
src_1(bool, bool):
cmp dil, sil
setb al
ret
tgt_1(bool, bool):
xor edi, 1
mov eax, edi
and eax, esi
ret
Let's look at the latency of src_1:
cmp: latency of 1 (page 663, table C-17).
setb: latency of 2. The Intel manual doesn't list a latency for setb,
but the closest instruction to it, setbe, does have a latency of 2.
For tgt_1:
xor: latency 1.
mov: latency 1. (However, it seems modern x86_64 processors optimize this
instruction, so in this case it is effectively latency 0. The Zero-Latency
MOV Instructions section explains this [1].)
and: latency 1.
So even if you treat setb as having a latency of 1, the two sequences are
equal (2 cycles each). But if setb has a latency of 2, tgt_1 is a 1-cycle win.
[1] https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
But these are target issues you've raised -- those should be handled in
the RTL pipeline and are not a significant concern for gimple.
In gimple, your primary goal should be to reduce the number of
expressions that are evaluated. This patch does the opposite: it replaces
a single comparison with two operations (a negation and an AND).
jeff