Re: [fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)

Nikolay Nikolov via fpc-devel Tue, 02 Feb 2021 23:58:11 -0800


On 2/2/21 11:06 PM, J. Gareth Moreton via fpc-devel wrote:

Hi everyone,
I've found a potential optimisation for conditions of the form "(x <> 0) and (y <> 0)", which are very common because this is semantically equivalent to "Assigned(x) and Assigned(y)", and such a construct is generated implicitly in TObject.Destroy, for example, where it checks the value of Self and the VMT. The node tree inside the if-node's condition branch (after the first pass) is as follows:
<andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
   <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="1">
      <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done" complexity="0">
         <symbol>self</symbol>
      </loadn>
      <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done" complexity="0">
      </niln>
   </unequaln>
   <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="1">
      <typeconvn resultdef="$char_pointer" pos="329,9" flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0" convtype="tc_equal">
         <typeconvn resultdef="Pointer" pos="329,9" flags="nf_pass1_done" complexity="0" convtype="tc_equal">
            <loadn resultdef="<no type symbol>" pos="329,9" flags="nf_pass1_done" complexity="0">
               <symbol>vmt</symbol>
            </loadn>
         </typeconvn>
      </typeconvn>
      <pointerconstn resultdef="$char_pointer" pos="329,9" flags="nf_pass1_done,nf_explicit" complexity="0">
         <value>$0000000000000000</value>
      </pointerconstn>
   </unequaln>
</andn>
On x86_64-win64, the following assembly language is generated (smart pipelining and out-of-order execution MIGHT permit it to be executed in 3 cycles, but it's more likely going to take 4 or 5):
    testq    %rbx,%rbx
    setneb   %al
    testq    %rsi,%rsi
    setneb   %dl
    andb     %dl,%al
De Morgan's Laws for 2 inputs state that "not (A or B) = not A and not B", and "not (A and B) = not A or not B", and using these, we can develop much more efficient assembly language (which will almost certainly take 3 cycles to run, and is much smaller):
    movq     %rbx,%rdx
    orq      %rsi,%rdx
    seteb    %al
For this particular routine, %rsi and %dl/%rdx are not used afterwards, so the code can be simplified further (this bit will probably be a peephole optimisation), dropping the cycle count to 2:
    orq      %rbx,%rsi
    seteb    %al
In this situation it is safe because the comparison values are already in registers and code generation has permitted the bypassing of Boolean short-circuiting.
Rather than write an entire peephole optimisation, I would ideally like to program this optimisation at the nodal level, and permit the compiler to convert it into something resembling the following ("andn" becomes "notn" -> "orn", and the two "unequaln" nodes become "equaln"):
<notn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
   <orn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
      <equaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="1">
         <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done" complexity="0">
            <symbol>self</symbol>
         </loadn>
         <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done" complexity="0">
         </niln>
      </equaln>
      <equaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="1">
         <typeconvn resultdef="$char_pointer" pos="329,9" flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0" convtype="tc_equal">
            <typeconvn resultdef="Pointer" pos="329,9" flags="nf_pass1_done" complexity="0" convtype="tc_equal">
               <loadn resultdef="<no type symbol>" pos="329,9" flags="nf_pass1_done" complexity="0">
                  <symbol>vmt</symbol>
               </loadn>
            </typeconvn>
         </typeconvn>
         <pointerconstn resultdef="$char_pointer" pos="329,9" flags="nf_pass1_done,nf_explicit" complexity="0">
            <value>$0000000000000000</value>
         </pointerconstn>
      </equaln>
   </orn>
</notn>
...but there is a catch. In terms of the raw nodes, converting the "andn" node into a "notn" and an "orn" node results in a more complex node tree and hence less efficient assembly language in general, unless a particular "pass_generate_code" routine knows to look out for the set-up of a logical "or" that combines two variables' comparison against zero, like that shown above.
My question is... how should it be developed so that the node optimisation is performed on platforms that have the necessary assembly instructions, like x86_64 and AArch64 (once I develop them), but also not perform the optimisation on platforms that don't have the necessary instructions or checks in "pass_generate_code"? Allowing the optimisation wouldn't cause bad code generation, but it would be more inefficient in the general case.
I hope I explained that okay.

You can introduce a new pass in the compiler that does the node transformation only on platforms that benefit from it. See e.g. what I did a few years ago in optloadmodifystore.pas in the compiler sources. This is an optimization that introduces new nodes that generate faster code on x86 for patterns such as:


a := a + b

This can be done with a single 'add mem, const' or 'add mem, reg' instruction on all processors in the x86 family (i8086, i386 and x86_64). Also, I think there were other CPUs that benefit from this optimization (m68k?). For that, I introduced new inline compiler nodes that are only used for this optimization.
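To illustrate the shape of such a target-gated pass, here is a minimal sketch. It is not the code in optloadmodifystore.pas and does not use the compiler's node classes: the tuple-based tree, the node names ("assign", "add", "add_in_place"), the function name and the target list are all assumptions made up for the example.

```python
# Hypothetical node model: tuples of (kind, *children); not FPC's tnode classes.
TARGETS_WITH_RMW = {"i8086", "i386", "x86_64"}  # assumed list of targets that benefit


def rewrite_load_modify_store(node, target):
    """Turn ('assign', dst, ('add', dst, rhs)) into ('add_in_place', dst, rhs)
    on targets that benefit; leave the tree untouched everywhere else."""
    if target not in TARGETS_WITH_RMW:
        return node
    if node[0] == "assign":
        _, dst, value = node
        if value[0] == "add" and value[1] == dst:
            return ("add_in_place", dst, value[2])
    return node


# a := a + b
stmt = ("assign", ("load", "a"), ("add", ("load", "a"), ("load", "b")))

on_x86 = rewrite_load_modify_store(stmt, "x86_64")
elsewhere = rewrite_load_modify_store(stmt, "other_cpu")  # placeholder target name

print(on_x86)
print(elsewhere)
```

The point of the gate is that the generic tree stays the default, and only targets whose code generator knows the specialised node ever see it.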

(Other possibilities include introducing new node types "norn" and "nandn" (and maybe "nxorn" to complete the set), and having the "andn" node above be converted into "norn", so that they can be instantly converted into the relevant assembly instructions instead of relying on multiple steps of optimisation)

Introducing norn and nandn for optimization purposes might not be a bad idea, considering there are some CPU architectures that also have nor and nand instructions (powerpc, for example).
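As a rough illustration of what a fused node would mean at the tree level, the sketch below rewrites an "and" of two "unequal" comparisons into a single "nor" of two "equal" comparisons and checks that both trees evaluate identically. The tuple-based nodes, the "nor" kind and the helper names are hypothetical and are not taken from the compiler.

```python
from itertools import product


def evaluate(node, env):
    """Evaluate a tiny expression tree built from tuples (hypothetical model)."""
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "const":
        return node[1]
    left = evaluate(node[1], env)
    right = evaluate(node[2], env)
    if kind == "equal":
        return left == right
    if kind == "unequal":
        return left != right
    if kind == "and":
        return left and right
    if kind == "nor":
        return not (left or right)
    raise ValueError("unknown node kind: " + kind)


def and_to_nor(node):
    """and(unequal(a, c), unequal(b, d)) -> nor(equal(a, c), equal(b, d))."""
    if node[0] == "and" and node[1][0] == "unequal" and node[2][0] == "unequal":
        left, right = node[1], node[2]
        return ("nor", ("equal", left[1], left[2]), ("equal", right[1], right[2]))
    return node


zero = ("const", 0)
original = ("and", ("unequal", ("var", "x"), zero), ("unequal", ("var", "y"), zero))
rewritten = and_to_nor(original)

# Both trees should agree for every combination of sample values.
agree = all(
    evaluate(original, {"x": x, "y": y}) == evaluate(rewritten, {"x": x, "y": y})
    for x, y in product([0, 1, 7], repeat=2)
)
print(rewritten[0], agree)
```

A single fused node keeps the tree no deeper than the original "andn", which is the property the "notn" plus "orn" form loses.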


Best regards,

Nikolay

P.S. I don't think we need "WARNING: Technical!" trigger warnings on fpc-devel ;-)


_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
