Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

Florian Klämpfl Sun, 24 Mar 2019 05:09:52 -0700

On 24.03.2019 at 11:33, J. Gareth Moreton wrote:
> The main thing is the degree of control you have using pure assembler over 
> intrinsics, and someone brought up that
> intrinsics don't give you good access to the FLAGS register.


Juggling with the flags is rarely possible on x86 anyways as almost all 
instructions change them.

> Additionally, unless you do some rather untidy nested
> parameter chaining (calling an intrinsic and passing its result into another 
> intrinsic, several layers deep), you don't
> have too much control over how the results are stored.  Normally not a 
> terrible thing, but if you have a temporary value
> that you know will be discarded, you want it in a register and never stored 
> on the stack, for example.

You do not ensure this with pure assembler either. It is even worse there: the
instructions always use the same registers, so the code is very prone to
spilling. In most cases, inlined pure assembler routines will result in far
worse code than intrinsics, because the compiler cannot change the register
usage in the inlined assembler.

> 
> There's also the issue of maintenance... writing intrinsics for every single 
> possible instruction on every single
> platform and determining that they behave in the way they should.

Just look at the intrinsics branch: this can easily be automated.

> 
> I guess we have been spoilt in a way because Pascal has always supported a 
> clean and efficient way to drop into assembly
> language if you so choose, and this is what I've gotten used to rather than 
> the intrinsics of C++.  I don't like the
> idea of putting breakpoints on the intrinsics and opening up the Disassembly 
> window just to check that the compiler
> isn't blindly storing temporary values on the stack.

... which will happen much more often with inlined assembler, as the compiler
is less flexible regarding register usage.

> 
> If I had to give one final reason... there are already functions in the RTL
> that are written in pure assembly language
> that would easily benefit from being inlined, such as SwapEndian and Trunc.

Actually, trunc renders all these reasons void: it is inlined if the code is
compiled for an architecture supporting the needed instructions (i386-win32,
compiled with -Cpcoreavx2 -Cfavx2):

# [5] writeln(trunc(d));
        call    fpc_get_output
        movl    %eax,%ebx
        fldl    U_$P$PROGRAM_$$_D
        fisttpq -8(%ebp)
        pushl   -4(%ebp)
        pushl   -8(%ebp)
        movl    %ebx,%edx
        movl    $0,%eax
        call    fpc_write_text_int64


This is not possible with inlined pure assembler routines: they would use the
instruction set selected when the RTL was compiled. trunc shows perfectly why
intrinsics are the way to go.
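
For reference, a program along the following lines should give a listing like
the one above when built with the options quoted above. This is only a
reconstruction: the variable name d and the writeln call on source line 5 are
taken from the listing, while the initial value is an arbitrary assumption.

```pascal
var
  d: double;
begin
  d := 3.75;           { arbitrary test value, not from the original post }
  writeln(trunc(d));   { line 5: the statement shown in the listing; prints 3 }
end.
```

Adding -al to the command line keeps the generated assembler file and
interleaves the source lines as comments, which is where lines such as
"# [5] writeln(trunc(d));" come from.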

Same could be done for SwapEndian.
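
From the caller's side nothing would have to change. A minimal sketch, assuming
the RTL's SwapEndian overload for Word and hexStr from the System unit; the
variable name w is made up for illustration:

```pascal
var
  w: word;
begin
  w := $1234;
  { SwapEndian reverses the byte order: $1234 becomes $3412 }
  writeln(hexStr(SwapEndian(w), 4));
end.
```

With an intrinsic-based implementation, whether this call ends up as a short
inline instruction sequence or as a call into the RTL would depend on the
options used to compile the program, not on how the RTL itself was built.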

> Otherwise they'd have to be rewritten to use
> intrinsics if anyone remembers to.
> 
> There is one other thing... intrinsics haven't been merged into the trunk 
> yet, so we can't test them or determine if
> they are actually what we desire.

This is not a valid reason. You can play with the svn branch.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel