Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-24 Thread Florian Klämpfl
On 18.03.2019 at 02:57, Ben Grasset wrote:
> On Sun, Mar 17, 2019 at 1:57 PM Florian Klämpfl wrote:
> 
> > How is it better than intrinsics support (similar to gcc/icc etc.)?
> 
> Well, it wouldn't be better than a literal equivalent to those intrinsics, if 
> that's what we're talking about. By which I mean, like, say how in Clang/GCC 
> (or languages such as Rust that use LLVM), if you do _mm_loadl_pd or whatever, 
> that translates not to a function call but directly to the "inlined" assembler 
> instructions (at least in release builds.) 

Yes, that's what I mean. So far I have not seen a single advantage of inlining 
pure assembler routines over such intrinsics.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-18 Thread Marco van de Voort


On 3/17/2019 at 6:57 PM, Florian Klämpfl wrote:

> > Something along these lines is absolutely sorely lacking in FPC 
> > currently, don't let anyone tell you otherwise.
>
> How is it better than intrinsics support (similar to gcc/icc etc.)?

Intrinsics are common, and are prepared for you by the compiler developers.

Inline assembler blocks are something you write yourself as a programmer, 
as soon as you reuse a block a lot. Examples from the embedded world are 
interrupt prologues or epilogues, and prologues for SPI operations 
(lowering/raising CS and enabling/disabling interrupts, padded with 
appropriate nops if the slave is slower than the master).

For example, the PIC32MK/MZ (MIPS) has quite a few different interrupt 
prologues, depending on which features are used (shadow registers and the 
like).

At present these are gcc inline macros.



Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Ben Grasset
To clarify my last message, what I had said previously was that FPC needs
*either* inlinable assembler *or* intrinsics. Just basically something that
ultimately can be called as a "normal" Pascal function, but that does *not*
end up as an un-inlined function call. I have no preference as to how
exactly it's done under the hood.


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Ben Grasset
On Sun, Mar 17, 2019 at 1:57 PM Florian Klämpfl wrote:

>
> How is it better than intrinsics support (similar to gcc/icc etc.)?
>

Well, it wouldn't be better than a literal equivalent to those intrinsics,
if that's what we're talking about. By which I mean, like, say how in
Clang/GCC (or languages such as Rust that use LLVM), if you do _mm_loadl_pd
or whatever, that translates not to a function call but directly to the
"inlined" assembler instructions (at least in release builds.)
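
For readers who have not used such intrinsics, here is a minimal C sketch of the _mm_loadl_pd case, assuming GCC or Clang on an x86 target with SSE2 (other targets fall back to a scalar stand-in). The helper names load_low and load_low_demo are made up for illustration; only _mm_set_pd, _mm_loadl_pd and _mm_storeu_pd come from <emmintrin.h>.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_loadl_pd */
#endif

/* Build the two-lane vector {lo, hi}, replace its low lane with *src,
   and write the lanes to out[0] (low) and out[1] (high).
   load_low is a hypothetical helper, not part of any library. */
static void load_low(double lo, double hi, const double *src, double out[2])
{
#if defined(__SSE2__)
    __m128d v = _mm_set_pd(hi, lo);  /* arguments are (high, low) */
    v = _mm_loadl_pd(v, src);        /* low lane from memory, high lane kept */
    _mm_storeu_pd(out, v);           /* unaligned store of both lanes */
#else
    /* Scalar stand-in for targets without SSE2. */
    (void)lo;
    out[0] = *src;
    out[1] = hi;
#endif
}

/* Small usage example. */
static void load_low_demo(void)
{
    double x = 7.5;
    double out[2];
    load_low(1.0, 2.0, &x, out);
    assert(out[0] == 7.5);  /* low lane was replaced by x */
    assert(out[1] == 2.0);  /* high lane is untouched */
}
```

With optimisation enabled, the intrinsic call is normally expanded in place rather than emitted as a call, and the compiler stays free to choose the registers.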


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Jeppe Johansen

On 3/17/19 9:58 PM, Florian Klämpfl wrote:
> On 17.03.19 at 21:47, Martok wrote:
> > On 17.03.2019 at 18:57, Florian Klämpfl wrote:
> > > How is it better than intrinsics support (similar to gcc/icc etc.)?
> >
> > It *exists*?
> >
> > Remember how long it took to get PopCnt support?
>
> PopCnt is not really an intrinsic, as it has a fallback counterpart 
> and works on all platforms. Intrinsic means that it is really mapped 
> directly to the CPU instruction without any fallbacks.
>
> As Jeppe's branch shows, it is pretty easy; it just requires some 
> continuous work.

As far as I recall there were some issues in argument passing and 
handling the new __mm128 type that were a little annoying. But it might 
be fairly easy to bring back up to speed again.

> > How about the rest of the BMI? TBM? AES-NI? Newer AVX?
>
> See above.

Agreed if that's the route that's taken. It's my feeling that the newer 
extensions took a much more regular/orthogonal route than the old weird 
MMX and SSE forms.



Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Florian Klämpfl

On 17.03.19 at 21:47, Martok wrote:
> On 17.03.2019 at 18:57, Florian Klämpfl wrote:
> > How is it better than intrinsics support (similar to gcc/icc etc.)?
>
> It *exists*?
>
> Remember how long it took to get PopCnt support?

PopCnt is not really an intrinsic, as it has a fallback counterpart and 
works on all platforms. Intrinsic means that it is really mapped 
directly to the CPU instruction without any fallbacks.
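
The distinction can be sketched in C, assuming GCC or Clang. The function names popcount32, popcount32_fallback and popcount32_demo are hypothetical. __builtin_popcount behaves like the PopCnt helper described above: the compiler may use the POPCNT instruction where the target allows it and otherwise emits generic code, so the call works everywhere. A strict intrinsic in the sense used here would map to the instruction only.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static unsigned popcount32_fallback(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  /* drops exactly one set bit per iteration */
        ++n;
    }
    return n;
}

/* Helper with a fallback: usable on every platform. */
static unsigned popcount32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* The compiler picks POPCNT or generic code depending on the target. */
    return (unsigned)__builtin_popcount(x);
#else
    return popcount32_fallback(x);
#endif
}

/* Small usage example: both paths agree. */
static void popcount32_demo(void)
{
    assert(popcount32(0xF0F0u) == 8);
    assert(popcount32_fallback(0xF0F0u) == 8);
}
```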


As Jeppe's branch shows, it is pretty easy; it just requires some 
continuous work.

> How about the rest of the BMI? TBM? AES-NI? Newer AVX?

See above.


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Sven Barth via fpc-devel
Martok wrote on Sun, 17 March 2019, 21:47:

> On 17.03.2019 at 18:57, Florian Klämpfl wrote:
> > How is it better than intrinsics support (similar to gcc/icc etc.)?
> It *exists*?
>
> Remember how long it took to get PopCnt support? How about the rest of the
> BMI?
> TBM? AES-NI? Newer AVX?
>

You are aware that the assembler reader needs to support new instructions
as well?

Regards,
Sven


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Florian Klämpfl

On 17.03.19 at 18:18, J. Gareth Moreton wrote:
> Hi Florian,
>
> I think the main thing is that Object Pascal has always supported the
> ability to drop into assembly language, unlike C++ which requires a
> dialect-specific extension and is not allowed at all under Microsoft
> Visual C++ 64-bit.

Actually, the policy of FPC is to avoid assembler as much as possible.

> Allowing certain assembler routines to be inlined seems like a logical
> extension using language semantics that already exist.
>
> Part of it may be preference but I think some people like the fine
> degree of control that assembly language offers, while intrinsics, I
> find, can quickly get somewhat untidy and confusing,

I cannot see how inlining does this better.

> especially with instructions like CPUID that read and write to specific
> registers (although my code forbids that instruction because EBX is
> non-volatile).

CPUID is so slow that it makes no sense to inline it.

> The other thing... no matter how good the compiler is, there are some
> situations where assembly language will always perform better.

I doubt this, as inlined pure assembler routines always use fixed 
registers. Intrinsics do not.

> I suppose I would like to ask the community more than anything. Is this
> a feature that you'd like to see and use?
>
> I hoped the way that I designed the patch helps to alleviate the can of
> worms by simply not allowing inline if the platform doesn't have the
> ability to support it yet.

But this is exactly the can of worms. People will request that we 
support it on all CPUs.

> "CanInline" simply returns False unless overridden.





Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Jonas Maebe

On 17/03/2019 18:18, J. Gareth Moreton wrote:

> Part of it may be preference but I think some people like the fine
> degree of control that assembly language offers,


That is absolutely correct. That is both its strength and its weakness. 
The weakness is that it is impossible to integrate such code safely in 
compiler-generated code without the programmer saying exactly what that 
code does (in terms of constraints, like GCC supports: 
https://gcc.gnu.org/onlinedocs/gcc/Constraints.html ).


E.g., at least the following issues exist with your patch, but that's 
not because your code is of bad quality. It's simply that it is 
impossible to fully analyse inline assembly and determine it to be safe:
* you forbid modifying the stack, but loading the stack pointer in 
another register and then modifying the stack through this other 
register is not caught (or e.g. loading a value from memory that happens 
to point to the stack)
* you skip over db/dw/dd/dq directives, even though these can also be 
used to encode instructions (often ones not (yet) supported by the 
compiler). There may be more assembler directives like that that could 
influence the code.
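
The second point can be sketched in C with GCC/Clang inline assembly, where the `.byte` directive plays the role of `db`. The function names are hypothetical, and the raw byte is only emitted on x86 targets. The instruction here is a harmless NOP, but the compiler only ever sees a data directive, so an analysis that skips such directives cannot know what gets executed.

```c
#include <assert.h>

/* Returns its argument unchanged. On x86 with GCC/Clang it also executes
   one NOP that was written as a raw byte, so no instruction mnemonic is
   visible to the compiler. The function name is made up for illustration. */
static int passthrough_with_raw_nop(int value)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(".byte 0x90");  /* 0x90 is the one-byte x86 NOP */
#endif
    return value;
}

/* Small usage example. */
static void raw_nop_demo(void)
{
    assert(passthrough_with_raw_nop(17) == 17);
}
```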


Additionally, your remark regarding memory barriers is a bit dangerous: 
these instructions must not only act as memory barriers to the 
processor, but also to the compiler. I.e., the compiler must not be 
allowed to optimise certain things across such a barrier (e.g. (re)move 
memory reads or writes), because then the barrier will no longer serve 
its purpose. That is the main reason why marking them as "they change 
everything" should probably stay for the foreseeable future.
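
A rough C illustration of the two roles a barrier has to play, assuming GCC or Clang with C11 atomics. The names payload, ready, publish, try_consume and barrier_demo are hypothetical. The empty assembler statement with a "memory" clobber is a barrier for the compiler only, while atomic_thread_fence constrains both the compiler and the processor.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;       /* ordinary data */
static atomic_int ready;  /* flag that publishes the data */

/* Compiler-only barrier: emits no instruction. The "memory" clobber tells
   GCC/Clang that memory may have changed, so loads and stores are neither
   moved across this point nor kept cached in registers. */
static inline void compiler_barrier(void)
{
#if defined(__GNUC__) || defined(__clang__)
    __asm__ volatile("" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Writer: the fence keeps the store to payload ahead of the flag store,
   for the compiler as well as for the CPU. */
static void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out once the flag has been observed. */
static int try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}

/* Small single-threaded usage example. */
static void barrier_demo(void)
{
    int seen = 0;
    assert(try_consume(&seen) == 0);  /* nothing published yet */
    publish(42);
    compiler_barrier();
    assert(try_consume(&seen) == 1 && seen == 42);
}
```

If a barrier instruction were inlined without the compiler-level part, the optimiser could still move the store to payload below the flag store, and the hardware barrier would then order the wrong sequence.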


The performance overhead of memory barriers is also many times greater 
than that of a call/return, so I don't think it will actually matter 
that much (although it would still be better to get rid of the 
call/return than not, of course -- provided the compiler can be told to 
not optimise anything across it).



I thought I sent a mail in the previous thread about this, but I can't 
find it anymore so maybe I did not. What I thought I said before, is that 
I think that inlining pure assembler functions is something that should 
never be done. A pure assembler function, especially with 
"nostackframe", is the programmer literally telling the compiler "you 
have absolutely no business messing with this code".


On the other hand, if you have a regular function with an inline 
assembler block, then inlining becomes a whole lot more feasible. 
Especially if you add support for GCC-like constraints. Then there is no 
issue with the assembler code expecting arguments in certain registers, 
possibly returning in the middle of the block, messing up the stack etc, 
because you simply cannot do that in this scenario. This means you don't 
have to (try to) check for this either. And there is already rudimentary 
support for specifying constraints in this case (which registers get 
modified).
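
For comparison, this is roughly what a regular function with a constrained inline assembler block looks like in GCC/Clang C. It is a sketch for x86 only, with a plain C stand-in elsewhere; add_u32 and add_u32_demo are hypothetical names. The constraints tell the compiler which operands are read and written and what else is modified, so it can pick the registers itself and inline the function safely.

```c
#include <assert.h>
#include <stdint.h>

/* Adds b to a. On x86 with GCC/Clang the addition is an inline assembler
   statement whose operands are described by constraints. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(a)   /* %0: read and written, any general register */
            : "r"(b)    /* %1: read only, any general register */
            : "cc");    /* the flags register is modified */
    return a;
#else
    return a + b;       /* plain C stand-in for other targets */
#endif
}

/* Small usage example. */
static void add_u32_demo(void)
{
    assert(add_u32(2u, 3u) == 5u);
    assert(add_u32(0xFFFFFFFFu, 1u) == 0u);  /* wraps around like ADD does */
}
```

Nothing in the block names a fixed register or touches the stack, which is why the checks discussed for pure assembler routines are not needed in this form.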


It would be much less of a quick win (e.g. because the compiler does not 
support passing variables in registers to assembler blocks right now), 
but in the long run it would be fully supportable and much more 
maintainable. It would also require much less target-specific support, 
because it would not require trying to figure out what the assembler 
block is doing.


That said: for optimal performance, you will usually still want 
intrinsics rather than inline assembly, simply because the compiler can 
then be taught to reason about them, and perform constant propagation 
through them (and potentially eliminate them altogether).



Jonas


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread J. Gareth Moreton
The other thing is that it will particularly benefit cross-platform 
functions like Trunc and ReadWriteBarrier, whose implementations consist 
of a single assembler instruction. One can simply add "inline" to their 
platform-specific implementation for the performance gains without having 
to rewrite the routines to use intrinsics.

Gareth aka. Kit


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Florian Klämpfl

On 17.03.19 at 02:54, Ben Grasset wrote:
> Inlining of pure assembler functions would actually be immediately, 
> specifically useful to me! I've been having a go at improving FPC scores 
> on "BenchmarksGames", and was so far successful with Binary Trees simple 
> by throwing a really good threading library at it, however, there are 
> some benchmarks that simply can't be fixed without either proper 
> intrinsics or user-specifiable inlinable ASM methods. I have a working 
> re-implementation of NBody (that is just a direct rewrite of the Rust 
> implementation) where I've implemented __m128 and __m128d as records 
> with static nostackframe assembler "class functions", however it's just 
> not fast enough to be competitive due to the inability to inline any of 
> the assembler methods.
>
> Something along these lines is absolutely sorely lacking in FPC 
> currently, don't let anyone tell you otherwise.

How is it better than intrinsics support (similar to gcc/icc etc.)?


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-16 Thread Ryan Joseph


> On Mar 16, 2019, at 9:54 PM, Ben Grasset wrote:
> 
> Inlining of pure assembler functions would actually be immediately, 
> specifically useful to me! I've been having a go at improving FPC scores on 
> "BenchmarksGames", and was so far successful with Binary Trees simple by 
> throwing a really good threading library at it, however, there are some 
> benchmarks that simply can't be fixed without either proper intrinsics or 
> user-specifiable inlinable ASM methods. I have a working re-implementation of 
> NBody (that is just a direct rewrite of the Rust implementation) where I've 
> implemented __m128 and __m128d as records with static nostackframe assembler 
> "class functions", however it's just not fast enough to be competitive due to 
> the inability to inline any of the assembler methods.
> 
> Something along these lines is absolutely sorely lacking in FPC currently, 
> don't let anyone tell you otherwise.

Sounds like exciting progress for FPC. By the way, what happened to the 
development of the “pure” function modifier that would make it possible to use 
functions in compile-time expressions? I was pretty excited about what could 
be done with that as well.

Regards,
Ryan Joseph



Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-16 Thread Ben Grasset
Inlining of pure assembler functions would actually be immediately,
specifically useful to me! I've been having a go at improving FPC scores on
"BenchmarksGames", and was so far successful with Binary Trees simple by
throwing a really good threading library at it, however, there are some
benchmarks that simply can't be fixed without either proper intrinsics or
user-specifiable inlinable ASM methods. I have a working re-implementation
of NBody (that is just a direct rewrite of the Rust implementation) where
I've implemented __m128 and __m128d as records with static nostackframe
assembler "class functions", however it's just not fast enough to be
competitive due to the inability to inline any of the assembler methods.

Something along these lines is absolutely sorely lacking in FPC currently,
don't let anyone tell you otherwise.

On Sat, Mar 16, 2019 at 6:03 PM J. Gareth Moreton 
wrote:

> Of course, my worry now is that we've submitted so many patches and issues
> that we'll just be building an ever-growing back-log that may never be
> cleared.  It also depends on what Florian's own vision for the future of
> Free Pascal is, I think.
>
> Gareth aka. Kit
>
>
>
> On Sat 16/03/19 17:05 , "J. Gareth Moreton" gar...@moreton-family.com
> sent:
>
> Normally Florian or another administrator will say if the patch has been
> applied and mark the ticket as "resolved" if they're happy.  Once they are
> added I will have a play around.
>
> Admittedly one thing I'm also waiting on is the node XML dump feature (
> https://bugs.freepascal.org/view.php?id=35017 ), since that will allow me
> to see how procedures are constructed at the intermediate level, plus I
> sort of need it in order to work out what's going on with
> https://bugs.freepascal.org/view.php?id=32913 , something I've vowed to
> fix but has been sitting dormant for ages now - I'll have to apologise to
> David Hawk (the reporter) afterwards.
>
> Gareth aka. Kit
>
>
>
> On Sat 16/03/19 16:07 , Ryan Joseph r...@thealchemistguild.com sent:
>
>
>
> > On Mar 15, 2019, at 9:37 AM, Sven Barth via fpc-devel <
> fpc-devel@lists.freepascal.org> wrote:
> >
> > That could maybe be managed once the support for constants as parameter
> for generics is added (note: I don't know right now how SHUFPS works, so
> take the following as pseudo code):
> >
>
> I fixed the patch for constants in generics and uploaded again (
> https://bugs.freepascal.org/view.php?id=35140). Not sure if they added it
> yet or they were waiting for me to fix things. Let me know if that patch is
> in the correct format so I can fix the other one for multi-helpers.
>
> Regards,
> Ryan Joseph
>


Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-16 Thread J. Gareth Moreton
Of course, my worry now is that we've submitted so many patches and issues
that we'll just be building an ever-growing back-log that may never be
cleared.  It also depends on what Florian's own vision for the future of
Free Pascal is, I think.

Gareth aka. Kit

On Sat 16/03/19 17:05 , "J. Gareth Moreton" gar...@moreton-family.com
sent:

> Normally Florian or another administrator will say if the patch has been
> applied and mark the ticket as "resolved" if they're happy.  Once they are
> added I will have a play around.
>
> Admittedly one thing I'm also waiting on is the node XML dump feature (
> https://bugs.freepascal.org/view.php?id=35017 ), since that will allow me
> to see how procedures are constructed at the intermediate level, plus I
> sort of need it in order to work out what's going on with
> https://bugs.freepascal.org/view.php?id=32913 , something I've vowed to fix
> but has been sitting dormant for ages now - I'll have to apologise to David
> Hawk (the reporter) afterwards.
>
> Gareth aka. Kit
>
> On Sat 16/03/19 16:07 , Ryan Joseph r...@thealchemistguild.com sent:
>
> > > On Mar 15, 2019, at 9:37 AM, Sven Barth via fpc-devel wrote:
> > >
> > > That could maybe be managed once the support for constants as parameter
> > > for generics is added (note: I don't know right now how SHUFPS works, so
> > > take the following as pseudo code):
> >
> > I fixed the patch for constants in generics and uploaded again
> > (https://bugs.freepascal.org/view.php?id=35140). Not sure if they added
> > it yet or they were waiting for me to fix things. Let me know if that patch
> > is in the correct format so I can fix the other one for multi-helpers.
> >
> > Regards,
> > Ryan Joseph
