Re: [fpc-devel] What's the status of Maciej's Smart Pointer enhancements?

2018-04-29 Thread Sven Barth via fpc-devel
Anthony Walter wrote on Sun, 29 Apr 2018, 21:27:

> I've run into an almost must-have use case for FPC smart pointers as
> described and implemented by Maciej. I wanted to know from the people who
> make decisions about what to merge: what's the status of rolling his
> enhancements at the following location into FPC trunk?
>
> svn ls https://svn.freepascal.org/svn/fpc/branches/maciej/smart_pointers/
>

The management operators have been part of trunk for more than a year,
see here:
http://lists.freepascal.org/pipermail/fpc-announce/2017-February/000600.html

Regards,
Sven

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread Thorsten Engler
> -Original Message-
> From: fpc-devel  On Behalf
> Of Florian Klaempfl
> Sent: Monday, 30 April 2018 04:28

> > That ended up making things worse in some cases.
> 
> Can you take a look at the generated machine code if delphi uses
> proper multi byte nops. If  not, the align might make things indeed
> worse.

It does.

The problem was not the time required by the nops, but that for certain entry 
point alignments (among them the 16 byte alignments) the presence of this 
.align triggered the 3-4 times increase in processing time. I didn't look any 
closer into it as the version that J. Gareth worked out is faster and isn't 
alignment sensitive.



Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread J. Gareth Moreton
 Hi Florian.  Thorsten and I got down to a fairly optimised version of
Frac, in both speed and size:

 

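 function Frac(const X: ValReal): ValReal; assembler; nostackframe;
 asm
   movq  rax,  xmm0        // raw IEEE 754 bits of X
   shr   rax,  48          // keep the top 16 bits: sign, exponent, 4 mantissa bits
   and   ax,   $7FF0       // isolate the 11 exponent bits
   cmp   ax,   $4330       // biased exponent $433: |X| >= 2^52, so no fractional part
   jge   @@zero
   cvttsd2si rax,  xmm0    // truncate toward zero
   cvtsi2sd  xmm4, rax
   subsd xmm0, xmm4        // X - Trunc(X)
   ret
 @@zero:
   xorpd xmm0, xmm0        // return 0.0
 end;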

 

 It fits into just three 16-byte blocks and is the fastest overall in our
tests, although there's a slight penalty if it jumps to @@zero that seems
to be architecture-dependent (e.g. it slows down for me, but Thorsten
didn't see much difference). Aligning @@zero to a 16-byte boundary may fix
this for some, but it doesn't for me.  Oh the joy of processor intricacies!
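
 For readers less fluent in x86-64 assembly, the same logic can be restated
in plain Pascal. This is only an illustrative sketch and not the code proposed
in this thread: the name FracRef, the use of Double in place of ValReal, and
the Move-based bit extraction are all assumptions made for the example.

```pascal
program FracSketch;
{$mode objfpc}

// Hypothetical reference version of the exponent test used by the
// assembler routine. FracRef is an illustrative name, not RTL code.
function FracRef(const X: Double): Double;
var
  Bits: QWord;
begin
  Bits := 0;
  Move(X, Bits, SizeOf(Bits));          // reinterpret the IEEE 754 bits of X
  if ((Bits shr 48) and $7FF0) >= $4330 then
    Result := 0.0                       // |X| >= 2^52 (or Inf/NaN): no fraction
  else
    Result := X - Trunc(X);             // mirrors cvttsd2si / cvtsi2sd / subsd
end;

begin
  WriteLn(FracRef(2.75):0:4);           // ordinary value with a fraction
  WriteLn(FracRef(-2.75):0:4);          // the sign is preserved
  WriteLn(FracRef(0.5):0:4);            // already a pure fraction
  WriteLn(FracRef(1e16 + 0.5):0:4);     // beyond 2^52, takes the zero path
end.
```

 Like the assembler version, this sketch returns 0 for infinities and NaNs,
because their exponent bits also compare greater than $4330.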

 Gareth aka. Kit.

 On Sun 29/04/18 19:28 , Florian Klaempfl flor...@freepascal.org sent:
 On 28.04.2018 at 17:57, Thorsten Engler wrote: 
 >> -Original Message- 
 >> From: fpc-devel  On Behalf 
 >> Of Florian Klämpfl 
 >> So something like 
 >> 
 >> cmp edx, $4330 
 >> jge @@zero 
 >> cmp edx, $3FE0 
 >> .align 16 
 >> jbe @@skip 
 >> 
 >> might be much better. 
 > 
 > That ended up making things worse in some cases. 

 Can you take a look at the generated machine code if delphi uses proper 
 multi byte nops. If not, the align might make things indeed worse. 


[fpc-devel] What's the status of Maciej's Smart Pointer enhancements?

2018-04-29 Thread Anthony Walter
Here is a video overview of this message


I've run into an almost must-have use case for FPC smart pointers as
described and implemented by Maciej. I wanted to know from the people who
make decisions about what to merge: what's the status of rolling his
enhancements at the following location into FPC trunk?

svn ls https://svn.freepascal.org/svn/fpc/branches/maciej/smart_pointers/

My use case is for adding a Free Pascal interface to WebKit's
JavaScriptCore. The JSC objects all follow Apple's design pattern where C
style API functions return and operate on ref pointer types that must be
destroyed using specific API calls for each type.

It seems like a natural fit with smart pointers, such that record types
could be used to hold onto things such as JSStringRef, JSClassRef,
JSValueRef, and more, retaining and releasing them for you automatically
with the appropriate JSC API when they go out of scope using the smart
pointer Initialize, Finalize, and Copy operators.

e.g. JSClassRetain, JSClassRelease
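
A rough sketch of that idea, assuming a compiler with management operators
(FPC trunk at the time of writing) and the advancedrecords mode switch. The
handle type and the FakeRetain/FakeRelease routines below are hypothetical
stand-ins for a JSC ref type and its retain/release calls, not the real API.
AddRef is included alongside the three operators named above because the
compiler calls it after a raw copy, such as passing the record by value.

```pascal
program ManagedRefSketch;
{$mode objfpc}{$modeswitch advancedrecords}

type
  // Hypothetical stand-in for a C-style ref handle such as JSClassRef.
  PFakeRef = ^TFakeRef;
  TFakeRef = record
    RefCount: Integer;
  end;

// Hypothetical stand-ins for a paired retain/release API.
procedure FakeRetain(P: PFakeRef);
begin
  Inc(P^.RefCount);
end;

procedure FakeRelease(P: PFakeRef);
begin
  Dec(P^.RefCount);
  if P^.RefCount = 0 then
    Dispose(P);
end;

type
  // Stack-based holder that retains and releases automatically.
  TRefHolder = record
    Handle: PFakeRef;
    class operator Initialize(var R: TRefHolder);
    class operator Finalize(var R: TRefHolder);
    class operator AddRef(var R: TRefHolder);
    class operator Copy(constref Src: TRefHolder; var Dst: TRefHolder);
  end;

class operator TRefHolder.Initialize(var R: TRefHolder);
begin
  R.Handle := nil;                 // start out empty
end;

class operator TRefHolder.Finalize(var R: TRefHolder);
begin
  if R.Handle <> nil then
    FakeRelease(R.Handle);         // drop our reference when leaving scope
  R.Handle := nil;
end;

class operator TRefHolder.AddRef(var R: TRefHolder);
begin
  if R.Handle <> nil then
    FakeRetain(R.Handle);          // a raw copy now shares the handle
end;

class operator TRefHolder.Copy(constref Src: TRefHolder; var Dst: TRefHolder);
begin
  if Src.Handle = Dst.Handle then
    Exit;                          // self-assignment or same handle
  if Dst.Handle <> nil then
    FakeRelease(Dst.Handle);       // give up whatever Dst held before
  Dst.Handle := Src.Handle;
  if Dst.Handle <> nil then
    FakeRetain(Dst.Handle);
end;

procedure Demo;
var
  A, B: TRefHolder;
begin
  New(A.Handle);
  A.Handle^.RefCount := 1;         // A owns the initial reference
  B := A;                          // the Copy operator retains for B
  WriteLn('references held: ', A.Handle^.RefCount);
end;                               // Finalize runs for both A and B here

begin
  Demo;
end.
```

With real JSC types, each ref type would presumably get its own holder record
calling the matching retain and release functions.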

For those interested, the JSC library works on all platforms and allows for
integration between native code applications and a high-performance
JavaScript engine, the same engine that powers WebKit browsers. The API allows
you to expose your native code functions as JavaScript functions or
classes, as well as providing a virtual machine to execute JavaScript.
Properly configured, the JavaScript can call back into your Pascal code,
optionally passing JavaScript objects as arguments to your native code.

All of these operations would be much easier to program against with stack
based types (record or object) if smart pointer support was present in FPC.

Finally, with JSC it's quite easy for developers to embed
WebKit webview windows in their desktop applications, and optionally to grant
any web page the ability to invoke the Free Pascal code you decide to expose,
and for Free Pascal in return to invoke arbitrary JavaScript objects or functions.


Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread Florian Klaempfl
On 28.04.2018 at 17:57, Thorsten Engler wrote:
>> -Original Message-
>> From: fpc-devel  On Behalf
>> Of Florian Klämpfl
>> So something like
>>
>>   cmp   edx, $4330
>>   jge  @@zero
>>   cmp   edx, $3FE0
>>   .align   16
>>   jbe  @@skip
>>
>> might be much better.
> 
> That ended up making things worse in some cases.

Can you take a look at the generated machine code if delphi uses proper
multi byte nops. If  not, the align might make things indeed worse.


Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread J. Gareth Moreton
 I stand corrected.

 Normally if I want to write a second epilogue, I simply compile the
project, look at the disassembly, and copy what the compiler puts in for
the epilogue.  The only one I remember off-hand is for leaf functions that
have no stack frame and use only volatile registers, hence a simple
RET.  I'll get it right in the end!

 Gareth aka. Kit

 P.S. You might want to tell your spam filter that these messages are
sound, so we're not put off at the sight of "*** GMX Spamverdacht ***"!

 On Sun 29/04/18 12:44 , "Thorsten Engler" thorsten.eng...@gmx.net sent:
 > From: fpc-devel  On Behalf 
 > Of J. Gareth Moreton 

 > For functions with a stack frame, either LEAVE or MOV RSP, RBP; POP 
 > RBP must precede it. 
 It's not quite that simple, at least under Windows: 
 https://docs.microsoft.com/en-us/cpp/build/prolog-and-epilog 

 But yes, this documents that there can be multiple epilogues, so the
additional ret would be fine in this regard. 

 Following this exact format is required so that the stack unwinder, which
the OS runs while processing SEH (structured exception handling), can
correctly restore the values of non-volatile registers that have been
saved on the stack. 

 Fortunately, as our function in this case doesn't use the stack at all and
doesn't contain calls to other functions, this can be skipped. 

 > So what happens now? Do we submit a patch? 
 I think Sven Barth was going to fix this up anyway? 

 Cheers, 
 Thorsten 



Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread Thorsten Engler
> From: fpc-devel  On Behalf
> Of J. Gareth Moreton

> For functions with a stack frame, either LEAVE or MOV RSP, RBP; POP
> RBP must precede it.
It's not quite that simple, at least under Windows:
https://docs.microsoft.com/en-us/cpp/build/prolog-and-epilog

But yes, this documents that there can be multiple epilogues, so the additional 
ret would be fine in this regard.

Following this exact format is required so that the stack unwinder, which the 
OS runs while processing SEH (structured exception handling), can correctly 
restore the values of non-volatile registers that have been saved on the 
stack.

Fortunately, as our function in this case doesn't use the stack at all and 
doesn't contain calls to other functions, this can be skipped.

> So what happens now? Do we submit a patch?
I think Sven Barth was going to fix this up anyway?

Cheers,
Thorsten



Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread J. Gareth Moreton
That's great to hear! Glad to help.

For functions with no stack frame, the compiler simply puts RET at the very
end of the routine, and that is all that's needed. For functions with a stack
frame, either LEAVE or MOV RSP, RBP; POP RBP must precede it. The optional
parameter after RET is the number of bytes allocated on the stack by
parameters that aren't in registers.

So what happens now? Do we submit a patch?

Gareth aka. Kit


Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64

2018-04-29 Thread Thorsten Engler
> From: fpc-devel  On Behalf Of J. 
> Gareth Moreton
> Sent: Sunday, 29 April 2018 12:36

> As an extra point, removing the 'skip' check (i.e. cmp ax, $3FE0, jbe @@skip) 
> removes 6 bytes from the code size and shaves about 2 to 3 nanoseconds off 
> the execution time in most cases, and it could be argued that it's worth 
> going for the 'no skip' version because using Frac on a value of x where 
> |x| < 1 is rather uncommon compared to when |x| >= 1.
I agree that calling Frac on values that are already just a fraction is 
probably not going to happen too often.
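
To make the two thresholds concrete: $3FE0 (the skip check) and $4330 (the 
zero check) split the inputs into three paths. The sketch below classifies 
the kind of values used in the timings. TFracPath and ClassifyFracPath are 
hypothetical names for illustration only, and it assumes the skip path covers 
exactly the |x| < 1 case described in the quoted text.

```pascal
program FracPathSketch;
{$mode objfpc}

type
  // Hypothetical labels for the three code paths discussed in the thread.
  TFracPath = (fpSkip, fpCompute, fpZero);

function ClassifyFracPath(const X: Double): TFracPath;
var
  Bits, ExpBits: QWord;
begin
  Bits := 0;
  Move(X, Bits, SizeOf(Bits));           // reinterpret the IEEE 754 bits of X
  ExpBits := (Bits shr 48) and $7FF0;    // same masking as "and ax, $7FF0"
  if ExpBits >= $4330 then
    Result := fpZero                     // |X| >= 2^52: result is 0
  else if ExpBits <= $3FE0 then
    Result := fpSkip                     // |X| < 1: X is already a fraction
  else
    Result := fpCompute;                 // needs X - Trunc(X)
end;

begin
  // Ordinals: 0 = skip, 1 = compute, 2 = zero
  WriteLn('0.5        -> ', Ord(ClassifyFracPath(0.5)));
  WriteLn('1e15 + 0.5 -> ', Ord(ClassifyFracPath(1e15 + 0.5)));
  WriteLn('1e16 + 0.5 -> ', Ord(ClassifyFracPath(1e16 + 0.5)));
  WriteLn('1e300      -> ', Ord(ClassifyFracPath(1e300)));
end.
```

Dropping the skip check simply folds the first case into the compute path, 
which still yields the right answer for |x| < 1 at the cost of the two 
conversions.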

> However, when running my timing tests, one thing that's confused me 
> is that when using very large inputs like 10^300, the function is 
> at least 5 nanoseconds slower than FracSkip2, even though the code 
> is less complex. This happens even if I put 'align 16' before the @@zero 
> label.

I do not see any noticeable difference between 1e16 and 1e300 as inputs:

Code address:
Frac1: 00536440 (64)
Frac2: 00536490 (16)
Frac3: 005364E0 (96)
Frac4: 00536530 (48)
Frac5: 00536580 (0)
Frac6: 005365D0 (80)
Frac7: 00536620 (32)
Frac8: 00536670 (112)

1st run:
In range (1e15+0.5):
Frac1 923470
Frac2 964422
Frac3 967501
Frac4 1027080
Frac5 1005352
Frac6 1052105
Frac7 1011983
Frac8 1048743

Out of range (1e16+0.5):
Frac1 893526
Frac2 998532
Frac3 894644
Frac4 993987
Frac5 895353
Frac6 994606
Frac7 900848
Frac8 992751

Out of range (1e300):
Frac1 897274
Frac2 986679
Frac3 899123
Frac4 999495
Frac5 899438
Frac6 989588
Frac7 885060
Frac8 985288

Only fraction (0.5):
Frac1 954220
Frac2 1046781
Frac3 993959
Frac4 1015032
Frac5 1013128
Frac6 1043157
Frac7 928712
Frac8 988220

Also, it seems to be relatively resilient against changes in code alignment 
even if it's not a multiple of 16:

Code address:
Frac1: 00536433 (51)
Frac2: 0053645D (93)
Frac3: 00536487 (7)
Frac4: 005364B1 (49)
Frac5: 005364DB (91)
Frac6: 00536505 (5)
Frac7: 0053652F (47)
Frac8: 00536559 (89)

1st run:
In range (1e15+0.5):
Frac1 946247
Frac2 904187
Frac3 902870
Frac4 1025163
Frac5 931021
Frac6 895990
Frac7 1050683
Frac8 952305

Out of range (1e16+0.5):
Frac1 883588
Frac2 877412
Frac3 809785
Frac4 831095
Frac5 976555
Frac6 711201
Frac7 791657
Frac8 897085

Out of range (1e300):
Frac1 902103
Frac2 901861
Frac3 802404
Frac4 808002
Frac5 972999
Frac6 710888
Frac7 804050
Frac8 875901

Only fraction (0.5):
Frac1 945212
Frac2 904468
Frac3 915325
Frac4 997584
Frac5 945569
Frac6 898036
Frac7 1071561
Frac8 906152

> Nevertheless, I conclude that for most situations, using the improved 
> FracNoSkip gives the best performance and size for typical inputs, 
> but this may depend on an individual machine's architecture.

Seems we got a winner.

I was considering the ret like that, but didn't do it because I was worried 
that SEH under Windows expects function prologues and epilogues that exactly 
match a specific pattern. But in hindsight, this is a leaf function with no 
stack frame anyway, so I don't think that matters. 

Cheers,
Thorsten
