Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
> -----Original Message-----
> From: fpc-devel On Behalf Of Florian Klaempfl
> Sent: Monday, 30 April 2018 04:28
>
> > That ended up making things worse in some cases.
>
> Can you take a look at the generated machine code if delphi uses
> proper multi byte nops. If not, the align might make things indeed
> worse.

It does. The problem was not the time required by the nops, but that for certain entry point alignments (among them the 16 byte alignments) the presence of this .align triggered the 3-4 times increase in processing time.

I didn't look any closer into it as the version that J. Gareth worked out is faster and isn't alignment sensitive.

___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
On 28.04.2018 at 17:57, Thorsten Engler wrote:
>> -----Original Message-----
>> From: fpc-devel On Behalf Of Florian Klämpfl
>>
>> So something like
>>
>>   cmp edx, $43300000
>>   jge @@zero
>>   cmp edx, $3FE00000
>>   .align 16
>>   jbe @@skip
>>
>> might be much better.
>
> That ended up making things worse in some cases.

Can you check in the generated machine code whether Delphi uses proper multi-byte nops? If not, the align might indeed make things worse.
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
> From: fpc-devel On Behalf Of J. Gareth Moreton
> Sent: Sunday, 29 April 2018 12:36
>
> As an extra point, removing the 'skip' check (i.e. cmp ax, $3FE0, jbe @@skip)
> removes 6 bytes from the code size and shaves about 2 to 3 nanoseconds off
> the execution time in most cases, and it could be argued that it's worth
> going for the 'no skip' version because using Frac on a value of x where
> |x| < 1 is rather uncommon compared to when |x| >= 1.

I agree that calling Frac on values that are already just a fraction is probably not going to happen too often.

> However, when running my timing tests, one thing that's confused me
> is that when using very large inputs like 10^300, the function is
> at least 5 nanoseconds slower than FracSkip2, even though the code
> is less complex. This happens even if I put 'align 16' before the @@zero
> label.

I do not see any noticeable difference between 1e16 and 1e300 as inputs:

Code address:
  Frac1: 00536440 (64)
  Frac2: 00536490 (16)
  Frac3: 005364E0 (96)
  Frac4: 00536530 (48)
  Frac5: 00536580 (0)
  Frac6: 005365D0 (80)
  Frac7: 00536620 (32)
  Frac8: 00536670 (112)

1st run:
         In range      Out of range  Out of range  Only fraction
         (1e15+0.5)    (1e16+0.5)    (1e300)       (0.5)
  Frac1  923470        893526        897274        954220
  Frac2  964422        998532        986679        1046781
  Frac3  967501        894644        899123        993959
  Frac4  1027080       993987        999495        1015032
  Frac5  1005352       895353        899438        1013128
  Frac6  1052105       994606        989588        1043157
  Frac7  1011983       900848        885060        928712
  Frac8  1048743       992751        985288        988220

Also, it seems to be relatively resilient against changes in code alignment even if it's not a multiple of 16:

Code address:
  Frac1: 00536433 (51)
  Frac2: 0053645D (93)
  Frac3: 00536487 (7)
  Frac4: 005364B1 (49)
  Frac5: 005364DB (91)
  Frac6: 00536505 (5)
  Frac7: 0053652F (47)
  Frac8: 00536559 (89)

1st run:
         In range      Out of range  Out of range  Only fraction
         (1e15+0.5)    (1e16+0.5)    (1e300)       (0.5)
  Frac1  946247        883588        902103        945212
  Frac2  904187        877412        901861        904468
  Frac3  902870        809785        802404        915325
  Frac4  1025163       831095        808002        997584
  Frac5  931021        976555        972999        945569
  Frac6  895990        711201        710888        898036
  Frac7  1050683       791657        804050        1071561
  Frac8  952305        897085        875901        906152

> Nevertheless, I conclude that for most situations, using the improved
> FracNoSkip gives the best performance and size for typical inputs,
> but this may depend on an individual machine's architecture.

Seems we got a winner.

I was considering the ret like that, but didn't do it as I was worried because SEH under Windows expects function prologues and epilogues that exactly match a specific pattern. But in hindsight, this is a no stack frame leaf function anyway, so I don't think that matters.

Cheers,
Thorsten
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
> -----Original Message-----
> From: fpc-devel On Behalf Of Florian Klämpfl
>
> So something like
>
>   cmp edx, $43300000
>   jge @@zero
>   cmp edx, $3FE00000
>   .align 16
>   jbe @@skip
>
> might be much better.

That ended up making things worse in some cases.

Here is a branchless version:

function Frac1(const X: Double): Double;
asm
  .noframe
  movq      rdx, xmm0       // raw bits of X
  mov       rax, rdx
  xor       rcx, rcx
  shr       rdx, 32         // edx = sign, exponent, top of mantissa
  and       edx, $7FF00000  // keep the 11 exponent bits
  cmp       edx, $43300000  // exponent >= 52: no fraction bits left
  cmovge    rax, rcx        // in that case replace X by 0
  movq      xmm0, rax
  cvttsd2si rax, xmm0
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4
end;

It performs slightly slower in the "in range" case and noticeably worse in the other 2 cases (as it does exactly the same work in all 3).

I would guess that the "in range" case is the most common (you aren't going to call Frac if you know ahead of time that it's always 0 as the number is too big, or if you know that it already is a value between -1 and 1), so the higher cost for the out of range and only fraction cases is probably less important than it might look.

It IS largely independent of code alignment or predictable patterns in the incoming value:

Code address:
  Frac1: 00536430 (48)
  Frac2: 00536480 (0)
  Frac3: 005364D0 (80)
  Frac4: 00536520 (32)
  Frac5: 00536570 (112)
  Frac6: 005365C0 (64)
  Frac7: 00536610 (16)
  Frac8: 00536660 (96)

1st run:
         In range      Out of range  Only fraction
         (1e15+0.5)    (1e16+0.5)    (0.5)
  Frac1  1431794       1476556       1470644
  Frac2  1429232       1458534       1475227
  Frac3  1463357       131           1447379
  Frac4  1475042       1427287       1529162
  Frac5  1446016       1427326       1509275
  Frac6  1472979       1427472       1485185
  Frac7  1443244       1428914       1500826
  Frac8  1467528       1419654       1524294

Code address:
  Frac1: 00536423 (35)
  Frac2: 00536458 (88)
  Frac3: 0053648D (13)
  Frac4: 005364C2 (66)
  Frac5: 005364F7 (119)
  Frac6: 0053652C (44)
  Frac7: 00536561 (97)
  Frac8: 00536596 (22)

1st run:
         In range      Out of range  Only fraction
         (1e15+0.5)    (1e16+0.5)    (0.5)
  Frac1  1349334       1349939       1371353
  Frac2  1429198       1412543       1443000
  Frac3  1447011       1462295       1437583
  Frac4  1436476       1442081       1415591
  Frac5  1477058       1512579       1474870
  Frac6  1496887       1453593       1437224
  Frac7  1431293       1457510       1452196
  Frac8  1435460       1436533       1453833

Also, it still outperforms Delphi's Frac in all cases.
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
On 04/28/2018 09:33 AM, Thorsten Engler wrote:
> I've attached the source (I'm using Delphi 10.2.3, 64bit to compile
> it) in case anyone wants to try it out on different cpus and with
> different alignments (change the {$CODEALIGN 1} and add nops to the
> XXX1 .. XXX8 procedures to finetune alignment).

FWIW: i tried, with my old lazarus 1.6.1 and fpc 3.0.0, to convert the project to a lazarus project but it aborted with no real indication as to the problem... i didn't try with the (also old) trunk install i have... both are pretty old and i've forgotten how i set it all up with fpcup...

so i tried just compiling the lpr that did result from the conversion attempt... it looks to be valid source but the compile also failed... mainly for not being able to find System.Diagnostics...

maybe i'll muddle about with it later... i was just curious to see what this AMD FX Black FX8350 4Ghz 8-core CPU running linux would do...

--
NOTE: No off-list assistance is given without prior approval.
*Please keep mailing list traffic on the list unless*
*a signed and pre-paid contract is in effect with us.*
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
On 28.04.2018 at 15:33, Thorsten Engler wrote:
> procedure XXX1;
> asm
>   .noframe
>   nop
>   nop // added this
> end;

I did not look at the code in detail, but I suspect this is caused by the two branches:

  cmp edx, $43300000
  jge @@zero
  cmp edx, $3FE00000
  jbe @@skip

If two branches are in the same 16 byte block, they share a branch prediction entry on almost all x86 processors, and branch prediction suffers badly in this case. So something like

  cmp edx, $43300000
  jge @@zero
  cmp edx, $3FE00000
  .align 16
  jbe @@skip

might be much better.
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
I've only tested it in Delphi, so you'll have to convert it to the right syntax for fpc, but either of these should do:

function Frac1(const X: Double): Double;
asm
  .align 16
  .noframe
  movq      rdx, xmm0       // raw bits of X
  shr       rdx, 32         // edx = sign, exponent, top of mantissa
  and       edx, $7FF00000  // keep the 11 exponent bits
  cmp       edx, $43300000
  jge       @@zero          // |X| >= 2^52: no fraction bits left
  cmp       edx, $3FE00000
  jbe       @@skip          // |X| < 1: X is already the fraction
  cvttsd2si rax, xmm0
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4
  jmp       @@skip
@@zero:
  xorpd     xmm0, xmm0
@@skip:
end;

function Frac2(const X: Double): Double;
asm
  .align 16
  .noframe
  movq      rdx, xmm0       // raw bits of X
  shr       rdx, 48         // dx = sign, exponent, top 4 mantissa bits
  and       dx, $7FF0       // keep the 11 exponent bits
  cmp       dx, $4330
  jge       @@zero
  cmp       dx, $3FE0
  jbe       @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4
  jmp       @@skip
@@zero:
  xorpd     xmm0, xmm0
@@skip:
end;

From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of Sven Barth via fpc-devel
Sent: Friday, 27 April 2018 23:47
To: FPC developers' list <fpc-devel@lists.freepascal.org>
Cc: Sven Barth <pascaldra...@googlemail.com>
Subject: *** GMX Spamverdacht *** Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

> Bart <bartjun...@gmail.com> wrote on Fri, 27 Apr 2018, 13:42:
>
> > On Wed, Apr 25, 2018 at 11:04 AM, <i...@wolfgang-ehrhardt.de> wrote:
> >
> > > If you compile and run this 64-bit program on Win 64 you get a crash
> >
> > And AFAICS your analysis of the cause (see bugtracker) is correct as well.
> >
> > function fpc_frac_real(d: ValReal) : ValReal;compilerproc; assembler; nostackframe;
> > asm
> >   cvttsd2si %xmm0,%rax
> >   { Windows defines %xmm4 and %xmm5 as first non-parameter volatile registers;
> >     on SYSV systems all are considered as such, so use %xmm4 }
> >   cvtsi2sd %rax,%xmm4
> >   subsd %xmm4,%xmm0
> > end;
> >
> > CVTTSD2SI: Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer
> >
> > This should not be used to get a ValReal result.
>
> The code essentially does the following (instruction by instruction):
>
> === code begin ===
> tmpi := int64(trunc(d));
> tmpd := double(tmpi);
> Result := d - tmpd;
> === code end ===
>
> Though why it fails with the given value is a different topic...
>
> Regards,
> Sven
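For readers who do not want to trace the assembly, the exponent test that Frac1 and Frac2 perform can be sketched in plain Pascal. This is only an illustration, assuming Free Pascal in objfpc mode; TDoubleBits and FracByExponent are hypothetical names, not RTL routines, and NaN handling is ignored (as in the assembly, NaN and infinity end up in the "zero" branch).

```pascal
program FracExponentSketch;
{$mode objfpc}

type
  // Hypothetical overlay to read the raw IEEE 754 bits of a Double.
  TDoubleBits = record
    case Boolean of
      False: (D: Double);
      True:  (Q: QWord);
  end;

// Hypothetical helper mirroring the three paths of the assembly version.
function FracByExponent(const X: Double): Double;
var
  Rec: TDoubleBits;
  Exponent: Cardinal;
begin
  Rec.D := X;
  // Bits 52..62 hold the biased exponent; the sign bit is masked away.
  Exponent := (Rec.Q shr 52) and $7FF;
  if Exponent >= $433 then        // |X| >= 2^52: no fraction bits left
    Result := 0.0
  else if Exponent <= $3FE then   // |X| < 1: X is already its own fraction
    Result := X
  else
    Result := X - Trunc(X);       // |X| < 2^52, so Trunc fits in an Int64
end;

begin
  WriteLn(FracByExponent(1e15 + 0.5)); // "in range" path
  WriteLn(FracByExponent(1e300));      // "out of range" path
  WriteLn(FracByExponent(0.5));        // "only fraction" path
end.
```

The constants $433 and $3FE are the same exponent values the assembly compares against, just without the four low mantissa bits that remain in dx after the shift.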
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
Yeah, I realized that myself and made a post about that a few minutes ago which seems to have crossed ways with yours...

I was just projecting my annoyance about the lack of precision when being forced to do math in double instead of extended...

> -----Original Message-----
> From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of Mattias Gaertner
> Sent: Saturday, 28 April 2018 03:39
> To: fpc-devel@lists.freepascal.org
> Subject: *** GMX Spamverdacht *** Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64
>
> On Sat, 28 Apr 2018 02:48:14 +1000
> "Thorsten Engler" <thorsten.eng...@gmx.net> wrote:
>
> > For what it's worth, Delphi simply decided to give up on doing it
> > correctly and silently fail if the double is too large to fit in an
> > Int64.
> > [...]
> > WriteLn(Frac(1e15+0.5));
> > WriteLn(Frac(1e16+0.5));
> >
> > When executed in 32bit code, returns:
>
> That means it has the Extended 80bit type, so it can handle the
> 1e16+0.5.
> http://docwiki.embarcadero.com/Libraries/Tokyo/en/System.Extended
>
> > 5.00E-0001
> > 5.00E-0001
> >
> > And when executed in 64bit code, returns:
>
> That means it has only the 64bit double type, so the +0.5 is lost.
>
> > 5.00E-0001
> > 0.00E+
>
> Mattias
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
Actually, that test is wrong, because the 32bit code will use Extended float literals instead of Double. If you force it to use double in 32bit you get the same result.

If you want to use SSE instructions, you will need to add additional checks to see if the value will fit into an integer before you use the SSE instructions.

From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of Sven Barth via fpc-devel
Sent: Saturday, 28 April 2018 03:33
To: FPC developers' list <fpc-devel@lists.freepascal.org>
Cc: Sven Barth <pascaldra...@googlemail.com>
Subject: *** GMX Spamverdacht *** Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

> Thorsten Engler <thorsten.eng...@gmx.net> wrote on Fri, 27 Apr 2018, 18:48:
>
> > For what it's worth, Delphi simply decided to give up on doing it
> > correctly and silently fail if the double is too large to fit in an Int64.
> >
> > WriteLn(Frac(1e15+0.5));
> > WriteLn(Frac(1e16+0.5));
> >
> > When executed in 32bit code, returns:
> >
> > 5.00E-0001
> > 5.00E-0001
> >
> > And when executed in 64bit code, returns:
> >
> > 5.00E-0001
> > 0.00E+
> >
> > Whoever had the *brilliant* idea to deprecate the FPU in 64bit mode should
> > be shot and quartered, not necessarily in that order. 64bit code is
> > basically useless if you want to do anything meaningful with floats that
> > are larger than fits into an int64.
>
> Ehm, what does the deprecation of the FPU have to do with the floats being
> too large for an Int64? Extended would only allow for even larger values
> that would not fit. And all other functionality "simply" needs to be
> implemented correctly using the "new" SSE functionality, which however is
> not without its pitfalls as we have seen.
>
> Regards,
> Sven
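The point about forcing the values into Double can be illustrated with a small test. This is a sketch assuming Free Pascal; the program and variable names are made up for illustration. Storing each sum in a Double variable rounds it to 64-bit precision regardless of how the compiler evaluated the literal.

```pascal
program DoublePrecisionSketch;
{$mode objfpc}
var
  A, B: Double;
begin
  A := 1e15;
  A := A + 0.5;  // below 2^50 neighbouring Doubles are 0.125 apart, so the 0.5 survives
  B := 1e16;
  B := B + 0.5;  // above 2^53 neighbouring Doubles are 2 apart, so the 0.5 is rounded away
  WriteLn(A - 1e15);
  WriteLn(B - 1e16);
end.
```

With the 0.5 already gone from the stored value, no implementation of Frac on a Double can bring it back for the 1e16 case.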
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
Thorsten Engler wrote on Fri, 27 Apr 2018, 17:47:

> > That's true for i386. But on x86_64 cvt(t)sd2si instructions can
> > deal with int64 range, if destination register is a 64-bit one.
>
> You are still going to be at least 960 bits short...

I've disabled the SSE variant again for now, until we've decided how we want to proceed with this.

Regards,
Sven
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
> That's true for i386. But on x86_64 cvt(t)sd2si instructions can
> deal with int64 range, if destination register is a 64-bit one.

You are still going to be at least 960 bits short...
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
On Sat, 28 Apr 2018 00:09:12 +1000
"Thorsten Engler" wrote:

> Highest integer that fits in a Int64:
> 9223372036854775808
> 1e20:
> 1
>
> Your Int is overflowing.
>
> You can't implement Frac by going through an Integer, that will never work.

It could work if you check for small and big values before doing so.

Mattias
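A minimal sketch of that suggestion, assuming Free Pascal: GuardedFrac and TwoPow52 are hypothetical names, not the RTL's fpc_frac_real, and NaN inputs are deliberately not handled. The value only goes through an Int64 when its magnitude is below 2^52, the point past which a Double has no fraction bits.

```pascal
program FracGuardSketch;
{$mode objfpc}

// Hypothetical helper: range-check first, truncate through Int64 second.
function GuardedFrac(const X: Double): Double;
const
  TwoPow52 = 4503599627370496.0; // 2^52
begin
  if (X >= TwoPow52) or (X <= -TwoPow52) then
    Result := 0.0            // too large to carry a fraction (covers infinity too)
  else
    Result := X - Trunc(X);  // Trunc cannot overflow an Int64 in this range
end;

begin
  WriteLn(GuardedFrac(1e15 + 0.5));
  WriteLn(GuardedFrac(1e20));
  WriteLn(GuardedFrac(-2.75));
end.
```

The check for small values (|X| < 1) is only an optimisation, so this sketch leaves it out; the large-value check is the one that prevents the overflow.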
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
27.04.2018 17:14, Sven Barth via fpc-devel wrote:
> Thorsten Engler wrote on Fri, 27 Apr 2018, 16:09:
>
> > Highest integer that fits in a Int64:
> > 9223372036854775808
> > 1e20:
> > 1
> >
> > Your Int is overflowing.
> >
> > You can't implement Frac by going through an Integer, that will never
> > work. Except if you have an integer that can hold 1.8E308 (which would
> > be a 1024 bit integer, or thereabouts)
>
> Yes, I saw that now as well, though it's even worse as the cvttsd2si
> instruction in fact only works with 32-bit integers. That additionally
> means that Trunc() and Round() are broken for such values as well as
> they rely on the same kind of functions. Int() works because it doesn't
> have an SSE version.

That's true for i386. But on x86_64 cvt(t)sd2si instructions can deal with int64 range, if destination register is a 64-bit one.

Regards,
Sergei
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
Thorsten Engler wrote on Fri, 27 Apr 2018, 16:09:

> Highest integer that fits in a Int64:
> 9223372036854775808
> 1e20:
> 1
>
> Your Int is overflowing.
>
> You can't implement Frac by going through an Integer, that will never
> work. Except if you have an integer that can hold 1.8E308 (which would
> be a 1024 bit integer, or thereabouts)

Yes, I saw that now as well, though it's even worse as the cvttsd2si instruction in fact only works with 32-bit integers. That additionally means that Trunc() and Round() are broken for such values as well as they rely on the same kind of functions. Int() works because it doesn't have an SSE version.

Regards,
Sven
Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
Highest integer that fits in a Int64:
9223372036854775808
1e20:
1

Your Int is overflowing.

You can't implement Frac by going through an Integer, that will never work. Except if you have an integer that can hold 1.8E308 (which would be a 1024 bit integer, or thereabouts).

From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of Sven Barth via fpc-devel
Sent: Friday, 27 April 2018 23:47
To: FPC developers' list <fpc-devel@lists.freepascal.org>
Cc: Sven Barth <pascaldra...@googlemail.com>
Subject: *** GMX Spamverdacht *** Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

> Bart <bartjun...@gmail.com> wrote on Fri, 27 Apr 2018, 13:42:
>
> > On Wed, Apr 25, 2018 at 11:04 AM, <i...@wolfgang-ehrhardt.de> wrote:
> >
> > > If you compile and run this 64-bit program on Win 64 you get a crash
> >
> > And AFAICS your analysis of the cause (see bugtracker) is correct as well.
> >
> > function fpc_frac_real(d: ValReal) : ValReal;compilerproc; assembler; nostackframe;
> > asm
> >   cvttsd2si %xmm0,%rax
> >   { Windows defines %xmm4 and %xmm5 as first non-parameter volatile registers;
> >     on SYSV systems all are considered as such, so use %xmm4 }
> >   cvtsi2sd %rax,%xmm4
> >   subsd %xmm4,%xmm0
> > end;
> >
> > CVTTSD2SI: Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer
> >
> > This should not be used to get a ValReal result.
>
> The code essentially does the following (instruction by instruction):
>
> === code begin ===
> tmpi := int64(trunc(d));
> tmpd := double(tmpi);
> Result := d - tmpd;
> === code end ===
>
> Though why it fails with the given value is a different topic...
>
> Regards,
> Sven