Re: [fpc-pascal] FPC Graphics options?
> On May 21, 2017, at 2:34 AM, Jonas Maebe wrote:
>
> The Pascal test program that was benchmarked here contains a number of
> bugs/wrong translations from the C code (some stem from the original version,
> another one was added):

Thanks for looking this over. I’m personally a little worried when I see this kind of thing, because I don’t know the causes or how they affect my code. Despite all the noise, I think we finally got down to bedrock, though.

Unfortunately, as a person who doesn’t understand compilers well, all I can conclude from this is to avoid floating point math in tight loops. That’s probably not accurate enough, but it’s the only way I can understand it right now. What I’m hearing is that there are some bad C translations and some missing FPC features. I’m not sure how much is translation and how much is FPC, but I think it’s mainly on the side of the compiler.

> Then, there's one thing that can be done to optimize the Pascal version
> (after removing the bugs above):
> 1) Compile with SSE3 or higher, in particular because SSE3 can be used to
> implement trunc() with a single instruction (otherwise we pass via a helper
> that uses the x87 fpu, which moreover has to reconfigure it to change the
> rounding mode and restore it afterwards). However, there does seem to be a
> bug in FPC 3.0.2 whereby compiling this program for -O2 -Cfsse3 causes it to
> crash, because then it loads data from an 8-byte aligned location on the
> stack. It works fine when compiled with trunk and -O2 -Cfsse3 though (at
> least for 64 bit).

I just compiled with ppcx64 3.1.1 (instead of 3.0.2) and went from 8 fps to 22 fps without optimizations and 28 fps with them (I got some divide-by-zero errors, but that’s just the translations). What is that about? What changed?

Just curious, why isn’t -Cfsse3 always enabled with optimizations? It seems like we would want it on always.

> There's at least one minor twist of the classic "C compiler evaluates
> constant stuff at compile time":
> 1) oy and oz are constant. The "floor" function is a standard C library
> function, and hence C compilers know what it does and can evaluate it at
> compile time. Therefore, the oy-floor(oy) and oz-floor(oz) expressions are
> (equal) constants for C compilers.

How are those constants? I see them defined as "float oy = 32.5;" in the C version.

Regards,
Ryan Joseph
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] FPC Graphics options?
On 20.05.2017 21:34, Jonas Maebe wrote:
> There's at least one minor twist of the classic "C compiler evaluates
> constant stuff at compile time":
> 1) oy and oz are constant. The "floor" function is a standard C library
> function, and hence C compilers know what it does and can evaluate it at
> compile time. Therefore, the oy-floor(oy) and oz-floor(oz) expressions
> are (equal) constants for C compilers.

Would it help here if we declared suitable overloads of Floor() for the various floating point types instead of only the "Float" one, declared them as inline, and had the inline nodes for Frac() and Trunc() handle constant values? At least if the compiler also recognizes that oy and oz are constant...

Regards,
Sven
Re: [fpc-pascal] FPC Graphics options?
On 20.05.2017 at 21:34, Jonas Maebe wrote:
> Also in summary, very little was learned from this. We have known for a long
> time that FPC needs SSA for better code generation for loops (and Florian has
> been working on it for a long time too).

Actually, this is not completely true :) What FPC needs to generate better code in this case (on SYS V ABI targets) is live range splitting around call nodes. This doesn't need SSA; SSA might not even help. I have a patch for it, but it is not finished, as another patch is needed to make it work well: spill coalescing (nodes/registers that are spilled are spilled to the same memory location if they do not interfere but are connected by a move). I also have a half-baked patch for this, but never finished it nor committed it to the official trunk. Combined, both patches result in much better register usage for this example, as variables can go into xmm registers which are stored/restored around call nodes.
Re: [fpc-pascal] FPC Graphics options?
On 19/05/17 02:54, Ryan Joseph wrote:
>> On May 18, 2017, at 10:40 PM, Jon Foster wrote:
>>
>>  % time   cumulative s   self s   name
>>   62.44           1.33     1.33   fpc_frac_real
>>   26.76           1.90     0.57   MATH_$$_FLOOR$EXTENDED$$LONGINT
>>   10.33           2.12     0.22   FPC_DIV_INT64
>
> Thanks for profiling this. Floor is there as I expected and 26% is pretty
> extreme, but the others are floating point division? How does Java handle
> this so much better than FPC and what are the workarounds?

The Pascal test program that was benchmarked here contains a number of bugs/wrong translations from the C code (some stem from the original version, another one was added):

1) Casting a floating point number to an int in C does not round, but truncates (I think this may have been mentioned earlier in the thread, I didn't read everything).
2) The usage of floor in the test program is wrong. C's floor takes a floating point number and returns one. The math unit's floor function takes a floating point number and returns an integer. In the Pascal version, this integer is then converted back to a floating point number because the rest of that expression also uses floating point.
3) The Pascal version uses longword instead of int32 for a number of variables (that are "int" in the C version). This results in one expression getting evaluated as 64 bit on 32 bit systems, which is where the FPC_DIV_INT64 calls come from (that's a routine to perform 64 bit *integer* divisions on 32 bit platforms).
4) frac() is only used to get a monotonically increasing value as part of the data input for the test program. The C code (and the original Pascal version) uses a tick count and multiplies/divides that, which is much faster.

Then, there's one thing that can be done to optimize the Pascal version (after removing the bugs above):

1) Compile with SSE3 or higher, in particular because SSE3 can be used to implement trunc() with a single instruction (otherwise we pass via a helper that uses the x87 fpu, which moreover has to reconfigure it to change the rounding mode and restore it afterwards). However, there does seem to be a bug in FPC 3.0.2 whereby compiling this program with -O2 -Cfsse3 causes it to crash, because it then loads data from an 8-byte aligned location on the stack. It works fine when compiled with trunk and -O2 -Cfsse3, though (at least for 64 bit).

There's at least one minor twist of the classic "C compiler evaluates constant stuff at compile time":

1) oy and oz are constant. The "floor" function is a standard C library function, and hence C compilers know what it does and can evaluate it at compile time. Therefore, the oy-floor(oy) and oz-floor(oz) expressions are (equal) constants for C compilers.

Finally, there are two things FPC definitely is missing:

1) an SSE version of the int() function (which is the basis of a floating point version of floor()) (fairly specific to this program)
2) SSA support in loops (to make better use of SSE registers; related to Florian's note about the calling conventions). However, without the previous changes, even FPC code compiled to LLVM IR and then compiled to machine code with Clang (and hence with full SSA support) results in even worse performance than the code directly compiled with FPC.

There are definitely more things (as I did not manage to get FPC's LLVM IR to compile to a version that's equally fast as the LLVM IR generated from the C program), but I have already spent more time than is reasonable on this. I hope the "the sky is falling" comments will stop, though.

In summary, as has been mentioned by several people in this thread: you (not directed at you personally, Ryan) always have to check where your program's slowness comes from; otherwise your test/benchmark is worse than useless (because it just creates confusion, and wastes other people's time when they get tired of the mailing list getting flooded by the same information-less statements over and over again).

Also in summary, very little was learned from this. We have known for a long time that FPC needs SSA for better code generation for loops (and Florian has been working on it for a long time too).

Jonas
Re: [fpc-pascal] Ignoring function results
On 20/05/17 12:30, Bart wrote:
> On 5/20/17, Mark Morgan Lloyd wrote:
>> According to the Programmer's Guide 1.3.41, {$EXTENDEDSYNTAX OFF} has
>> the effect of permitting the result of a function to be ignored.
>
> Isn't that just the other way around?
>
> "Extended syntax allows you to drop the result of a function. This
> means that you can use a function call as if it were a procedure.
> By default this feature is on. You can switch it off using the {$X-}
> or {$EXTENDEDSYNTAX OFF} directive."

Just a mo, let me have another shot at that in case I was doing something stupid... it's definitely got to be on for optional parameters to be accepted, and that appears to be the default state if {$mode objfpc}{$H+} is at the top of the unit.

The curious thing is that in the cold light of day I can't get $EXTENDEDSYNTAX to have any effect on the function result. I'll admit what I'm doing:

operator <= (var a: TDateTimeArray; const s: TDateTime): boolean;

begin
  result := Length(a) > 0;
  SetLength(a, Length(a) + 1);
  a[High(a)] := s
end { <= } ;

operator + (const a: TDateTimeArray; const s: TDateTime): TDateTimeArray;

var
  b: boolean;

begin
  result := a;
  if Length(result) = 0 then
    { b := } result <= s
  else
    result[High(result)] += s
end { + } ;

If I uncomment the boolean assignment it works. Where I appeared to be last night was that setting $EXTENDEDSYNTAX OFF had the above working, but I'm now having trouble duplicating it. And I hadn't touched a drop :-)

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Re: [fpc-pascal] FPC Graphics options?
On 05/19/2017 06:13 PM, Jon Foster wrote:
> On 05/19/2017 04:11 AM, Nikolay Nikolov wrote:
>> On 05/19/2017 03:54 AM, Ryan Joseph wrote:
>>>> On May 18, 2017, at 10:40 PM, Jon Foster wrote:
>>>>
>>>>  % time   cumulative s   self s   name
>>>>   62.44           1.33     1.33   fpc_frac_real
>>>>   26.76           1.90     0.57   MATH_$$_FLOOR$EXTENDED$$LONGINT
>>>>   10.33           2.12     0.22   FPC_DIV_INT64
>>>
>>> Thanks for profiling this. Floor is there as I expected and 26% is pretty
>>> extreme, but the others are floating point division? How does Java handle
>>> this so much better than FPC and what are the workarounds?
>>>
>>> Just curious. As it stands I can only reason that I need to avoid dividing
>>> floats in FPC like the plague.
>>
>> [...]
>>
>> The default options for the i386 compiler are to target the Pentium CPU,
>> which does not have SSE. This gives the most compatibility and the least
>> performance, but that's what's appropriate for most users, because for most
>> desktop applications, CPU speed is no longer an issue. Only very specific
>> tasks, such as software 3D rendering, need high CPU performance, and people
>> doing that stuff usually know their compiler options very well and how to
>> enable support for modern instruction extensions for maximum performance.
>> Of course, people coming from a Java background might not be used at all
>> to having to do this kind of stuff, but it's really not that hard.
>
> As stated, I tried *ALL* of the FPU settings and received the same result or
> an "access violation", which I assumed meant my FPU did not support that
> feature set.

An access violation usually means accessing memory that is way out of bounds. You can try turning range and overflow checking on, but there's no guarantee it is going to catch it. However, you should try to narrow it down to find the offending location. It could be a bug in your code, or a bug in the code generator (which produces an invalid result from a given calculation).

> I even tried to enable emulation, to see what the difference would be, but
> ppc386 said it was an invalid switch even though it lists it in the help
> output.

Emulation is only supported under go32v2 (the 32-bit DOS target) and is only needed on 486SX and 386 CPUs without an FPU, so it's very unlikely you would need it. 486DX and above all have a built-in FPU and need no emulation. And newer instruction set extensions such as SSE2 and SSE3 are never emulated, because emulation usually defeats the purpose of your code being faster. However, it is very likely that your CPU has SSE2 and SSE3 support, unless it is very ancient. Btw, what CPU do you have?

Nikolay
Re: [fpc-pascal] Ignoring function results
On 5/20/17, Mark Morgan Lloyd wrote:
> According to the Programmer's Guide 1.3.41, {$EXTENDEDSYNTAX OFF} has
> the effect of permitting the result of a function to be ignored.

Isn't that just the other way around?

"Extended syntax allows you to drop the result of a function. This means that you can use a function call as if it were a procedure. By default this feature is on. You can switch it off using the {$X-} or {$EXTENDEDSYNTAX OFF} directive."

Bart
Re: [fpc-pascal] FPC Graphics options?
On 05/19/2017 04:11 AM, Nikolay Nikolov wrote:
> On 05/19/2017 03:54 AM, Ryan Joseph wrote:
>>> On May 18, 2017, at 10:40 PM, Jon Foster wrote:
>>>
>>>  % time   cumulative s   self s   name
>>>   62.44           1.33     1.33   fpc_frac_real
>>>   26.76           1.90     0.57   MATH_$$_FLOOR$EXTENDED$$LONGINT
>>>   10.33           2.12     0.22   FPC_DIV_INT64
>>
>> Thanks for profiling this. Floor is there as I expected and 26% is pretty
>> extreme, but the others are floating point division? How does Java handle
>> this so much better than FPC and what are the workarounds?
>>
>> Just curious. As it stands I can only reason that I need to avoid dividing
>> floats in FPC like the plague.
>
> [...]
>
> The default options for the i386 compiler are to target the Pentium CPU,
> which does not have SSE. This gives the most compatibility and the least
> performance, but that's what's appropriate for most users, because for most
> desktop applications, CPU speed is no longer an issue. Only very specific
> tasks, such as software 3D rendering, need high CPU performance, and people
> doing that stuff usually know their compiler options very well and how to
> enable support for modern instruction extensions for maximum performance.
> Of course, people coming from a Java background might not be used at all
> to having to do this kind of stuff, but it's really not that hard.

As stated, I tried *ALL* of the FPU settings and received the same result or an "access violation", which I assumed meant my FPU did not support that feature set. I even tried to enable emulation, to see what the difference would be, but ppc386 said it was an invalid switch even though it lists it in the help output.

--
Sent from my Debian Linux laptop -- http://www.debian.org/intro/about
Jon Foster
JF Possibilities, Inc.
j...@jfpossibilities.com
541-410-2760
Making computers work for you!
Re: [fpc-pascal] Best way to check SimpleIPC for messages
On 17.05.2017 07:08, nore...@z505.com wrote:
> what happens when the application is not idle, but sort of idle?

A new Queue event is also only serviced when no other previous events are present, hence when the application gets "idle".

I don't know exactly when "OnIdle" is called. It can't be in a closed loop, otherwise any application would always use 100% CPU. Hence "OnIdle" is bound to work with an even greater latency than a decent queue entry like TThread.Queue or Application.QueueAsyncCall.

-Michael