Re: [fpc-devel] Need heap manager -gv explanation
28.04.2014 23:14, Petr Kristan wrote:
> On Mon, Apr 28, 2014 at 06:12:18PM +0200, Tomas Hajny wrote:
>> On Mon, April 28, 2014 17:56, Mattias Gaertner wrote:
>>> On Mon, 28 Apr 2014 17:20:17 +0200
>>> Petr Kristan petr.kris...@epos.cz wrote:
>>>> Hi,
>>>> I have an application that makes heavy use of ReAllocMem, and I found a big performance difference: compiled with the -gv option, it is much faster (approx. 20x) than without -gv.
>>> -gv generates code for valgrind. It should be slower with -gv.
>>>> I suspect the FPC heap manager. Is it possible to tune the FPC heap manager? Is there some difference in the heap manager when the application is compiled with or without the -gv option?
>> Use of valgrind requires/triggers use of cmem. Depending on the particular use case (and potentially also the target platform), cmem may indeed be faster.
> Platform is x86_64 Linux.
>> Others would be better positioned for a more detailed comparison among various heap managers with regard to speed in different use cases, overall memory requirements achieved by reuse of previously allocated memory, etc.
> Reuse of previously allocated memory may really be my problem. There are about 200 calls to ReAllocMem, growing a buffer from 4 kB to 80 MB. It looks as if ReAllocMem gets slower as the buffer grows, but I still have to verify this impression.

The -gv command-line switch disables the optimized i386 Move procedure (and that's basically the only thing it does), so it should indeed cause a slowdown. Comments say that valgrind (some pretty old version of it) is unable to handle the optimized Move code. In the meantime, valgrind was presumably fixed. At least since I got involved with FPC back in 2005, I have been able to use valgrind to profile programs without any trouble, and without recompiling them with -gv. So maybe it's time to reconsider the action of the -gv switch, or to remove it altogether.

Regards,
Sergei
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Need heap manager -gv explanation
On Tue, April 29, 2014 08:45, Sergei Gorelkin wrote:
> . .
> The -gv command-line switch disables the optimized i386 Move procedure (and that's basically the only thing it does),

I presume that you are talking about the RTL side (i.e. compiling the RTL with -gv). In the compiler (regardless of whether you use an RTL compiled with -gv or not), it triggers adding CMem (plus some other changes related to debug information in stabs).

> so it should indeed cause a slowdown.

If I understand it correctly, the original poster mentioned that use of valgrind _improves_ speed (considerably) in his case. I still believe that this may be due to the use of CMem instead of the standard heap manager.

> Comments say that valgrind (some pretty old version of it) is unable to handle the optimized Move code. In the meantime, valgrind was presumably fixed. At least since I got involved with FPC back in 2005, I have been able to use valgrind to profile programs without any trouble, and without recompiling them with -gv. So maybe it's time to reconsider the action of the -gv switch, or to remove it altogether.

I can't comment on that; I believe Jonas used Valgrind quite a lot in the past, so he might be able to comment on the requirements for its use on the various platforms supported by FPC.

Tomas
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On Mon, Apr 28, 2014 at 09:29:50PM +0200, Mattias Gaertner wrote:
> On Mon, 28 Apr 2014 21:14:14 +0200
> Petr Kristan petr.kris...@epos.cz wrote:
> [...]
>>> Others would be better positioned for a more detailed comparison among various heap managers with regard to speed in different use cases, overall memory requirements achieved by reuse of previously allocated memory, etc.
>> Reuse of previously allocated memory may really be my problem. There are about 200 calls to ReAllocMem, growing a buffer from 4 kB to 80 MB.
> Check whether you are increasing the buffers in constant steps. If so, grow them exponentially instead.

I use intelligent block increasing. I can optimize the program, but why is the FPC heap manager so slow?

Here is a sample stress program that compiles with FPC, Delphi and Kylix:

    program m;
    {$IFDEF MSWINDOWS}
    {$APPTYPE CONSOLE}
    {$ENDIF}
    uses
      {$IFDEF MSWINDOWS}
      Windows,
      {$ELSE}
      {$IFDEF FPC}
      Unix,
      {$ELSE}
      Libc,
      {$ENDIF}
      {$ENDIF}
      SysUtils;

    {$IFDEF MSWINDOWS}
    function GetTickCount: Cardinal;
    begin
      GetTickCount := Windows.GetTickCount;
    end;
    {$ELSE}
    {$IFDEF FPC}
    function GetTickCount: Cardinal;
    var
      tp: TTimeVal;
    begin
      fpgettimeofday(@tp, nil);
      GetTickCount := (Int64(tp.tv_sec) * 1000) + (tp.tv_usec div 1000);
    end;
    {$ELSE}
    function GetTickCount: Cardinal;
    var
      ts: TTimeSpec;
      i: Int64;
    const
      CLOCK_MONOTONIC = 1;
    begin
      if clock_gettime(CLOCK_MONOTONIC, ts) <> 0 then
      begin
        Result := 0;
        Exit;
      end;
      i := ts.tv_sec;
      i := i*1000 + ts.tv_nsec div 1000000;  // nanoseconds -> milliseconds
      Result := i and $FFFFFFFF;             // wrap to 32 bits like GetTickCount
    end;
    {$ENDIF}
    {$ENDIF}

    var
      p1, p2: Pointer;  // global variables, so they start out as nil
      i, j: integer;
      ms, sum: Cardinal;
    const
      base = 1000000;
    begin
      sum := GetTickCount;
      for i := 0 to 10 do
      begin
        ms := GetTickCount;
        for j := 1 to 9 do
        begin
          // grow both buffers in turn by base bytes
          ReAllocMem(p1, base*(i*10+j));
          ReAllocMem(p2, base*(i*10+j));
        end;
        Writeln(Format('Grow %d-%d %dms', [base*i*10, base*(i*10+9), GetTickCount-ms]));
      end;
      FreeMem(p1);
      FreeMem(p2);
      Writeln(Format('Sum %dms', [GetTickCount-sum]));
    end.
And here are the results (time per grow step in ms; row labels are the "Grow" ranges printed by the program):

    Grow (bytes)            (1)    (2)    (3)    (4)    (5)    (6)    (7)
    0-9000000                89      0     86      0      0     47     16
    10000000-19000000       281      0    247      0      0    157     31
    20000000-29000000       488      1    417      0      0    359     47
    30000000-39000000       716      0    595      0      0    531     47
    40000000-49000000       898      0    781      0      0    656    109
    50000000-59000000      1085      0    964      0      0    797     63
    60000000-69000000      1294      2   1128      1      0    985     62
    70000000-79000000      1470      3   1288      0      0   1109    250
    80000000-89000000      1652      2   1438      0      0   1250    266
    90000000-99000000      1916      0   1612      0      0   1406     94
    100000000-109000000    2099      0   1767      0      0   1532    110
    Sum                   12007     10  10341      1      2   8829   1125

    (1) ppcx64 m.pas: Free Pascal Compiler version 2.7.1 [2014/02/17] for x86_64, Target OS: Linux for x86-64
    (2) ppcx64 -gv m.pas: same compiler as (1)
    (3) ppc386 m.pas: Free Pascal Compiler version 2.7.1 [2013/06/28] for i386, Target OS: Linux for i386
    (4) ppc386 -gv m.pas: same compiler as (3)
    (5) dcc m.pas: Borland Delphi for Linux Version 14.5
    (6) fpc m.pas: Free Pascal Compiler version 2.7.1 [2013/12/27] for i386, Target OS: Win32 for i386
    (7) dcc m.pas: Borland Delphi Version 15.0

(The Windows system is a virtual machine.)

Is it possible to speed up the heap manager?
Re: [fpc-devel] Need heap manager -gv explanation
29.04.2014 13:00, Jonas Maebe wrote:
> On 29/04/14 08:45, Sergei Gorelkin wrote:
>> The -gv command-line switch disables the optimized i386 Move procedure (and that's basically the only thing it does), so it should indeed cause a slowdown. Comments say that valgrind (some pretty old version of it) is unable to handle the optimized Move code. In the meantime, valgrind was presumably fixed. At least since I got involved with FPC back in 2005, I have been able to use valgrind to profile programs without any trouble, and without recompiling them with -gv. So maybe it's time to reconsider the action of the -gv switch, or to remove it altogether.
> As Tomas mentioned, -gv also causes the use of the C memory manager. This is required because, while Valgrind also recognises mmap (as used by our memory manager), the result is not fine-grained enough to be of any use: Valgrind can't magically determine that our heap manager divides the mmap'ed blocks into smaller allocations.
>
> Regarding the SSE support in Valgrind: its assembler and disassembler have supported those instructions for a long time, but memcheck didn't. In particular, it didn't support memory initialisations using the movaps instruction. And at least in 2010, it still didn't: https://code.google.com/p/nativeclient/issues/detail?id=2251
>
> Maybe it's finally fixed in the latest Valgrind release (from October last year), as the changelog lists various SSE fixes for optimized string copy routines (search for SSE in https://lwn.net/Articles/572790/ ).

Thanks for the explanation. I was mostly interested in performance data, using the callgrind and cachegrind tools, not memcheck. Also, I did not compile with SSE floating-point options. That explains why I didn't encounter the mentioned issues with cmem/SSE support.

Regards,
Sergei
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On Tue, April 29, 2014 10:30, Petr Kristan wrote:
> . .
> I use intelligent block increasing. I can optimize the program, but why is the FPC heap manager so slow? Here is a sample stress program that compiles with FPC, Delphi and Kylix:
> . .
> And here are the results:
> . .
> Is it possible to speed up the heap manager?

Well, results of your test program on my machine (physical machine, MS Win 7 32-bit) show something different (times in ms):

    Grow (bytes)          without cmem   with -gv
    0-9000000                       31         31
    10000000-19000000              109        125
    20000000-29000000              172        203
    30000000-39000000              250        265
    40000000-49000000              327        375
    50000000-59000000              406        436
    60000000-69000000              483        531
    70000000-79000000              546        624
    80000000-89000000              625        733
    90000000-99000000              702        796
    100000000-109000000            764        873
    Sum                           4431       5008

The first column was compiled without cmem / without valgrind, the second with -gv. As expected, adding CMem to the uses clause and compiling without -gv gives basically the same result as -gv.

Tests were performed with a trunk compiler based on SVN from about 10 days ago. Results for 2.6.4 are more or less the same.

I don't know the reason for your difference, but no time necessary at all (0 ms) for the valgrind variant looks very suspicious to me.

Tomas
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On Tue, Apr 29, 2014 at 11:49:20AM +0200, Tomas Hajny wrote:
> On Tue, April 29, 2014 10:30, Petr Kristan wrote:
> . .
>> I use intelligent block increasing. I can optimize the program, but why is the FPC heap manager so slow? Here is a sample stress program that compiles with FPC, Delphi and Kylix:
> . .
>> And here are the results:
> . .
>> Is it possible to speed up the heap manager?
>
> Well, results of your test program on my machine (physical machine, MS Win 7 32-bit) show something different (times in ms):
>
>     Grow (bytes)          without cmem   with -gv
>     0-9000000                       31         31
>     10000000-19000000              109        125
>     20000000-29000000              172        203
>     30000000-39000000              250        265
>     40000000-49000000              327        375
>     50000000-59000000              406        436
>     60000000-69000000              483        531
>     70000000-79000000              546        624
>     80000000-89000000              625        733
>     90000000-99000000              702        796
>     100000000-109000000            764        873
>     Sum                           4431       5008
>
> The first column was compiled without cmem / without valgrind, the second with -gv. As expected, adding CMem to the uses clause and compiling without -gv gives basically the same result as -gv.

Are you sure that -gv has any effect on Windows? I think valgrind can be used only on Unix systems. Isn't the -gv option silently ignored on Windows?

> Tests were performed with a trunk compiler based on SVN from about 10 days ago. Results for 2.6.4 are more or less the same.

Confirmed, I tested with 2.6 too.

> I don't know the reason for your difference, but no time necessary at all (0 ms) for the valgrind variant looks very suspicious to me.

But when I compile with the Kylix compiler, I get the same results as with FPC and the -gv option on Linux. This is why I started hunting for the reason my program is so slow when compiled with FPC compared to Kylix.

Petr
-- 
Petr Kristan
. EPOS PRO s.r.o., Smilova 333, 530 02 Pardubice
tel: +420 461101401    Czech Republic (Eastern Europe)
fax: +420 461101481
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On Tue, Apr 29, 2014 at 11:41:59AM +0200, Mattias Gaertner wrote:
> On Tue, 29 Apr 2014 10:30:43 +0200
> Petr Kristan petr.kris...@epos.cz wrote:
> [...]
>>> Check whether you are increasing the buffers in constant steps. If so, grow them exponentially instead.
>> I use intelligent block increasing. I can optimize the program, but why is the FPC heap manager so slow?
>> [...]
>>     const
>>       base = 1000000;
>>     begin
>>       sum := GetTickCount;
>>       for i := 0 to 10 do
>>       begin
>>         ms := GetTickCount;
>>         for j := 1 to 9 do
>>         begin
>>           ReAllocMem(p1, base*(i*10+j));
>>           ReAllocMem(p2, base*(i*10+j));
>>         end;
>>         Writeln(Format('Grow %d-%d %dms', [base*i*10, base*(i*10+9), GetTickCount-ms]));
>>       end;
> ReAllocMem checks whether there is enough free memory behind the block. If not, it allocates new memory and copies the content. The FPC heap manager allocates new memory right behind the already allocated memory. With two ReAllocMem calls running in turn, there is almost never enough free memory behind a block, so they often have to copy. cmem leaves more space behind the blocks, so calling ReAllocMem with small increases needs fewer copies. AFAIK the cmem algorithm depends on the OS.

Is it possible to tune this space behind the blocks?

Petr
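Mattias's explanation can be checked empirically with a small diagnostic sketch (not from the thread; `CountMovesFor` is a hypothetical name). It grows one or more buffers in lock-step and counts how often ReAllocMem returns a different address. A different address means the block could not be extended in place and its contents had to be moved. The counts depend on the heap manager, platform and FPC version, so no expected figures are given here. Adding `cmem` as the first unit of the uses clause gives a comparison against the C library allocator.

```pascal
program CountMoves;

{$mode objfpc}{$H+}

// Hypothetical diagnostic: grow Count buffers in lock-step by Step bytes,
// Steps times, and return how many ReAllocMem calls moved a block.
function CountMovesFor(Count, Steps: Integer; Step: PtrUInt): Integer;
var
  Bufs: array of Pointer;
  Old: Pointer;
  i, k: Integer;
begin
  Result := 0;
  SetLength(Bufs, Count);
  for k := 0 to Count - 1 do
    Bufs[k] := nil;                  // ReAllocMem(nil, n) acts like GetMem
  for i := 1 to Steps do
    for k := 0 to Count - 1 do
    begin
      Old := Bufs[k];
      ReAllocMem(Bufs[k], PtrUInt(i) * Step);
      // A new address means the block was relocated (and its data copied).
      if (Old <> nil) and (Bufs[k] <> Old) then
        Inc(Result);
    end;
  for k := 0 to Count - 1 do
    FreeMem(Bufs[k]);
end;

begin
  Writeln('one buffer:  ', CountMovesFor(1, 1000, 100), ' moves');
  Writeln('two buffers: ', CountMovesFor(2, 1000, 100), ' moves');
end.
```

Only the first allocation of each buffer is excluded from the count, so the result is at most Count * (Steps - 1).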
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On Tue, April 29, 2014 12:12, Petr Kristan wrote:
> On Tue, Apr 29, 2014 at 11:49:20AM +0200, Tomas Hajny wrote:
>> On Tue, April 29, 2014 10:30, Petr Kristan wrote:
>> . .
>>> I use intelligent block increasing. I can optimize the program, but why is the FPC heap manager so slow?
>> . .
>> Well, results of your test program on my machine (physical machine, MS Win 7 32-bit) show something different:
>> . .
>> Sum 4431ms (compiled without cmem / without valgrind)
>> . .
>> Sum 5008ms (compiled with -gv; as expected, adding CMem to the uses clause and compiling without -gv gives basically the same result).
> Are you sure that -gv has any effect on Windows? I think valgrind can be used only on Unix systems. Isn't the -gv option silently ignored on Windows?

Use of CMem instead of the internal FPC heap manager is triggered by -gv regardless of the target platform. In addition, the results for the source compiled with CMem added explicitly to the uses clause (without using -gv) are also the same.

>> Tests were performed with a trunk compiler based on SVN from about 10 days ago. Results for 2.6.4 are more or less the same.
> Confirmed, I tested with 2.6 too.
>> I don't know the reason for your difference, but no time necessary at all (0 ms) for the valgrind variant looks very suspicious to me.
> But when I compile with the Kylix compiler, I get the same results as with FPC and the -gv option on Linux. This is why I started hunting for the reason my program is so slow when compiled with FPC compared to Kylix.

Couldn't it be somehow related to the method used for measuring the time under Linux? Is the time reported by the program itself consistent with the overall time the program needs to run?

Tomas
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On 29.04.2014 14:37, Tomas Hajny wrote:
>>> I don't know the reason for your difference, but no time necessary at all (0 ms) for the valgrind variant looks very suspicious to me.
>> But when I compile with the Kylix compiler, I get the same results as with FPC and the -gv option on Linux. This is why I started hunting for the reason my program is so slow when compiled with FPC compared to Kylix.
> Couldn't it be somehow related to the method used for measuring the time under Linux? Is the time reported by the program itself consistent with the overall time the program needs to run?

Time measurement appears to be correct. Strace shows that reallocation happens using mremap syscalls, which apparently rearrange pages within the address space without actually moving the data. This can indeed be done with almost zero overhead, but is hardly portable.

Regards,
Sergei
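Sergei's observation can be reproduced in isolation with a Linux-only sketch. It grows an anonymous mapping with mremap(2) and checks that the data is still there afterwards. The sketch is not from the thread. The libc bindings and constant values are hand-written assumptions for Linux on x86/x86_64 (other architectures use different flag values), and `GrowKeepsData` is a hypothetical name. glibc's realloc takes this path for large blocks that it obtained directly via mmap. That is why the C memory manager can grow such buffers almost for free.

```pascal
program MRemapDemo;

{$mode objfpc}{$H+}

uses
  ctypes;

// Assumption: Linux values for x86/x86_64; other architectures may differ.
const
  PROT_READ      = 1;
  PROT_WRITE     = 2;
  MAP_PRIVATE    = 2;
  MAP_ANONYMOUS  = $20;
  MREMAP_MAYMOVE = 1;

// Hand-written libc bindings (hypothetical, not part of the FPC RTL).
function mmap(addr: Pointer; len: csize_t; prot, flags, fd: cint;
  offset: clong): Pointer; cdecl; external 'c';
function mremap(old_address: Pointer; old_size, new_size: csize_t;
  flags: cint): Pointer; cdecl; varargs; external 'c';
function munmap(addr: Pointer; len: csize_t): cint; cdecl; external 'c';

// Map OldSize bytes, fill them, grow the mapping to twice the size and
// report whether the original contents survived without a user-level copy.
function GrowKeepsData: Boolean;
const
  OldSize = 64 * 1024 * 1024;
  NewSize = 2 * OldSize;
var
  p, q: PByte;
begin
  Result := False;
  p := mmap(nil, OldSize, PROT_READ or PROT_WRITE,
    MAP_PRIVATE or MAP_ANONYMOUS, -1, 0);
  if PtrInt(p) = -1 then          // MAP_FAILED
    Exit;
  FillChar(p^, OldSize, $AB);     // touch every page so it is really backed
  // The kernel may move the mapping, but it remaps page-table entries
  // instead of copying the 64 MB of data.
  q := mremap(p, OldSize, NewSize, MREMAP_MAYMOVE);
  if PtrInt(q) = -1 then
  begin
    munmap(p, OldSize);
    Exit;
  end;
  Result := (q[0] = $AB) and (q[OldSize - 1] = $AB);
  munmap(q, NewSize);
end;

begin
  Writeln('Data preserved after mremap: ', GrowKeepsData);
end.
```

Running such a program (or Petr's test compiled with -gv) under strace should show the mremap calls Sergei mentions.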
Re: [fpc-devel] Need heap manager -gv explanation [tests]
On Tue, Apr 29, 2014 at 03:02:44PM +0400, Sergei Gorelkin wrote:
> On 29.04.2014 14:37, Tomas Hajny wrote:
>>>> I don't know the reason for your difference, but no time necessary at all (0 ms) for the valgrind variant looks very suspicious to me.
>>> But when I compile with the Kylix compiler, I get the same results as with FPC and the -gv option on Linux. This is why I started hunting for the reason my program is so slow when compiled with FPC compared to Kylix.
>> Couldn't it be somehow related to the method used for measuring the time under Linux? Is the time reported by the program itself consistent with the overall time the program needs to run?
> Time measurement appears to be correct. Strace shows that reallocation happens using mremap syscalls, which apparently rearrange pages within the address space without actually moving the data. This can indeed be done with almost zero overhead, but is hardly portable.

Thanks for the fast and perfect explanation. The conclusion for me is that the heap manager cannot be improved in a cross-platform way. I will make some optimizations in my own code.

Petr
[fpc-devel] Need heap manager -gv explanation
Hi,

I have an application that makes heavy use of ReAllocMem, and I found a big performance difference: compiled with the -gv option, it is much faster (approx. 20x) than without -gv.

I suspect the FPC heap manager. Is it possible to tune the FPC heap manager? Is there some difference in the heap manager when the application is compiled with or without the -gv option?

Thanks
Petr
Re: [fpc-devel] Need heap manager -gv explanation
On Mon, 28 Apr 2014 17:20:17 +0200
Petr Kristan petr.kris...@epos.cz wrote:
> Hi,
> I have an application that makes heavy use of ReAllocMem, and I found a big performance difference: compiled with the -gv option, it is much faster (approx. 20x) than without -gv.

-gv generates code for valgrind. It should be slower with -gv.

> I suspect the FPC heap manager. Is it possible to tune the FPC heap manager? Is there some difference in the heap manager when the application is compiled with or without the -gv option?

Mattias
Re: [fpc-devel] Need heap manager -gv explanation
On Mon, April 28, 2014 17:56, Mattias Gaertner wrote:
> On Mon, 28 Apr 2014 17:20:17 +0200
> Petr Kristan petr.kris...@epos.cz wrote:
>> Hi,
>> I have an application that makes heavy use of ReAllocMem, and I found a big performance difference: compiled with the -gv option, it is much faster (approx. 20x) than without -gv.
> -gv generates code for valgrind. It should be slower with -gv.
>> I suspect the FPC heap manager. Is it possible to tune the FPC heap manager? Is there some difference in the heap manager when the application is compiled with or without the -gv option?

Use of valgrind requires/triggers use of cmem. Depending on the particular use case (and potentially also the target platform), cmem may indeed be faster.

Others would be better positioned for a more detailed comparison among various heap managers with regard to speed in different use cases, overall memory requirements achieved by reuse of previously allocated memory, etc.

Tomas
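The sketch below switches to the C memory manager explicitly, which is what Tomas describes -gv doing behind the scenes. The `cmem` unit ships with FPC. It is conventionally placed first in the program's uses clause, so that nothing is allocated through the default heap manager before it is installed. `IsMemoryManagerSet` from the System unit reports whether a non-default memory manager is active. The program name and sizes are arbitrary illustrations.

```pascal
program UseCMem;

{$mode objfpc}{$H+}

uses
  cmem,     // first unit: GetMem/ReAllocMem/FreeMem now go to libc malloc/realloc/free
  SysUtils;

var
  p: Pointer = nil;
begin
  // True when a memory manager other than the RTL's own is installed.
  Writeln('Custom memory manager active: ', BoolToStr(IsMemoryManagerSet, True));
  ReAllocMem(p, 4 * 1024);          // served by the C library's realloc
  ReAllocMem(p, 80 * 1024 * 1024);  // a large block, as in the original report
  FreeMem(p);
end.
```

As Tomas reports later in the thread, this behaves basically the same as compiling with -gv, without the other -gv side effects.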
Re: [fpc-devel] Need heap manager -gv explanation
On Mon, Apr 28, 2014 at 06:12:18PM +0200, Tomas Hajny wrote:
> On Mon, April 28, 2014 17:56, Mattias Gaertner wrote:
>> On Mon, 28 Apr 2014 17:20:17 +0200
>> Petr Kristan petr.kris...@epos.cz wrote:
>>> Hi,
>>> I have an application that makes heavy use of ReAllocMem, and I found a big performance difference: compiled with the -gv option, it is much faster (approx. 20x) than without -gv.
>> -gv generates code for valgrind. It should be slower with -gv.
>>> I suspect the FPC heap manager. Is it possible to tune the FPC heap manager? Is there some difference in the heap manager when the application is compiled with or without the -gv option?
> Use of valgrind requires/triggers use of cmem. Depending on the particular use case (and potentially also the target platform), cmem may indeed be faster.

Platform is x86_64 Linux.

> Others would be better positioned for a more detailed comparison among various heap managers with regard to speed in different use cases, overall memory requirements achieved by reuse of previously allocated memory, etc.

Reuse of previously allocated memory may really be my problem. There are about 200 calls to ReAllocMem, growing a buffer from 4 kB to 80 MB. It looks as if ReAllocMem gets slower as the buffer grows, but I still have to verify this impression.

Petr
Re: [fpc-devel] Need heap manager -gv explanation
On Mon, 28 Apr 2014 21:14:14 +0200
Petr Kristan petr.kris...@epos.cz wrote:
> [...]
>> Others would be better positioned for a more detailed comparison among various heap managers with regard to speed in different use cases, overall memory requirements achieved by reuse of previously allocated memory, etc.
> Reuse of previously allocated memory may really be my problem. There are about 200 calls to ReAllocMem, growing a buffer from 4 kB to 80 MB.

Check whether you are increasing the buffers in constant steps. If so, grow them exponentially instead.

> It looks as if ReAllocMem gets slower as the buffer grows, but I still have to verify this impression.

Mattias
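Mattias's suggestion can be sketched as follows. This is a hypothetical helper, not code from the thread, and `TGrowBuffer`, `NextCapacity` and `EnsureCapacity` are illustrative names. The buffer tracks its allocated capacity separately from the used size and doubles the capacity whenever it runs out. As a result, ReAllocMem (and any copy it implies) is called only a logarithmic number of times instead of once per step. The 4 KiB minimum and the growth factor of 2 are arbitrary choices.

```pascal
program GrowDemo;

{$mode objfpc}{$H+}

type
  // Hypothetical growable byte buffer: Capacity is what has been allocated,
  // Size is what the caller actually uses.
  TGrowBuffer = record
    Data: Pointer;
    Size: SizeInt;
    Capacity: SizeInt;
    Reallocs: Integer;  // counts ReAllocMem calls, for illustration only
  end;

// Return a capacity >= Needed, doubling from Current (minimum 4 KiB).
function NextCapacity(Current, Needed: SizeInt): SizeInt;
begin
  if Current < 4096 then
    Result := 4096
  else
    Result := Current;
  while Result < Needed do
    Result := Result * 2;
end;

// Make sure Buf can hold NewSize bytes; only reallocates when the
// current capacity is exhausted.
procedure EnsureCapacity(var Buf: TGrowBuffer; NewSize: SizeInt);
begin
  if NewSize > Buf.Capacity then
  begin
    Buf.Capacity := NextCapacity(Buf.Capacity, NewSize);
    ReAllocMem(Buf.Data, Buf.Capacity);
    Inc(Buf.Reallocs);
  end;
  Buf.Size := NewSize;
end;

var
  Buf: TGrowBuffer;
  i: Integer;
begin
  FillChar(Buf, SizeOf(Buf), 0);  // Data = nil, Size = Capacity = 0
  // Same access pattern as a fixed-step loop: +100 bytes per step.
  for i := 1 to 10000 do
    EnsureCapacity(Buf, i * 100);
  Writeln('Final size: ', Buf.Size, ', capacity: ', Buf.Capacity,
    ', ReAllocMem calls: ', Buf.Reallocs);
  FreeMem(Buf.Data);
end.
```

With the loop shown (10000 steps of 100 bytes), the program reports 9 ReAllocMem calls and a final capacity of 1048576 bytes. Growing in fixed steps would call ReAllocMem 10000 times. The trade-off is up to 50% unused capacity at any moment.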