Florian Klaempfl wrote on Fri, 16 Jul 2010:

One of the bottlenecks the common user encounters is unit loading:
especially projects like Lazarus suffer from the time spent in
unit loading, while I suspect that it also comes down to procedures
like fillchar, which consume a lot of time.

The main slowdown when recompiling projects is that FPC often recompiles or re-resolves the same unit multiple times when a unit in its uses clause has changed. The ppu loading itself is quite fast. Recompiling Lazarus without changing any unit takes just 2.2 seconds on my machine (without assembling/linking). Compiling a program that uses all units from the packages dir (910 units) takes 4.4 seconds (without assembling/linking).

The following result is from compiling a program that uses 348 (precompiled) units from the packages tree on darwin/x86-64 and lists all functions taking up 1% or more of the total execution time (sample-based). I didn't use all units here because then my laptop does not keep all ppu files in the disk cache during the profiling and that obviously skews the results.

7.6%    mach_kernel     vm_map_enter
4.0%    ppcx48  FPC_MOVE
3.9%    ppcx48  CCLASSES_FPHASH$SHORTSTRING$$LONGWORD
3.6%    mach_kernel     blkclr
3.1%    mach_kernel     vm_map_lookup_entry
2.4%    ppcx48  SYSTEM_SYSGETMEM_FIXED$QWORD$$POINTER
1.9%    ppcx48  SYSTEM_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD
1.8%    mach_kernel     ml_set_interrupts_enabled
1.7%    ppcx48  SYSTEM_ALLOC_OSCHUNK$PFREELISTS$QWORD$QWORD$$POINTER
1.7%    mach_kernel     lo_alltraps
1.6%    ppcx48  FPC_ANSISTR_DECR_REF
1.5%    libSystem.B.dylib       __bzero
1.4%    ppcx48  SYSTEM_SYSFREEMEM$POINTER$$QWORD
1.4%    ppcx48  fpc_pushexceptaddr
1.2%    ppcx48  SYSTEM_REMOVE_FREED_FIXED_CHUNKS$POSCHUNK
1.1%    ppcx48  CCLASSES_TDYNAMICARRAY_$__READ$formal$LONGWORD$$LONGWORD
1.1%    ppcx48  SYMTYPE_TDEREF_$__RESOLVE$$TOBJECT
1.1%    mach_kernel     pmap_enter
1.1%    ppcx48  fpc_popaddrstack
1.1%    ppcx48  SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
1.1%    mach_kernel     pmap_remove_range
1.0%    mach_kernel     cache_lookup_path
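
As a rough illustration of how to read the listing above, the entries can be summed per module to see how much of the listed time is spent in the kernel versus the compiler itself. This is only a sketch: the numbers are copied from the listing, the helper name share_by_module is made up for the example, and entries below the 1% cutoff are not included, so the totals do not add up to 100%.

```python
from collections import defaultdict

# (percent, module) pairs copied from the ppcx48 listing above.
# Only entries at or above the 1% cutoff appear there.
SAMPLES = [
    (7.6, "mach_kernel"), (4.0, "ppcx48"), (3.9, "ppcx48"),
    (3.6, "mach_kernel"), (3.1, "mach_kernel"), (2.4, "ppcx48"),
    (1.9, "ppcx48"), (1.8, "mach_kernel"), (1.7, "ppcx48"),
    (1.7, "mach_kernel"), (1.6, "ppcx48"), (1.5, "libSystem.B.dylib"),
    (1.4, "ppcx48"), (1.4, "ppcx48"), (1.2, "ppcx48"),
    (1.1, "ppcx48"), (1.1, "ppcx48"), (1.1, "mach_kernel"),
    (1.1, "ppcx48"), (1.1, "ppcx48"), (1.1, "mach_kernel"),
    (1.0, "mach_kernel"),
]


def share_by_module(samples):
    """Sum the listed percentages per module, rounded to one decimal."""
    totals = defaultdict(float)
    for percent, module in samples:
        totals[module] += percent
    return {module: round(total, 1) for module, total in totals.items()}


if __name__ == "__main__":
    shares = share_by_module(SAMPLES)
    for module, total in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{total:5.1f}%  {module}")
```

The mach_kernel entries in the listing sum to 21.0%, most of it related to mapping and zeroing memory.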

vm_map_enter is from mmap. This can be improved by increasing the block size used to initialise pools for small blocks from 32 KB to 256 KB (to support this on 32-bit systems, fixedoffsetshift in rtl/inc/heap.inc has to be changed from 16 to 12, which is no problem since only the 4 lowest bits are currently used for flags).
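
The arithmetic behind the fixedoffsetshift change can be sketched as follows. This is a simplified model and not the code from rtl/inc/heap.inc: it assumes a 32-bit header word in which the bits from fixedoffsetshift upwards store a chunk's byte offset inside its pool, and the names HEADER_BITS, FLAG_BITS, max_encodable_offset and pool_fits are made up for the example.

```python
KB = 1024
HEADER_BITS = 32  # assumption: width of the chunk header word on a 32-bit target
FLAG_BITS = 4     # from the text: only the 4 lowest bits are used for flags


def max_encodable_offset(offset_shift, header_bits=HEADER_BITS):
    """Largest byte offset that fits in the bits at and above offset_shift."""
    return (1 << (header_bits - offset_shift)) - 1


def pool_fits(pool_size, offset_shift):
    """True if every offset inside the pool (0 .. pool_size - 1) is encodable
    and the shift still leaves room for the flag bits below it."""
    return (pool_size - 1 <= max_encodable_offset(offset_shift)
            and offset_shift >= FLAG_BITS)


if __name__ == "__main__":
    for shift in (16, 12):
        for pool in (32 * KB, 256 * KB):
            print(f"shift={shift:2d} pool={pool // KB:3d} KB fits: "
                  f"{pool_fits(pool, shift)}")
```

Under these assumptions a shift of 16 leaves 16 bits for the offset (up to 64 KB), which covers a 32 KB pool but not a 256 KB one, while a shift of 12 leaves 20 bits (up to 1 MB).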

With this change (ppcx49), the profile becomes:

5.1%    ppcx49  FPC_MOVE // source: 1.3% fpc_shortstr_to_shortstr, 1.1% ppufile.readdata, 0.5% fpc_ansistr_copy
3.7%    mach_kernel     blkclr // kernel zeroing pages when we mmap memory and it has no reserve zeroed pages
3.6%    ppcx49  SYSTEM_SYSGETMEM_FIXED$QWORD$$POINTER
3.5%    ppcx49  CCLASSES_FPHASH$SHORTSTRING$$LONGWORD
2.2%    ppcx49  SYSTEM_SYSFREEMEM_FIXED$PFREELISTS$PMEMCHUNK_FIXED$$QWORD
2.1%    libSystem.B.dylib       __bzero  // fillchar(0)
2.0%    ppcx49  SYSTEM_REMOVE_FREED_FIXED_CHUNKS$POSCHUNK
1.9%    ppcx49  SYSTEM_ALLOC_OSCHUNK$PFREELISTS$QWORD$QWORD$$POINTER
1.8%    ppcx49  FPC_ANSISTR_DECR_REF
1.8%    ppcx49  SYSTEM_SYSFREEMEM$POINTER$$QWORD
1.7%    mach_kernel     lo_alltraps
1.6%    mach_kernel     ml_set_interrupts_enabled
1.4%    ppcx49  SYMTYPE_TDEREF_$__RESOLVE$$TOBJECT
1.4%    mach_kernel     pmap_enter // page fault
1.4%    ppcx49  fpc_pushexceptaddr
1.4%    ppcx49  SYSUTILS_COMPARETEXT$ANSISTRING$ANSISTRING$$LONGINT
1.2%    mach_kernel     pmap_remove_range // munmap
1.1%    ppcx49  PPU_TPPUFILE_$__GETBYTE$$BYTE
1.1%    ppcx49  CCLASSES_TFPHASHLIST_$__INTERNALFIND$LONGWORD$SHORTSTRING$LONGINT$$LONGINT
1.1%    mach_kernel     vm_page_lookup // page fault
1.1%    ppcx49  SYSTEM_SETJMP$JMP_BUF$$LONGINT
1.0%    ppcx49  FPC_MOVE
1.0%    mach_kernel     vm_map_enter // mmap
1.0%    ppcx49  SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
1.0%    ppcx49  SYSTEM_SYSGETMEM_VAR$QWORD$$POINTER
1.0%    ppcx49  fpc_varset_add_sets
1.0%    ppcx49  FPC_SHORTSTR_COMPARE_EQUAL
1.0%    ppcx49  fpc_ansistr_setlength

In real time (without assembling/linking):

        Before       After
user    0m1.621s     0m1.636s
sys     0m0.791s     0m0.492s
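
For reference, the relative changes implied by these timings can be worked out with a few lines. This is only a sketch: the values are the ones from the table above, and parse_time and percent_change are helper names made up for the example.

```python
def parse_time(text):
    """Convert a `time`-style value such as '0m1.621s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def percent_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# Timings copied from the table above.
BEFORE = {"user": parse_time("0m1.621s"), "sys": parse_time("0m0.791s")}
AFTER = {"user": parse_time("0m1.636s"), "sys": parse_time("0m0.492s")}

if __name__ == "__main__":
    for key in ("user", "sys"):
        print(f"{key:5s} {percent_change(BEFORE[key], AFTER[key]):+.1f}%")
    total_before = sum(BEFORE.values())
    total_after = sum(AFTER.values())
    print(f"total {percent_change(total_before, total_after):+.1f}%")
```

User time is essentially unchanged, while system time drops by about 38%, which works out to roughly 12% less user+sys time overall.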

Total memory usage barely changes (from 297MB to 299MB). I guess it's no problem to commit this, but in most cases it probably won't change much, if anything, performance-wise unless you do almost nothing but allocate tons of small memory blocks without ever freeing any in between.


Jonas
