Re: [fpc-devel] Messages overhead
Hans-Peter Diettrich wrote:
> Daniël Mantione wrote:
>>> IMO compiler messages slow down compilation a lot.
>> How do you know this, did you benchmark or is it just your opinion?
> Common knowledge, proved by experience.

Please profile the compiler first, people did this already.

> The message ID has to be re-encoded in a way that allows, e.g., the message level to be determined immediately. All this is currently done only after text substitution, by inspecting the returned string.

This cannot be done because people should have the chance to use custom error files to override message verbosities.
___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Compiler bottlenecks]
Jonas Maebe wrote:
> Unless you are doing a cold compile, the main bottlenecks in the compiler are the memory manager (mostly the allocation of memory, freeing is faster), zero-filling new class instances (and partially resetting the register allocator) and tobject.initinstance.

I wonder if zeroing memory blocks (so that when allocating them we already know they contain zeros) and preparing new register allocators in a helper thread could improve this.
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich wrote:
> I see the biggest benefit in many possible optimizations in the scanner and parser, which can be implemented *only if* an entire file resides in memory.

Include files, macro expansion and generics make such optimizations hard.
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich wrote:
> Marco van de Voort wrote:
>> I don't think we are ever going to give up-front carte blanche for a massive rewrite to go into trunk. That is simply not sane.
> ACK. I'm more concerned about work that is blacklisted for some reason.

Rewriting the compiler in C ;)

>> Discuss the change on fpc-devel first. If you want to improve performance, generate numbers for at least "make cycle" and building Lazarus. A submission will always be judged on performance and maintainability before being admitted. If this bothers you, try to find smart ways to phase the changes: limit yourself to a few things at a time, and don't try to speed-optimize I/O, change the parser, allow multiple frontends etc. all at the same time.
> Just this is hard to obey when I see so many details that could be improved. Will it do harm if I create more than one branch, e.g. one for general optimizations?

No, but try to finish one thing before starting the next one. As time goes on, patches might no longer apply automatically.

> Can other people contribute to such a branch as well?

Yes, just tell me which branch and what login (I'll try to be faster this time ;)).
Re: [fpc-devel] Compiler bottlenecks]
On Thu, 15 Jul 2010, Florian Klaempfl wrote:
> Jonas Maebe wrote:
>> Unless you are doing a cold compile, the main bottlenecks in the compiler are the memory manager (mostly the allocation of memory, freeing is faster), zero-filling new class instances (and partially resetting the register allocator) and tobject.initinstance.
> I wonder if zeroing memory blocks (so that when allocating them we already know they contain zeros) and preparing new register allocators in a helper thread could improve this.

But if a memory block is freed and put back on the heap, you need to zero it again, so where is the gain in that?

Michael.
Re: [fpc-devel] Compiler bottlenecks]
Michael Van Canneyt wrote:
> On Thu, 15 Jul 2010, Florian Klaempfl wrote:
> [snip]
> But if a memory block is freed and put back on the heap, you need to zero it again, so where is the gain in that?

While it's in the freelist, the helper task zeroes it, so when it is allocated again it does not need to be zeroed again.
Re: [fpc-devel] Compiler bottlenecks]
Florian Klaempfl wrote:
> Michael Van Canneyt wrote:
> [snip]
> While it's in the freelist, the helper task

... thread ...

> zeroes it, so when it is allocated again it does not need to be zeroed again.
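Florian's proposal can be sketched as follows. This is a minimal Python illustration under stated assumptions, not FPC's heap manager: `PreZeroedPool`, `zero_dirty_blocks` and the dirty/clean lists are hypothetical names, and blocks are modelled as `bytearray`s. Freed blocks go to a dirty list; a helper thread zeroes them and moves them to a clean list, so `allocate` can usually return a block without zeroing it on the hot path.

```python
import threading
from collections import deque


class PreZeroedPool:
    """Hypothetical pool of fixed-size blocks zeroed off the allocating thread."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._dirty = deque()        # freed blocks that still hold old data
        self._clean = deque()        # blocks already zeroed by the helper
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            if self._clean:
                return self._clean.popleft()      # fast path: no zeroing needed
            block = self._dirty.popleft() if self._dirty else None
        if block is None:
            return bytearray(self.block_size)     # fresh memory starts zeroed
        block[:] = bytes(self.block_size)         # helper fell behind: zero here
        return block

    def free(self, block):
        with self._lock:
            self._dirty.append(block)             # no zeroing on the freeing side

    def zero_dirty_blocks(self):
        """One pass of the helper thread: zero every block on the dirty list."""
        while True:
            with self._lock:
                if not self._dirty:
                    return
                block = self._dirty.popleft()
            block[:] = bytes(self.block_size)     # zero outside the lock
            with self._lock:
                self._clean.append(block)
```

In this model Marco's objection still holds: every freed block is still zeroed once; the work is only moved off the allocating thread. Cache placement on multi-core systems, raised by Hans-Peter later in the thread, is not modelled.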
Re: [fpc-devel] Compiler bottlenecks]
In our previous episode, Florian Klaempfl said:
>> zero it again, so where is the gain in that?
> While it's in the freelist, the helper task ... thread ... zeroes it, so when it is allocated again it does not need to be zeroed again.

But then every deallocation involves zeroing.
Re: [fpc-devel] Purpose of uses ... in?
On 07/14/2010 09:40 PM, Sven Barth wrote:
> (Can someone comment on FreeDOS regarding this?)

If its file system provides MS-DOS-style long filenames, its users might be sued by M$ (like TomTom).

-Michael
Re: [fpc-devel] Purpose of uses ... in?
On 07/14/2010 11:35 PM, Stefan Kisdaroczi wrote:
> Or make a symlink:

Happily, even Windows NTFS supports symlinks, even though hardly anybody uses this. ;)

-Michael
Re: [fpc-devel] Compiler bottlenecks]
Florian Klaempfl wrote on Thu, 15 Jul 2010:
> I wonder if zeroing memory blocks (so that when allocating them we already know they contain zeros) and preparing new register allocators in a helper thread could improve this.

Possibly, yes. At least most OSes also zero pages in a background thread for exactly the same reasons.

I've also tried adding simple pooling systems to reduce the allocation/freeing time (a bit like mark/release), but the problem is that in many cases class instances can be either local or global (e.g., tai* can be added to some global stubs section or to a procedure's code, and parse tree nodes can become part of saved trees for inlining).

We could also parallelise writing out the assembler code for the external assembler, and possibly also some other list processing. E.g., for N threads, start by simply walking over the list of instructions and storing a pointer to the first and then every (list.count/N)th element (rounded up or down to the start of a new source line in the case of the assembler writer). Then fire off the N threads to start processing the list at those points and let them store their output in temporary buffers. At the end, write the buffers out in the correct order.

There are currently some global dependencies (e.g. Darwin DWARF label numbers that are generated on the fly, and the current section type is tracked for optimisation reasons), but it shouldn't be very difficult to resolve them.

The same technique can probably also be used to parallelise at least parts of the internal assembler. Especially when using DWARF, which causes a lot of tai constants to be generated, this could make a significant difference. And since the lists keep track of the number of elements, we can easily define a threshold used to decide whether to parallelise and, if so, at most how many threads should be used.

Jonas
This message was sent using IMP, the Internet Messaging Program.
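The chunked assembler-writer scheme Jonas outlines might look roughly like this. It is a hedged Python sketch: `split_points`, `write_parallel`, `render`, `is_boundary` and the default threshold of 1000 are hypothetical, with `is_boundary` standing in for "starts a new source line". Under CPython's global interpreter lock the threads will not actually speed up pure-Python rendering; the point is the partitioning, the per-thread buffers and the in-order concatenation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_points(items, n_chunks, is_boundary):
    """Start index of each chunk: roughly every len(items)/n_chunks-th element,
    moved forward to the next element where a chunk may begin."""
    step = -(-len(items) // n_chunks)            # ceiling division
    starts = [0]
    for k in range(1, n_chunks):
        i = max(k * step, starts[-1] + 1)
        while i < len(items) and not is_boundary(items[i]):
            i += 1
        if i >= len(items):
            break
        starts.append(i)
    return starts


def write_parallel(items, render, n_threads=4, threshold=1000,
                   is_boundary=lambda item: True):
    """Render items to text, in parallel chunks when the list is long enough."""
    if n_threads < 2 or len(items) < threshold:
        return "".join(render(x) for x in items)  # small list: not worth it
    starts = split_points(items, n_threads, is_boundary)
    ranges = list(zip(starts, starts[1:] + [len(items)]))

    def render_range(bounds):
        lo, hi = bounds
        return "".join(render(x) for x in items[lo:hi])  # per-thread buffer

    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        buffers = list(pool.map(render_range, ranges))   # results keep input order
    return "".join(buffers)                               # write out in order
```

The threshold check reflects Jonas's remark that the element count is known up front, so short lists can simply be written sequentially.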
Re: [fpc-devel] Compiler bottlenecks]
Marco van de Voort wrote on Thu, 15 Jul 2010:
> But that then zeroes every deallocation.

You would only do this for class instances that are kept in a pool managed by overriding newinstance/freeinstance.

Jonas
Re: [fpc-devel] Re: Compiler bottlenecks
Hans-Peter Diettrich wrote on Thu, 15 Jul 2010:
> Jonas Maebe wrote:
>> [snip]
> When the file resides in the OS file cache, no page faults will occur unless the file was removed from the cache. If not, every access request has to do disk I/O, resulting in process switching etc., so that the page faults are negligible.

That is only true if you get at most as many page faults as you perform read system calls.

>> Then just read it into a buffer in one shot.
> That's just what I suggested, for a first test :-)

Just increasing the buffer size by itself does not have any noticeable effect on the compilation speed (I benchmarked it in the past).

>> a) the memory management overhead primarily comes from allocating and freeing machine instruction (and to a lesser extent node tree) instances
>> b) the string copy cost I mentioned primarily comes from getting symbol names for the purpose of generating rtti and assembler symbol names
> Maybe, we'll see...

Those facts come from profiling runs. That does not mean that nothing can be optimised in the scanner/parser, but the bottleneck routines I posted are unrelated.

Jonas
Re: [fpc-devel] Re: Compiler bottlenecks
On 07/14/2010 05:21 PM, Jonas Maebe wrote:
> a) the memory management overhead primarily comes from allocating and freeing machine instruction (and to a lesser extent node tree) instances

Did somebody take a look at FastMM for Delphi? ( http://sourceforge.net/projects/fastmm/ ) It seems to use a nice paradigm for memory management in threaded applications.

-Michael
Re: [fpc-devel] arm embedded cortexm3 procedure address
On 14 Jul 2010, at 17:35, Jonas Maebe wrote:
> Geoffrey Barton wrote on Wed, 14 Jul 2010:
>> the resulting constant disassembles as:-
>>  1bc:  01a5  .word 0x01a5
>> which seems to be one greater than the address of the procedure. Is this right?
> Yes.
>> If so, why?
> To identify the code as Thumb code.

I was aware of the requirement to add one to the address of an ISR, but not to other calls. A normal procedure call seems to assemble as a 'BL' to an even address? I must be missing something :-)

>> also, why does the compiler sometimes add a 'nop' to the end of a procedure (as above)?
> The default alignment of routines is 4 bytes. And it's most likely the assembler or linker that adds the nop, rather than the compiler.

Yes, right. Probably the linker. Presumably because it is slightly faster.

thanks,
Geoffrey
Re: [fpc-devel] Re: Compiler bottlenecks
Michael Schnell wrote on Thu, 15 Jul 2010:
> Did somebody take a look at FastMM for Delphi? ( http://sourceforge.net/projects/fastmm/ ) It seems to use a nice paradigm for memory management in threaded applications.

Then please explain that paradigm, since apparently you have already looked at it.

In return, I will explain the FPC heap manager's paradigm: there is a separate heap manager per thread, so that in most cases no synchronisation is required. Only if memory is allocated in one thread and freed in another is it added to a global locked structure. When a thread runs out of memory in its pools, it first checks this global (synchronised) structure before asking the OS for more memory.

Jonas
Re: [fpc-devel] arm embedded cortexm3 procedure address
Geoffrey Barton wrote on Thu, 15 Jul 2010:
> On 14 Jul 2010, at 17:35, Jonas Maebe wrote:
>> To identify the code as Thumb code.
> I was aware of the requirement to add one to the address of an ISR, but not to other calls.

The address of the symbol is set to an odd value so that the linker can identify the subsequent code as Thumb code.

> A normal procedure call seems to assemble as a 'BL' to an even address? I must be missing something :-)

BL always goes from ARM to ARM or from Thumb to Thumb code. Therefore, the hardware does not need the extra bit in the address, since the instruction encodes what kind of code is at the destination.

Jonas
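The rule can be stated as a small Python model (function names are hypothetical; this is not FPC or linker code). For an interworking branch through a register (BX/BLX Rm), bit 0 of the target selects Thumb state and is cleared before the jump, which is why a stored procedure address such as 0x01a5 refers to Thumb code at 0x01a4. On Cortex-M3, which only executes Thumb code, a register target with bit 0 clear causes a fault rather than a switch to ARM state.

```python
def decode_interworking_target(addr):
    """Split a BX/BLX register target into (destination address, thumb state?)."""
    return addr & ~1, bool(addr & 1)


def thumb_function_pointer(code_addr):
    """Value stored in a procedure variable for Thumb code starting at code_addr."""
    if code_addr & 1:
        raise ValueError("code address must be halfword aligned")
    return code_addr | 1
```

A direct BL needs none of this, since, as Jonas explains, the instruction itself implies the destination's state.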
Re: [fpc-devel] Re: Compiler bottlenecks
On 07/15/2010 11:14 AM, Jonas Maebe wrote:
> Then please explain that paradigm, since apparently you have already looked at it.

AFAIU, they use three areas: for small, midrange and large chunks.

Small chunks are allocated in several lists, each of which hosts equally sized chunks, so finding a chunk is a one-step access (no linked lists), and no merging of chunks is necessary when freeing. In a normal program some 98% of the chunks are small.

Midrange chunks are allocated in a single list of chunks of differing sizes.

Large chunks are allocated by a direct OS API call.

Only midrange chunks implement atomic management for thread safety. For large chunks, the OS does this anyway. If a conflicting access is detected with small chunks, the second thread simply uses a midrange chunk on that rare occasion.

-Michael
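A rough Python sketch of the three-tier split Michael describes. The size classes, the 16-byte granule and the 256 KiB medium/large threshold are hypothetical values chosen for illustration, not FastMM's actual tables. The point is that a small request maps to its size class with a single table lookup, and because every block in a class has the same size, freeing never needs to merge neighbours.

```python
SMALL_SIZES = [16, 32, 48, 64, 96, 128, 192, 256]   # hypothetical size classes
SMALL_LIMIT = SMALL_SIZES[-1]
GRANULE = 16
MEDIUM_LIMIT = 256 * 1024                            # hypothetical medium/large cut-off

# Precomputed table: (size - 1) // GRANULE -> index of the smallest class that fits.
_CLASS_OF = []
for granule in range(SMALL_LIMIT // GRANULE):
    need = (granule + 1) * GRANULE
    _CLASS_OF.append(next(i for i, s in enumerate(SMALL_SIZES) if s >= need))


def size_class(size):
    """Return ('small', class index), ('medium', None) or ('large', None)."""
    if size <= 0:
        raise ValueError("size must be positive")
    if size <= SMALL_LIMIT:
        return "small", _CLASS_OF[(size - 1) // GRANULE]   # O(1), no list walk
    if size <= MEDIUM_LIMIT:
        return "medium", None        # shared list of variable-sized chunks
    return "large", None             # handed straight to the OS
```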
Re: [fpc-devel] arm embedded cortexm3 procedure address
On 07/15/2010 11:12 AM, Geoffrey Barton wrote:
> A normal procedure call seems to assemble as a 'BL' to an even address?

I doubt that it's possible to call Thumb code from ARM code with a BL.

-Michael
Re: [fpc-devel] arm embedded cortexm3 procedure address
I think it is on Thumb2 devices. To my understanding, they did this on purpose so you no longer have to use interworking intermediate functions to switch between ARM and Thumb mode.

Michael Schnell wrote:
> On 07/15/2010 11:12 AM, Geoffrey Barton wrote:
>> A normal procedure call seems to assemble as a 'BL' to an even address?
> I doubt that it's possible to call Thumb code from ARM code with a BL.
Re: [fpc-devel] Re: Compiler bottlenecks
Quoting Jonas Maebe jonas.ma...@elis.ugent.be:
> Michael Schnell wrote on Thu, 15 Jul 2010:
> [snip]
> In return, I will explain the FPC heap manager's paradigm: there is a separate heap manager per thread, so that in most cases no synchronisation is required. Only if memory is allocated in one thread and freed in another is it added to a global locked structure. When a thread runs out of memory in its pools, it first checks this global (synchronised) structure before asking the OS for more memory.

Does that mean that if I let a worker thread create strings, pass them to the main thread, free the worker thread and unreference the strings in the main thread, the global structure will grow and grow?

Mattias
Re: [fpc-devel] Re: Compiler bottlenecks
Mattias Gärtner wrote:
> Quoting Jonas Maebe:
> [snip]
> Does that mean that if I let a worker thread create strings, pass them to the main thread, free the worker thread and unreference the strings in the main thread, the global structure will grow and grow?

No, because the worker thread looks into the global structure when it runs out of local space.
Re: [fpc-devel] arm embedded cortexm3 procedure address
On 07/15/2010 11:33 AM, Jeppe Johansen wrote:
> I think it is on Thumb2 devices. To my understanding, they did this on purpose so you no longer have to use interworking intermediate functions to switch between ARM and Thumb mode.

That seems to be not BL but BX: http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001l/QRC0001_UAL.pdf

-Michael
Re: [fpc-devel] Re: Compiler bottlenecks
On 07/15/2010 11:43 AM, Florian Klaempfl wrote:
> No, because the worker thread looks into the global structure when it runs out of local space.

This kind of garbage control might add some overhead at unforeseen times. I seem to like the FASTMM paradigm better.

-Michael
Re: [fpc-devel] arm embedded cortexm3 procedure address
... And of course, when a memory cell is used to store a procedure address, BX is used to call the procedure.

-Michael
Re: [fpc-devel] arm embedded cortexm3 procedure address
Sorry: BLX.

-Michael
Re: [fpc-devel] Re: Compiler bottlenecks
Michael Schnell wrote:
> On 07/15/2010 11:43 AM, Florian Klaempfl wrote:
>> No, because the worker thread looks into the global structure when it runs out of local space.
> This kind of garbage control might add some overhead at unforeseen times.

You don't turn off caches in SMP systems either just because they sometimes need synchronization: in 99.9% of the use cases, using caches is an advantage.

> I seem to like

This has nothing to do with liking, but simply with profiling your app and finding out which memory manager fits the application best.
Re: [fpc-devel] Re: Compiler bottlenecks
Michael Schnell wrote on Thu, 15 Jul 2010:
> On 07/15/2010 11:43 AM, Florian Klaempfl wrote:
>> No, because the worker thread looks into the global structure when it runs out of local space.
> This kind of garbage control might add some overhead at unforeseen times.

And you will get overhead in the FASTMM scheme if you have two threads that are concurrently allocating and/or freeing a lot of small or medium blocks.

> I seem to like the FASTMM paradigm better.

Both have their strong and weak points. Note that FPC's memory manager also keeps separate freelists for small blocks (but per thread).

Jonas
Re: [fpc-devel] Purpose of uses ... in?
Hi!

On 15.07.2010 09:49, Michael Schnell wrote:
> On 07/14/2010 09:40 PM, Sven Barth wrote:
>> (Can someone comment on FreeDOS regarding this?)
> If its file system provides MS-DOS-style long filenames, its users might be sued by M$ (like TomTom).

But why should they sue an open source MS-DOS clone when they're trying to get their customers to upgrade to Windows 7? I'd understand it more if they sued ReactOS in the near future... or Linux, because it is the root of all evil.

Regards,
Sven
Re: [fpc-devel] Purpose of uses ... in?
> But why should they sue an open source MS-DOS clone [snip]

Please take that discussion to the fpc-other list.

Jonas
FPC mailing lists admin
Re: [fpc-devel] Re: Compiler bottlenecks
Michael Schnell wrote:
> On 07/15/2010 12:05 PM, Jonas Maebe wrote:
>> And you will get overhead in the FASTMM scheme if you have two threads that are concurrently allocating and/or freeing a lot of small or medium blocks.
> Only in extremely rare cases (which of course will be handled but seem not to be relevant regarding the overall speed).

Did you do any profiling? We did...

> With small blocks there is no concurrency (as in case of conflict, the second thread will use medium).

And the third and fourth thread?
Re: [fpc-devel] Re: Compiler bottlenecks
In our previous episode, Michael Schnell said:
> With small blocks there is no concurrency (as in case of conflict, the second thread will use medium).

I'm no memory manager expert, but reading this raises some questions: How is this conflict detected? If this is some kind of lock (which needs to be SMP-safe, I guess), the FPC manager can probably skip that for most small allocations, and only has to do this if it really touches global structures?
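One common way to detect such a conflict is a non-blocking try-lock, which in native code is typically an atomic compare-and-swap (on x86, `lock cmpxchg`). This Python sketch models it with `threading.Lock.acquire(blocking=False)`; whether FastMM does exactly this is not established in this thread, and `SmallBin` and `allocate_small` are hypothetical names. It also illustrates Marco's point: even without contention, every small allocation pays for one atomic operation, which a per-thread heap avoids.

```python
import threading


class SmallBin:
    """Hypothetical bin of equally sized blocks guarded by a single lock."""

    def __init__(self, size, count=4):
        self.size = size
        self.free_blocks = [bytearray(size) for _ in range(count)]
        self.lock = threading.Lock()


def allocate_small(bin_, fallback):
    """Take a block from bin_ without ever waiting; on contention use fallback."""
    if bin_.lock.acquire(blocking=False):      # atomic test-and-set, never blocks
        try:
            if bin_.free_blocks:
                return bin_.free_blocks.pop(), "small"
        finally:
            bin_.lock.release()
    return fallback(bin_.size), "fallback"      # conflict detected, or bin empty
```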
Re: [fpc-devel] Messages overhead
On 2010-07-14 17:17, Daniël Mantione wrote:
>> IMO compiler messages slow down compilation a lot.
> How do you know this, did you benchmark or is it just your opinion?

A few weeks ago, I benchmarked that too (well, something similar). I did a comparison between Kylix 3 and FPC 2.4.x. When I disabled all compiler message output, FPC compiled the test apps a bit faster than with the -va parameter. It wasn't an exhaustive test by any means, but there was a difference. So I would imagine Hans has a point that removing any calls to message output would reduce compilation time even more.

Regards,
 - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Messages overhead
On 2010-07-15 06:26, Sergei Gorelkin wrote:
> Message processing indeed includes some overhead, but this is not the #1 bottleneck to worry about. Maybe #10 or so.

Well, any optimization is better than none. Irrespective of where you rank it, one has to start somewhere.

Regards,
 - Graeme -
Re: [fpc-devel] Messages overhead
Graeme Geldenhuys wrote on Thu, 15 Jul 2010:
> Well, any optimization is better than none.

That's only true if it does not impact clarity or maintainability. Otherwise there is a tradeoff to be made in favour of one or the other.

Jonas
Re: [fpc-devel] Messages overhead
On 2010-07-15 14:18, Jonas Maebe wrote:
> That's only true if it does not impact clarity or maintainability. Otherwise there is a tradeoff to be made in favour of one or the other.

True, but I don't think we are talking about optimizations like changing Object Pascal code to ASM. The original poster (as far as I can tell) simply meant checking the verbosity (or whatever) before generating message output, so that message calls are only made when really needed. This seems like a simple enough optimization that does not impact clarity.

Regards,
 - Graeme -
Re: [fpc-devel] Messages overhead
On Thu, 15 Jul 2010, Graeme Geldenhuys wrote:
> On 2010-07-14 17:17, Daniël Mantione wrote:
> [snip]
> A few weeks ago, I benchmarked that too (well, something similar). I did a comparison between Kylix 3 and FPC 2.4.x. When I disabled all compiler message output, FPC compiled the test apps a bit faster than with the -va parameter. It wasn't an exhaustive test by any means, but there was a difference. So I would imagine Hans has a point that removing any calls to message output would reduce compilation time even more.

I think it is the actual syscalls that write the info which make the difference, rather than the preparation for the syscall. If you wrote output to the terminal, that is an additional slowing-down factor.

The only way to test the impact of message preparation is to uncomment the actual write call, compile with -va and then compile again (clean) with -v0.

Michael.
Re: [fpc-devel] Messages overhead
On Thu, 15 Jul 2010, Michael Van Canneyt wrote:
> [snip]
> The only way to test the impact of message preparation is to uncomment the

That should of course be 'comment out'...

Michael.

> actual write call, compile with -va and then compile again (clean) with -v0.
Re: [fpc-devel] Messages overhead
Jonas Maebe wrote:
> Graeme Geldenhuys wrote on Thu, 15 Jul 2010:
>> Well, any optimization is better than none.

And before doing a lot of guesswork: it is already known that e.g. fillchar is a bottleneck (probably not only for the compiler). So better mail the fastcode coders to ask whether we may integrate the fastcode fillchar routines into FPC. This is not that hard; just look at how this is done for move.
Re: [fpc-devel] Re: Compiler bottlenecks
Michael Schnell wrote:
> I did not take a look at FPC's memory manager. Maybe someone might want to do some profiling

I did extensive profiling when working on the fcl-xml package. For a single-threaded application, the following is true:
- FastMM is somewhat slower than FPC's memory manager, but the difference is small.
- Given the amount of source code in FPC and FastMM, FPC is clearly the winner :)
- A lot depends on how you deal with memory. It is much faster to only allocate than to allocate and release in turn.
- Given the above, a single forgotten 'const' modifier on a function argument of type string may ruin your application's performance, irrespective of the memory manager used.

Regards,
Sergei
Re: [fpc-devel] Re: Compiler bottlenecks
On 07/15/2010 01:14 PM, Florian Klaempfl wrote:
> And the third and fourth thread?

Should not make much difference. The time span during which a conflict can occur is very short, so more than two threads at the same time being unable to do the critical action in the normal way is extremely unlikely.

-Michael
Re: [fpc-devel] Re: Compiler bottlenecks
On 07/15/2010 01:28 PM, Marco van de Voort wrote:
> How is this conflict detected? If this is some kind of lock (which needs to be SMP-safe, I guess), the FPC manager can probably skip that for most small allocations, and only has to do this if it really touches global structures?

This is quite a lot of ASM code, so I can't easily answer the question. I suppose you are right in assuming that thread/SMP safety needs to be guaranteed somehow. We might want to ask Pierre how he did it.

-Michael
Re: [fpc-devel] Re: Compiler bottlenecks
On 07/15/2010 03:36 PM, Sergei Gorelkin wrote:
> - FastMM is somewhat slower than FPC's memory manager, but the difference is small.

Good to know!

> - Given the amount of source code in FPC and FastMM, FPC is clearly the winner :)

Yep. FastMM uses a lot of ASM, so that's a plus for the FPC RTL.

-Michael
Re: [fpc-devel] Messages overhead
Florian Klaempfl wrote:
>>>> IMO compiler messages slow down compilation a lot.
>>> How do you know this, did you benchmark or is it just your opinion?
>> Common knowledge, proved by experience.
> Please profile the compiler first, people did this already.

This is what I'm going to do now. Does some profiling code already exist, or do I have to reinvent the wheel?

>> The message ID has to be re-encoded in a way that allows, e.g., the message level to be determined immediately. All this is currently done only after text substitution, by inspecting the returned string.
> This cannot be done because people should have the chance to use custom error files to override message verbosities.

Couldn't this be delegated to a custom message handler?

DoDi
Re: [fpc-devel] Compiler bottlenecks]
Florian Klaempfl wrote: Unless you are doing a cold compile, the main bottlenecks in the compiler are the memory manager (mostly the allocation of memory, freeing is faster), zero-filling new class instances (and partially resetting the register allocator) and tobject.initinstance. I wonder if zeroing memory blocks (so when allocating them we know already that they contain zeros) and preparing new register allocators in a helper thread could improve this. Such initialized memory may reside in the wrong cache, on multi-core systems. DoDi
Re: [fpc-devel] Messages overhead
Sergei Gorelkin wrote: I had benchmarked that, and submitted some patches some (already long) time ago. Those patches inserted calls to CheckVerbosity before blocks of messages of rarely used verbosity, like V_Tried. But the patches have been rejected? For what reason? This way it was possible to get rid of the majority of rarely seen messages, while struggling with more common verbosities seems impractical (those are few anyway). That's just what I also had in mind. Message processing indeed includes some overhead, but this is not the #1 bottleneck to worry about. Maybe #10 or so. Most processing is done in shortstrings, which is fast. Perhaps I'm concentrating too much on a general parser. In a compiler (and for OPL) the parser overhead may not be so high. DoDi
Re: [fpc-devel] Blackfin support
Florian Klaempfl wrote: Hans-Peter Diettrich wrote: I see the biggest benefit in many possible optimizations in the scanner and parser, which can be implemented *only if* an entire file resides in memory. Include files, macro expansion and generics make such optimizations hard. Not necessarily. When all currently used files reside in memory, every (recorded) token can contain a pointer (or offset) into the file buffer. This may reduce the number of required string copies (not yet fully researched). DoDi
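DoDi's idea of tokens that refer into an in-memory file buffer by offset, instead of carrying copies of their text, might look like this in a minimal Python sketch. The `Token` type, the regex-based scanner (identifiers, numbers, single-character symbols only) and `token_text` are hypothetical illustrations, not FPC's scanner; the text is only materialized, as a zero-copy view, when a caller asks for it.

```python
import re
from dataclasses import dataclass

# One alternative per token kind; m.lastgroup names the one that matched.
_TOKEN_RE = re.compile(
    rb"(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<number>[0-9]+)|(?P<symbol>\S)"
)


@dataclass(frozen=True)
class Token:
    kind: str    # "ident", "number" or "symbol"
    start: int   # offset of the first byte in the file buffer
    end: int     # offset one past the last byte


def scan(buffer: bytes):
    """Yield tokens that point into the buffer instead of copying text."""
    for m in _TOKEN_RE.finditer(buffer):
        yield Token(m.lastgroup, m.start(), m.end())


def token_text(buffer: bytes, tok: Token) -> memoryview:
    """Zero-copy view of the token's characters, created on demand."""
    return memoryview(buffer)[tok.start:tok.end]
```

For `b"begin x := 42; end."` the scanner yields eight tokens (`:=` comes out as two single-character symbols in this simplified sketch), and none of them holds a copy of the source text.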
Re: [fpc-devel] Compiler bottlenecks]
Marco van de Voort wrote: In our previous episode, Florian Klaempfl said: zero it again, so where is the gain in that? While it's in the freelist, the helper task ... thread ... zeros it, so when it is allocated again, it needs no zeroing. But that then zeroes every deallocation. A flag could be used to prevent zeroing at least at program shutdown (or the helper thread could be terminated). DoDi
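The scheme discussed here (a helper thread zeroes blocks while they sit in the free list, plus a flag that skips the zeroing at shutdown) can be sketched in Python as below. This is a hypothetical illustration of the idea, not code from the FPC RTL; `PreZeroingFreeList`, `drain()` and `shutting_down` are made-up names, and it says nothing about the cache effects raised elsewhere in the thread.

```python
import queue
import threading


class PreZeroingFreeList:
    """Hypothetical free list: freed blocks are zeroed by a helper thread,
    so allocate() can hand out memory that is already zero-filled."""

    def __init__(self, block_size=64):
        self.block_size = block_size
        self._dirty = queue.Queue()        # freed blocks waiting to be zeroed
        self._clean = []                   # zeroed blocks ready for reuse
        self._clean_lock = threading.Lock()
        self.shutting_down = False         # when set, freed blocks are dropped
        self._worker = threading.Thread(target=self._zero_loop, daemon=True)
        self._worker.start()

    def _zero_loop(self):
        while True:
            block = self._dirty.get()
            try:
                if block is None:          # sentinel: stop the helper thread
                    return
                block[:] = bytes(len(block))   # zero-fill in the background
                with self._clean_lock:
                    self._clean.append(block)
            finally:
                self._dirty.task_done()

    def free(self, block):
        if self.shutting_down:
            return                         # don't zero memory nobody will reuse
        self._dirty.put(block)

    def allocate(self):
        with self._clean_lock:
            if self._clean:
                return self._clean.pop()   # already zero, no work on this path
        return bytearray(self.block_size)  # a fresh bytearray is zero-filled

    def drain(self):
        """Block until every queued block has been zeroed (handy in tests)."""
        self._dirty.join()

    def shutdown(self):
        self.shutting_down = True
        self._dirty.put(None)
        self._worker.join()
```

As Marco points out, this moves the cost rather than removing it: every freed block is still zeroed once, just on another thread.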
Re: [fpc-devel] Re: Compiler bottlenecks
Michael Schnell wrote: With small blocks there is no concurrency (as in case of conflict, the second thread will use medium). Just an idea: When the lists contain many entries, they could be split into buckets. Then the currently searched bucket(s) could be locked against use by other threads, which can skip them and inspect the next bucket. DoDi
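DoDi's bucket idea (lock only the bucket being searched, and let other threads skip a busy bucket instead of waiting) can be modelled with non-blocking lock acquisition. A minimal Python sketch under stated assumptions: `BucketedFreeList`, the round-robin distribution of blocks and the `start` parameter are hypothetical, and a real allocator would do this with atomic instructions rather than `threading.Lock`.

```python
import threading


class BucketedFreeList:
    """Hypothetical free list split into buckets, each with its own lock.
    A thread that finds a bucket busy skips it instead of waiting."""

    def __init__(self, blocks, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]
        self.locks = [threading.Lock() for _ in range(n_buckets)]
        for i, block in enumerate(blocks):
            self.buckets[i % n_buckets].append(block)   # round-robin fill

    def take(self, start=0):
        """Return a block from the first free, non-empty bucket, or None."""
        n = len(self.buckets)
        for k in range(n):
            i = (start + k) % n
            if not self.locks[i].acquire(blocking=False):
                continue                  # another thread is searching it: skip
            try:
                if self.buckets[i]:
                    return self.buckets[i].pop()
            finally:
                self.locks[i].release()
        return None                       # every bucket was busy or empty

    def give(self, block, start=0):
        i = start % len(self.buckets)
        with self.locks[i]:               # returning a block may wait briefly
            self.buckets[i].append(block)
```

Holding bucket 0's lock (standing in for another thread) makes `take()` move on to bucket 1, which is the skipping behaviour described above.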
Re: [fpc-devel] Messages overhead
On Thu, 15 Jul 2010, Hans-Peter Diettrich wrote: This is what I'm going to do now. Does there exist some profiling code already, or do I have to reinvent the wheel? No, simply use the -pg compiler option, then use gprof. Daniël
Re: [fpc-devel] Re: Compiler bottlenecks
On 2010-07-15 17:46, Michael Schnell wrote: On 07/15/2010 03:36 PM, Sergei Gorelkin wrote: - FastMM is somewhat slower than FPC's memory manager, but the difference is small. Good to know! - Given the amount of source code in FPC and FastMM, FPC is clearly a winner :) Yep. FastMM uses a lot of ASM, so a plus for FPC RTL. I was curious about the differences between FastMM (which I use under Delphi) and TopMM, and asked about it in the Delphi ThirdPartyTools NG. FastMM is a lot more helpful WRT debugging user code, yet TopMM seems to have a speed advantage for multi-threaded/multi-core applications. [I haven't done any benchmarks.] I was particularly interested in how the 'Cross Thread performance hit' would affect applications, since TopMM mentions it as significant. Here is the reply I got to that question. [ nntp://forums.codegear.com/embarcadero.public.delphi.thirdpartytools.general/10280 ] Arnaud BOUCHEZ wrote: I could live with the others, but what exactly is 'Cross Thread performance hit'; I mean, when (under what circumstances) would I fall for that? FastMM4 uses a LOCKed asm instruction for every memory allocation or deallocation. This LOCK ensures that memory is modified by only one thread at a time. This is the same LOCKed asm function which is used internally by Windows with its Critical Sections. Windows itself is said not to be very multi-core friendly, because it uses a lot of critical sections internally... Linux is much more advanced, and scales pretty well on massive multi-core architectures. On a multi-core CPU, all cores just freeze in order to make this LOCKed asm function threadsafe. If you have a lot of threads with more than one CPU, the context of every CPU core has to be frozen, cleared, all cores wait for the LOCKed asm instruction to complete, then the context has to be restored, and execution continues. So a LOCK-free memory manager will reduce the Cross Thread performance hit a lot.
Another big problem with the current implementation of the Delphi compiler is that string types and dynamic arrays just use the same LOCKed asm instruction everywhere. See what I wrote here (this post was not very popular, but indeed I think I've raised a big issue on the Delphi compiler internals and performance here - and I don't think Embarcadero has plans to resolve this): https://forums.codegear.com/thread.jspa?threadID=30826tstart=90 IMHO if you use strings in your application and need speed, using a memory manager other than FastMM4 is not enough. You'll have to avoid most string use, and implement a safe TStringBuilder-like class. ShortStrings could be handy here, even if they are limited to 255 characters. In our enhanced RTL for Delphi 7, we avoid use of this LOCKed asm instruction if your application has only one thread: so if you use our enhanced RTL, and create threads yourself (not using the TThread object), you'll have the best multi-thread performance possible. From this, I gather TopMM is a GOOD THING; and from Ivo's TopMM site, he seems to intend (TODO) to port TopMM to FPC. I wonder if it would be a good idea to ask him to speed up that port. Cheers, Adem
Re: [fpc-devel] Messages overhead
On Thu, July 15, 2010 14:06, Graeme Geldenhuys wrote: On 2010-07-15 06:26, Sergei Gorelkin wrote: Message processing indeed includes some overhead, but this is not the #1 bottleneck to worry about. Maybe #10 or so. Well, any optimization is better than none. Irrespective of where you rank it, one has to start somewhere. Have I misunderstood something, or is this optimization really just about the fact that message loading is not necessary if the compiler is requested to output no messages at all? Even if this (probably extremely rare!) case happens, the difference must be completely negligible unless you perform very many compilations of simple and short source files without further dependencies (which doesn't sound like a typical use case to me ;-) ), right? Tomas
Re: [fpc-devel] Messages overhead
On Thu, July 15, 2010 15:57, Hans-Peter Diettrich wrote: Florian Klaempfl wrote: . . The message ID has to be re-encoded in a way, that allows to determine e.g. the message level immediately. All this currently is done only after text substitution, by inspection of the returned string. This cannot be done because people should have the chance to use custom error files to override message verbosities. Couldn't this be delegated to a custom message handler? Not sure if I understand here. Do you mean that instead of changing a few letters in the message file, such people would need to write their own message handler and recompile the compiler to use it? Tomas
Re: [fpc-devel] Messages overhead
Hans-Peter Diettrich wrote: But the patches have been rejected? For what reason? They have been applied in revision 9297. Regards, Sergei
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich wrote: Not necessarily. When all currently used files reside in memory, every (recorded) token can contain a pointer (or offset) into the file buffer. This may reduce the number of required string copies (not yet fully researched). You normally shouldn't ever need to process every token this way. Language keywords are encoded with an enumeration type. Everything else is put into a hashtable, so you typically need only as many string copies as there are distinct identifiers in the file. Besides, shortstring copies are pretty cheap compared to AnsiStrings. Regards, Sergei
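Sergei's description (keywords become enumeration values, everything else goes into a hash table, so string copies scale with the number of distinct identifiers rather than with the number of tokens) can be sketched in Python as follows. This is an illustration under assumptions, not FPC's scanner: the four-member `Keyword` enum and the `IdentifierTable` class are hypothetical, lookups are case-insensitive as in Pascal, and the table keeps the first spelling it sees.

```python
from enum import Enum, auto


class Keyword(Enum):
    BEGIN = auto()
    END = auto()
    VAR = auto()
    PROCEDURE = auto()


# Pascal keywords are case-insensitive, so the table is keyed on lowercase text.
KEYWORDS = {kw.name.lower(): kw for kw in Keyword}


class IdentifierTable:
    """Hash table that stores each distinct identifier exactly once."""

    def __init__(self):
        self._names = {}
        self.copies = 0              # number of strings actually stored

    def intern(self, name):
        key = name.lower()           # case-insensitive lookup
        entry = self._names.get(key)
        if entry is None:
            entry = name             # keep the first spelling seen
            self._names[key] = entry
            self.copies += 1
        return entry


def classify(words, table):
    """Map keywords to enum members and identifiers to shared table entries."""
    result = []
    for w in words:
        kw = KEYWORDS.get(w.lower())
        result.append(kw if kw is not None else table.intern(w))
    return result
```

For the words `begin x X y x END`, only two strings end up in the table (`x` and `y`); both keywords become enum members and the repeated `x`/`X` reuse the stored entry.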
Re: [fpc-devel] Messages overhead
Hans-Peter Diettrich wrote: Florian Klaempfl wrote: IMO compiler messages slow down compilation a lot. How do you know this, did you benchmark or is it just your opinion? Common knowledge, proved by experience. Please profile the compiler first, people did this already. This is what I'm going to do now. Does there exist some profiling code already, or do I have to reinvent the wheel? I usually use callgrind together with kcachegrind (code needs to be compiled with -gv).
Re: [fpc-devel] Re: Compiler bottlenecks
Adem wrote: FastMM4 uses a LOCKed asm instruction for every memory allocation or deallocation. This LOCK ensures that memory is modified by only one thread at a time. This is the same LOCKed asm function which is used internally by Windows with its Critical Sections. Windows itself is said not to be very multi-core friendly, because it uses a lot of critical sections internally... Linux is much more advanced, and scales pretty well on massive multi-core architectures. On a multi-core CPU, all cores just freeze in order to make this LOCKed asm function threadsafe. If you have a lot of threads with more than one CPU, the context of every CPU core has to be frozen, cleared, all cores wait for the LOCKed asm instruction to complete, then the context has to be restored, and execution continues. This has not been true since at least Pentium Pro times. A lock prefix no longer causes a bus lock, only a cache lock.
Re: [fpc-devel] Compiler bottlenecks]
Hans-Peter Diettrich wrote: Florian Klaempfl wrote: Unless you are doing a cold compile, the main bottlenecks in the compiler are the memory manager (mostly the allocation of memory, freeing is faster), zero-filling new class instances (and partially resetting the register allocator) and tobject.initinstance. I wonder if zeroing memory blocks (so when allocating them we know already that they contain zeros) and preparing new register allocators in a helper thread could improve this. Such initialized memory may reside in the wrong cache, on multi-core systems. Yes, without benchmarking it, we cannot know.
Re: [fpc-devel] Compiler bottlenecks]
Such initialized memory may reside in the wrong cache, on multi-core systems. I don't know about that. I think I recall reading that multi-core systems share L2 cache memory. http://en.wikipedia.org/wiki/Multi-core_processor I know Delphi used to initialize my data structures when I created them. I have had to write custom Init and Done procedures for all my structures since switching to FPC. Not to complain, because it's a lot cleaner this way to keep track of the initialization and finalization of data associated with each data structure.
Re: [fpc-devel] Messages overhead
Tomas Hajny wrote: This cannot be done because people should have the chance to use custom error files to override message verbosities. Couldn't this be delegated to a custom message handler? Not sure if I understand here. Do you mean that instead of changing a few letters in the message file, such people would need to write their own message handler and recompile the compiler to use it? This would be one solution. Another one would interpret (such) message files once, on load, to extract the required information from them. DoDi
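DoDi's second option (interpret the message file once, on load, and extract the verbosity information) keeps user-edited message files working, since the level letters are still read from the file. A minimal Python sketch, assuming a simplified and hypothetical entry format `name=NUMBER_LEVEL_text`; it is not claimed to match FPC's real message file syntax or loader, and `load_messages` is an illustrative name.

```python
import re

# Hypothetical entry format: name=NUMBER_LEVEL_text
_ENTRY_RE = re.compile(r"^(\w+)=(\d+)_([A-Z]+)_(.*)$")


def load_messages(lines):
    """Parse a message file once and return {number: (level, template)}.

    The level letters are extracted at load time, so later code can decide
    whether a message is wanted without building its text first.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                 # skip blank lines and comments
        m = _ENTRY_RE.match(line)
        if m:
            _name, number, level, template = m.groups()
            table[int(number)] = (level, template)
    return table
```

A user who changes a level letter in their copy of the file (say `W` to `E`) gets the new verbosity after the next load, without writing a custom message handler or recompiling anything.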
Re: [fpc-devel] Blackfin support
Sergei Gorelkin wrote: Not necessarily. When all currently used files reside in memory, every (recorded) token can contain a pointer (or offset) into the file buffer. This may reduce the number of required string copies (not yet fully researched). You normally shouldn't ever need to process every token this way. Language keywords are encoded with an enumeration type. Everything else is put into a hashtable, so you typically need only as many string copies as there are distinct identifiers in the file. That's okay, especially when the uniquely cased names have to be stored. Besides, shortstring copies are pretty cheap compared to AnsiStrings. I'm not sure about possible string operations that force implicit conversions between AnsiString and ShortString. I observed such performance hogs in Delphi and other languages; I have no experience with FPC and the actual compiler code. DoDi
Re: [fpc-devel] Messages overhead
Sergei Gorelkin wrote: The issue was that, whenever the compiler needs to output a message, it: - loads a message file (once in a session) - looks up the message by number - performs the parameter substitution (this involves AnsiStrings, so it is somewhat heavy) - only then checks if it should really print the resulting string. Most messages come from unit search; the system units are loaded every time, so you always have several thousand messages loaded and discarded. This was taking a noticeable amount of executed CPU instructions (profiled with Valgrind). This is just what I concluded from a quick glance at the implementation. With the patch applied in r9297, I was able to cut the total number of executed instructions by 20%, but that gave no increase in perceived speed of compilation. So I decided not to put much more effort into this issue. Good to know :-) I think that it's time to put aside old experience, and replace it with up-to-date performance considerations. And since FPC seems to be optimized and tested very well, even if it doesn't look so at first glance, I'd better concentrate on other tasks. Still, modifying the messaging system so that CheckVerbosity() is called as soon as possible (before parameter substitution and other processing) could be beneficial. Then at least one can be sure that the mentioned overhead can no longer occur. DoDi
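The reordering proposed here (decide whether a message is wanted before doing any parameter substitution) can be illustrated with a small Python sketch. The `Messages` class, the level letters and the `$1`-style placeholders are assumptions for illustration only; in FPC the early check would be the CheckVerbosity() call mentioned in this message, whose real signature is not reproduced here.

```python
class Messages:
    """Hypothetical message emitter that filters by verbosity before
    doing any string work."""

    def __init__(self, table, enabled_levels):
        self.table = table                   # {number: (level, template)}
        self.enabled = set(enabled_levels)   # e.g. {"E", "W"}
        self.substitutions = 0               # how often text was actually built
        self.output = []

    def message(self, number, *args):
        level, template = self.table[number]
        if level not in self.enabled:        # cheap check first: no text built
            return False
        text = template
        # Replace higher placeholders first so "$1" cannot clobber "$10".
        for i in range(len(args), 0, -1):
            text = text.replace(f"${i}", str(args[i - 1]))
        self.substitutions += 1
        self.output.append(f"{level}: {text}")
        return True
```

Emitting a thousand suppressed "tried file" messages, as happens during unit search, then costs a dictionary lookup and a set membership test each, with zero substitutions.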
Re: [fpc-devel] Messages overhead
On 15 July 2010 17:31, Tomas Hajny wrote: requested to output no messages at all? Even if this (probably extremely rare!) case happens, the difference must be completely negligible unless you perform very many compilations of simple and short source files Our one project alone has over 250 units/forms with complex unit dependencies. This excludes 2 large external frameworks (also implemented in Object Pascal) that we pull in as well during compilation - we depend on them for certain functionality. I also tend to always build _all_ units, because I have been bitten enough times by source code changes which didn't trigger unit recompiles for some reason, and then waste my time debugging something that doesn't actually need debugging. So yes, anything to improve compiler speed is always welcomed by me. I miss the speed of Kylix or Delphi. And before somebody jumps on me, yes I know FPC has a different design because it supports multiple platforms and architectures - maintainability is more important to the core team than speed. -- Regards, - Graeme - ___ fpGUI - a cross-platform Free Pascal GUI toolkit http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Messages overhead
On Thu, 15 Jul 2010, Graeme Geldenhuys wrote: maintainability is more important to the core team than speed. No. That doesn't do justice to all the effort that is put into performance optimization. It's not about maintainability being more important. It is about making the right trade-offs between: - Compiler speed - Compiler memory usage - Generated code quality - Compiler portability - And indeed compiler maintainability. Compiler speed can lose out to maintainability, but it can also lose out to code quality; the performance of your application is probably also worth a lot to you. Nevertheless, I contest the idea that FPC is a slow compiler; I have put a lot of effort into optimizing compiler speed over the years. I work with many compilers daily, including GCC, Pathscale, Intel, Portland Group. FPC beats all of these compilers in compilation speed by orders of magnitude. Last week I compiled OpenFOAM, a fluid dynamics software package written in C++, with the Intel compiler. It took 9 hours for 110 megabytes of source code. FPC compiles such an amount of code in a few minutes... The fact it can do that can be attributed to the Pascal unit system (compared to include headers), but just as much to the choice of smart algorithms and data structures and tons of local code optimizations that were coded over many years. Daniël