Re: [fpc-devel] Blackfin support
On Wed, 2010-07-14 at 19:19 +0200, Marco van de Voort wrote:
> Core is not unreasonable (*),
> (*) well, except me obviously, but I won't be reviewing compiler
> submissions, so it is easier for me to say all this.

Sorry for the useless post, but this is just funny. ;)

Joost.
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> I see the biggest benefit in the many possible optimizations in the
> scanner and parser, which can be implemented *only if* an entire file
> resides in memory.

Include files, macro expansion and generics make such optimizations hard.
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> Marco van de Voort schrieb:
>> I don't think we are ever going to give an up-front carte blanche for
>> a massive rewrite to go into trunk. That is simply not sane.
> ACK. I'm more concerned about work that is blacklisted for some reason.

Rewriting the compiler in C ;) Discuss the change on fpc-devel first; if
you want to improve performance, generate numbers for at least: a make
cycle and building Lazarus.

>> A submission will always be judged on performance and maintainability
>> before being admitted. If this bothers you, try to find smart ways to
>> phase the changes, and limit yourself to a few things at a time: don't
>> try to speed-optimize I/O, change the parser, allow multiple front-ends
>> etc. all at the same time.
> Just this is hard to obey, when I see so many details that could be
> improved. Will it do harm if I create more than one branch, e.g. one for
> general optimizations?

No, but try to finish one thing and then start the next one. As time
passes, patches might no longer apply automatically.

> Can other people contribute to such a branch as well?

Yes, just tell me which branch and what login (I'll try to be faster this
time ;)).
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Hans-Peter Diettrich schrieb:
>> I see the biggest benefit in the many possible optimizations in the
>> scanner and parser, which can be implemented *only if* an entire file
>> resides in memory.
> Include files, macro expansion and generics make such optimizations
> hard.

Not necessarily. When all currently used files reside in memory, every
(recorded) token can contain a pointer (or offset) into the file buffer.
This may reduce the number of required string copies (not yet fully
researched).

DoDi
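A minimal sketch of the idea in Pascal (hypothetical names, not FPC's
actual scanner types): a token records only an offset and length into the
in-memory file buffer, and a string copy is made only on demand.

    type
      TTokenKind = (tkIdentifier, tkKeyword, tkNumber, tkString);

      TToken = record
        Kind: TTokenKind;
        BufOfs: SizeInt;  // offset of the lexeme in the file buffer
        Len: Integer;     // lexeme length in bytes
      end;

    { Copy the lexeme text out of the buffer only when it is needed. }
    function TokenText(const Buf: AnsiString; const T: TToken): AnsiString;
    begin
      Result := Copy(Buf, T.BufOfs + 1, T.Len);  // AnsiString is 1-based
    end;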
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich wrote:
> Not necessarily. When all currently used files reside in memory, every
> (recorded) token can contain a pointer (or offset) into the file buffer.
> This may reduce the number of required string copies (not yet fully
> researched).

You normally shouldn't ever need to process every token this way. Language
keywords are encoded with an enumeration type. Everything else is put into
a hashtable, so you typically need only as many string copies as there are
distinct identifiers in the file. Besides, shortstring copies are pretty
cheap compared to AnsiStrings.

Regards,
Sergei
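As an illustration of that interning scheme (a sketch only; the compiler
uses its own hash tables, and a TStringList stands in here for
simplicity): each distinct identifier is copied once, later occurrences
just reuse its index.

    uses Classes;

    { Return the index of Ident, adding a copy only on first sight. }
    function Intern(Names: TStringList; const Ident: string): Integer;
    begin
      Result := Names.IndexOf(Ident);  // case-insensitive by default
      if Result < 0 then
        Result := Names.Add(Ident);
    end;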
Re: [fpc-devel] Blackfin support
Sergei Gorelkin schrieb:
>> Not necessarily. When all currently used files reside in memory, every
>> (recorded) token can contain a pointer (or offset) into the file
>> buffer. This may reduce the number of required string copies (not yet
>> fully researched).
> You normally shouldn't ever need to process every token this way.
> Language keywords are encoded with an enumeration type. Everything else
> is put into a hashtable, so you typically need only as many string
> copies as there are distinct identifiers in the file.

That's okay, in particular when the uniquely cased names have to be
stored.

> Besides, shortstring copies are pretty cheap compared to AnsiStrings.

I'm not sure about possible string operations that force implicit
conversions between AnsiString and ShortString. I observed such
performance hogs in Delphi and other languages; I have no experience yet
with FPC and the concrete compiler code.

DoDi
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> One must keep in mind though that he probably measures on a *nix, and
>> there is a reason why the make cycle takes twice the time on Windows.
> One of these issues is memory-mapped files, which can speed up file
> access a lot (I've been told), perhaps because they map directly to the
> system file cache?

As said, I always had the feeling that it was binary startup time and
directory I/O rather than basic blockwrite/read speed. Most files are tens
of kBs at most, and will probably be read entirely by the system anyway.
Re: [fpc-devel] Blackfin support
On 07/13/2010 11:18 PM, Hans-Peter Diettrich wrote:
> When we rely on an OS file cache, we can read all files entirely into
> memory, instead of using buffered I/O.

Loading the complete file instead of parts of it would do unnecessary
memory copies. In fact I suppose using file mapping instead of read (and
maybe write) should improve speed in many cases.

-Michael
Re: [fpc-devel] Blackfin support
On 07/13/2010 05:19 PM, Hans-Peter Diettrich wrote:
> It may be a good idea to implement different models, that either read
> entire files...

read -> map

-Michael
Re: [fpc-devel] Blackfin support
On 07/14/2010 12:00 AM, Hans-Peter Diettrich wrote:
> One of these issues is memory-mapped files, which can speed up file
> access a lot (I've been told), perhaps because they map directly to the
> system file cache?

AFAIK file mapping is used a lot, and very successfully, with Linux, but
it _is_ available with NTFS. No idea whether the implementation there is
done in a way that makes it really fast.

-Michael
Re: [fpc-devel] Blackfin support
In our previous episode, Michael Schnell said:
> AFAIK file mapping is used a lot, and very successfully, with Linux, but
> it _is_ available with NTFS. No idea whether the implementation there is
> done in a way that makes it really fast.

I tried it long ago on Win2000, and maybe even XP. If you linearly access
the files with a large enough blocksize (8 or 16 kB), the difference was
hardly measurable (+/- 300 MB files). Probably if you go linearly, the
readahead is already near-efficient.

But FPC might not adhere to this scheme; I don't know if FPC currently
loads the whole file or leaves the file open while it processes e.g. an
.inc. If it doesn't load the whole file, opening other files triggers head
movement (if not in cache) that could be avoided.

Mapping does not change that picture (the head still has to move if you
access a previously unread block). Mapping is mainly about
- zero-copy access to file content
- using the VM system to cache _already accessed_ blocks.
The compiler does not do enough I/O to make the first worthwhile, and the
second is irrelevant to the compiler's access pattern. The only way it
could matter is if the memory-mapped file reads more sectors speculatively
after a page access, but I don't know if that is the case; it might just
as well be less (since normal file I/O is more likely to be linear).

So in summary, I think _maybe_ always reading the whole file might win a
bit in file-reading performance. I don't expect memory mapping to do so.
The whole-file hypothesis could easily be tested (if it applies at all) by
increasing the buffer size. But if I understand finput.pas properly, FPC
already uses a 64k buffer size (which is larger than most source files),
so I don't expect much gain here. And, worse, I think that even a gain
there is dwarfed by directory operations (searching files, creating new
files) and binary startup time (of the compiler but also other tools).

(*) empirical time for a core2 to move a large block (source+dest cached).
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> One of these issues is memory-mapped files, which can speed up file
>> access a lot (I've been told), perhaps because they map directly to the
>> system file cache?
> AFAIK file mapping is used a lot and very successfully with Linux, but
> it _is_ available with NTFS.

I've heard it the opposite way, that it has become available for *certain*
Linux distros as well ;-) And Delphi (Windows!) users reported noticeable
performance boosts (factor 3+), even if nobody ever came up with
non-trivial example code, including fallbacks for the restricted (32 bit)
address space.

DoDi
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> I tried it long ago on Win2000, and maybe even XP. If you linearly
> access the files with a large enough blocksize (8 or 16 kB), the
> difference was hardly measurable (+/- 300 MB files). Probably if you go
> linearly, the readahead is already near-efficient.

Windows offers certain file attributes for that purpose, which notify the
OS of intended (strictly) sequential file reads - this would allow it to
read ahead more file content into the system cache.

> Mapping does not change that picture (the head still has to move if you
> access a previously unread block). Mapping is mainly about
> - zero-copy access to file content
> - using the VM system to cache _already accessed_ blocks.

- and backing RAM pages with the original file, so that they never end up
in the swap file.

> The whole-file hypothesis could easily be tested (if it applies at all)
> by increasing the buffer size. But if I understand finput.pas properly,
> FPC already uses a 64k buffer size (which is larger than most source
> files), so I don't expect much gain here.

I see the biggest benefit in the many possible optimizations in the
scanner and parser, which can be implemented *only if* an entire file
resides in memory. When memory management and (string) copies really are
as expensive as some people say, then these *additional* optimizations
should give the really achievable speed gain.

IMO we should give these additional optimizations a try, independent of
the use of MMF. When an entire source file is loaded into memory, we can
measure the time between reading the first token and hitting EOF in the
parser, eliminating all uncertain MMF/file-cache timing. It's only a
matter of the acceptance of such a refactored model, since it's a waste of
time if it never becomes part of the trunk, for already known reasons.

DoDi
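On Windows the hint in question is the FILE_FLAG_SEQUENTIAL_SCAN flag to
CreateFile. A minimal sketch (Windows API only, error handling omitted):

    uses Windows;

    function OpenSequential(const FileName: string): THandle;
    begin
      // FILE_FLAG_SEQUENTIAL_SCAN tells the cache manager to read ahead
      // aggressively and discard pages behind the read position.
      Result := CreateFile(PChar(FileName), GENERIC_READ,
        FILE_SHARE_READ, nil, OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL or FILE_FLAG_SEQUENTIAL_SCAN, 0);
    end;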
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> On 07/13/2010 11:18 PM, Hans-Peter Diettrich wrote:
>> When we rely on an OS file cache, we can read all files entirely into
>> memory, instead of using buffered I/O.
> Loading the complete file instead of parts of it would do unnecessary
> memory copies.

How so? Of course the entire file uses more address space than a smaller
buffer, but when the file is parsed, the same number of bytes must be
copied to local memory in either case. And when the entire file sits in
memory, the scanner and parser operations can be optimized for much higher
speed, e.g. by removing unnecessary address calculations, bounds checks
and string copies.

> In fact I suppose using file mapping instead of read (and maybe write)
> should improve speed in many cases.

Hence my question about a platform independent solution for MMF. At least
we could implement an MMF (source) file class that emulates this feature
on platforms without MMF support. Support also could be restricted to
mapping only entire files, for the compiler - otherwise the management of
mapping windows would degrade the achievable performance.

DoDi
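A sketch of what such a class might look like on Unix (assumptions: the
BaseUnix unit's FpMmap/FpMunmap wrappers; on platforms without MMF support
the constructor would instead read the whole file into an allocated
buffer):

    uses BaseUnix, SysUtils;

    type
      TMappedFile = class
      private
        FData: Pointer;
        FSize: SizeInt;
      public
        constructor Create(const FileName: string);
        destructor Destroy; override;
        property Data: Pointer read FData;
        property Size: SizeInt read FSize;
      end;

    constructor TMappedFile.Create(const FileName: string);
    var
      fd: cint;
    begin
      fd := FpOpen(PChar(FileName), O_RDONLY);
      if fd < 0 then
        raise Exception.CreateFmt('cannot open %s', [FileName]);
      try
        FSize := FpLseek(fd, 0, Seek_End);  // file size via seek-to-end
        FData := FpMmap(nil, FSize, PROT_READ, MAP_PRIVATE, fd, 0);
        if FData = Pointer(-1) then         // MAP_FAILED
          raise Exception.CreateFmt('mmap failed for %s', [FileName]);
      finally
        FpClose(fd);  // the mapping survives closing the descriptor
      end;
    end;

    destructor TMappedFile.Destroy;
    begin
      if (FData <> nil) and (FData <> Pointer(-1)) then
        FpMunmap(FData, FSize);
      inherited Destroy;
    end;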
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> Probably if you go linearly, the readahead is already near-efficient.
> Windows offers certain file attributes for that purpose, which notify
> the OS of intended (strictly) sequential file reads - this would allow
> it to read ahead more file content into the system cache.

I can vaguely remember something like that too. It is a matter of hacking
that into the RTL, and then measuring a make cycle (requires a few reboots
to preclude caching).

>> Mapping does not change that picture (the head still has to move if you
>> access a previously unread block). Mapping is mainly about
>> - zero-copy access to file content
>> - using the VM system to cache _already accessed_ blocks.
> - and backing RAM pages with the original file, so that they never end
> up in the swap file.

If swapping enters the picture, then all these savings are peanuts, so we
assume it is absent.

> I see the biggest benefit in the many possible optimizations in the
> scanner and parser, which can be implemented *only if* an entire file
> resides in memory. When memory management and (string) copies really are
> as expensive as some people say, then these *additional* optimizations
> should give the really achievable speed gain.

That's easily said, but when you get into the details, you often have to
make compromises. And sacrifice speed.

> IMO we should give these additional optimizations a try, independent of
> the use of MMF. When an entire source file is loaded into memory, we can
> measure the time between reading the first token and hitting EOF in the
> parser, eliminating all uncertain MMF/file-cache timing. It's only a
> matter of the acceptance of such a refactored model, since it's a waste
> of time if it never becomes part of the trunk, for already known
> reasons.

I don't think we are ever going to give an up-front carte blanche for a
massive rewrite to go into trunk. That is simply not sane. A submission
will always be judged on performance and maintainability before being
admitted. If this bothers you, try to find smart ways to phase the
changes, and limit yourself to a few things at a time: don't try to
speed-optimize I/O, change the parser, allow multiple front-ends etc. all
at the same time.
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
> And Delphi (Windows!) users reported noticeable performance boosts
> (factor 3+), even if nobody ever came up with non-trivial example code,
> including fallbacks for the restricted (32 bit) address space.

Yeah, and no wonder: most probably benchmarked against plain textfile I/O
with its default 128 byte buffer. One can actually spice FPC/Delphi text
I/O up quite nicely with SetTextBuf and an 8k buffer (the last time I
tested, in P4 3 GHz times, higher values didn't really matter anymore).
The compiler however already has its own 64k buffering system.
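For reference, a minimal example of that SetTextBuf trick (standard RTL
routine; the buffer must stay alive while the file is open):

    var
      f: Text;
      buf: array[0..8191] of Byte;  // 8k replacing the 128-byte default
      line: string;
    begin
      Assign(f, 'somefile.pas');
      SetTextBuf(f, buf);  // install the buffer before any I/O happens
      Reset(f);
      while not Eof(f) do
        ReadLn(f, line);
      Close(f);
    end.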
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> I don't think we are ever going to give an up-front carte blanche for a
> massive rewrite to go into trunk. That is simply not sane.

ACK. I'm more concerned about work that is blacklisted for some reason.

> A submission will always be judged on performance and maintainability
> before being admitted. If this bothers you, try to find smart ways to
> phase the changes, and limit yourself to a few things at a time: don't
> try to speed-optimize I/O, change the parser, allow multiple front-ends
> etc. all at the same time.

Just this is hard to obey, when I see so many details that could be
improved. Will it do harm if I create more than one branch, e.g. one for
general optimizations? Can other people contribute to such a branch as
well?

DoDi
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> I don't think we are ever going to give an up-front carte blanche for
>> a massive rewrite to go into trunk. That is simply not sane.
> ACK. I'm more concerned about work that is blacklisted for some reason.

One reason more to phase and modularize your efforts. It is less all or
nothing. As far as the blacklisting goes: there is only one way to counter
skepticism. Show the goods, and it had better be good. Core is not
unreasonable (*), but it will take more than simply pointing at some
totally out-of-sync, totally overhauled branch and saying "done". Free
Pascal is not a one-man show, and that means cooperation and
communication. People's opinions differ.

>> A submission will always be judged on performance and maintainability
>> before being admitted. If this bothers you, try to find smart ways to
>> phase the changes, and limit yourself to a few things at a time: don't
>> try to speed-optimize I/O, change the parser, allow multiple front-ends
>> etc. all at the same time.
> Just this is hard to obey, when I see so many details that could be
> improved. Will it do harm if I create more than one branch, e.g. one for
> general optimizations? Can other people contribute to such a branch as
> well?

Keep in mind that running many branches long-term will only increase the
amount of management needed to keep them in sync, make it more difficult
to merge the finished results back, etc. Focus your efforts, in as small
phases as possible. And don't ever count on other people helping you in
your planning, since it will nearly always be less than expected.

(*) well, except me obviously, but I won't be reviewing compiler
submissions, so it is easier for me to say all this.
Re: [fpc-devel] Blackfin support
On 07/13/2010 01:46 AM, Hans-Peter Diettrich wrote:
> That's questionable, depending on the real bottlenecks in compiler
> operation. I suspect that disk I/O is the narrowest bottleneck,

I doubt this. The disk cache does a decent job here.

gcc can do this very effectively on a higher layer, as gcc is called
separately by make for each source file. As FPC internally organizes the
unit make sequence, I suppose internal multithreading needs to be
implemented.

-Michael
Re: [fpc-devel] Blackfin support
On 07/12/2010 05:54 PM, Hans-Peter Diettrich wrote:
> M68K machine, which in turn seems to have inherited from the ARM.

I suppose: vice versa :).

> ..., but it doesn't allow supporting multiple machine back-ends in one
> program.

Do you think it would be an advantage to support multiple archs in a
single compiler executable? I feel that recompiling the compiler when
changing the target CPU is not very harmful.

> I could not find much, and most existing documentation is outdated since
> 2.0 :-(

Of course improvement on that issue would be very desirable :).

-Michael
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
>> For me, a much higher priority when doing rewrites might be
>> multithreading of the compiler itself.
> That's questionable, depending on the real bottlenecks in compiler
> operation. I suspect that disk I/O is the narrowest bottleneck, which
> cannot be widened by parallel processing.

Memory throughput is a bottleneck, I/O not really. So multithreading has a
real advantage on NUMA systems and systems where different cores have
dedicated caches. One or two years ago, I did some experiments with
asynchronous assembler calls, and it already improved compilation times
significantly on platforms using an external assembler.

The problem is that the whole compiler is not designed to do so. This
could be solved by an approach we have wanted to implement for years:
split the compilation process into tasks (like "parse unit X", "load unit
Y", "generate code for unit X") with dependencies. This should also solve
the fundamental problems with unit loading/compilation sometimes causing
internal errors. The first step would be to do this without
multithreading; later it could be tried to execute several tasks in
parallel.
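A rough sketch of such a task graph (hypothetical types, not an actual FPC
design): each task names the tasks it depends on, and a scheduler runs any
task whose dependencies are done - sequentially at first, in worker
threads later.

    uses Classes;

    type
      TTaskKind = (tkParseUnit, tkLoadUnit, tkCodegenUnit);

      TTask = class
        Kind: TTaskKind;
        UnitName: string;
        DependsOn: TList;   // TTask instances that must finish first
        Done: Boolean;
        procedure Run; virtual; abstract;
      end;

    { Naive scheduler: repeatedly run any task whose dependencies are
      done; a threaded version would hand ready tasks to workers. }
    procedure RunAll(Tasks: TList);
    var
      i, j: Integer;
      t: TTask;
      ready, progress: Boolean;
    begin
      repeat
        progress := False;
        for i := 0 to Tasks.Count - 1 do
        begin
          t := TTask(Tasks[i]);
          if t.Done then Continue;
          ready := True;
          for j := 0 to t.DependsOn.Count - 1 do
            if not TTask(t.DependsOn[j]).Done then ready := False;
          if ready then
          begin
            t.Run;
            t.Done := True;
            progress := True;
          end;
        end;
      until not progress;  // stops on completion or on a dependency cycle
    end;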
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> For me, a much higher priority when doing rewrites might be
>> multithreading of the compiler itself.
> That's questionable, depending on the real bottlenecks in compiler
> operation. I suspect that disk I/O is the narrowest bottleneck, which
> cannot be widened by parallel processing.

No, that has to be solved by a bigger granularity (compiling more units in
one go). That avoids ppu reloading and limits directory searching (there
is a cache IIRC), freeing up more bandwidth for source loading.

Not only compiling goes in parallel; I assume one could also load a ppu in
parallel (and so overlap the blocking time of the disk I/O with the
parsing of the .ppu contents).
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> No, that has to be solved by a bigger granularity (compiling more units
> in one go). That avoids ppu reloading and limits directory searching
> (there is a cache IIRC), freeing up more bandwidth for source loading.
> Not only compiling goes in parallel; I assume one could also load a ppu
> in parallel?

With compiling I meant all tasks the compiler does, even assembling and
linking.
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> That's questionable, depending on the real bottlenecks in compiler
>> operation. I suspect that disk I/O is the narrowest bottleneck,
> I doubt this. The disk cache does a decent job here.
> gcc can do this very effectively on a higher layer, as gcc is called
> separately by make for each source file. As FPC internally organizes the
> unit make sequence, I suppose internal multithreading needs to be
> implemented.

A C compiler has to access the very same header files over and over again,
so a file cache can reduce disk I/O considerably. But when FPC processes
every source unit in a project only once, the file cache is not very
helpful. Nonetheless it may make sense to process the units in threads, so
that an already read unit can be processed while other threads are still
waiting for disk I/O. I only doubt that this will result in a noticeable
overall speed gain, when the results have to be written back to disk after
compilation. But we will know more only after appropriate tests...

DoDi
Re: [fpc-devel] Blackfin support
On 07/13/2010 02:49 PM, Hans-Peter Diettrich wrote:
> But when FPC processes every source unit in a project only once, the
> file cache is not very helpful.

Obviously, a sufficiently huge cache can avoid any disk I/O bottleneck
when doing the 2nd+ build.

-Michael
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> M68K machine, which in turn seems to have inherited from the ARM.
> I suppose: vice versa :).

At least I found files with comments from/for ARM.

>> ..., but it doesn't allow supporting multiple machine back-ends in one
>> program.
> Do you think it would be an advantage to support multiple archs in a
> single compiler executable? I feel that recompiling the compiler when
> changing the target CPU is not very harmful.

I don't understand the current compilation process yet. How is the target
command line switch handled? Does pp spawn the target-specific compiler?

>> I could not find much, and most existing documentation is outdated
>> since 2.0 :-(
> Of course improvement on that issue would be very desirable :).

What format should it be? Wiki entries are easily extensible, but it's
also easy to lose the overview of the missing pieces. FPDoc is nasty to
format, though it would allow inlining the documentation with the online
help. I'd prefer HTML, or OpenOffice if it allows for embedded links.

DoDi
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Memory throughput is a bottleneck, I/O not really. So multithreading has
> a real advantage on NUMA systems and systems where different cores have
> dedicated caches. One or two years ago, I did some experiments with
> asynchronous assembler calls, and it already improved compilation times
> significantly on platforms using an external assembler.

Good to know :-)

> The problem is that the whole compiler is not designed to do so. This
> could be solved by an approach we have wanted to implement for years:
> split the compilation process into tasks (like "parse unit X", "load
> unit Y", "generate code for unit X") with dependencies. This should also
> solve the fundamental problems with unit loading/compilation sometimes
> causing internal errors. The first step would be to do this without
> multithreading; later it could be tried to execute several tasks in
> parallel.

I should know more about the available threading features (blocking,
synchronization...). IMO compilation should be done in two steps, with the
first step providing the interface for used units, from a .ppu file or by
a new parse. Once this information is available, the using units
(threads) can resume their work. The final code generation can occur in
further threads.

At least I know now what to look for, in my parser redesign. It seems to
be a good idea to reduce the number of global links, so that in a
following compiler redesign multiple threads can do their work
independently.

DoDi
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> No, that has to be solved by a bigger granularity (compiling more units
>> in one go). That avoids ppu reloading and limits directory searching
>> (there is a cache IIRC), freeing up more bandwidth for source loading.
> ACK. The compiler should process in one go as many units as possible -
> but this is more a matter of the framework (Make, Lazarus...), which
> should pass complete lists of units to the compiler (projects,
> packages).

Not necessarily. One could also strengthen the make capabilities of the
compiler, think about reworking the compiler to be kept resident, etc.

> As a workaround a dedicated server process could hold the least recently
> processed unit objects in RAM, for use in an immediately following
> compilation of other units. But this would only cure the symptoms, not
> the reason for slow compiles :-(

(Some random wild thinking:) Jonas seems to indicate most is due to the
object model (zeroing) and memory management in general. One must keep in
mind though that he probably measures on a *nix, and there is a reason why
the make cycle takes twice the time on Windows. I don't think the CPU or
the cache halves in speed under Windows, so it must be more in the I/O
sphere:
- NTFS is relatively slow in directory operations (seeking).
- Windows is slow starting up binaries.
- AFAIK NTFS caching is optimized for fileserver use, not to strongly
speed up a single application, especially if that app starts/stops
constantly (a model that is foreign to Windows).
So one can't entirely rule out limiting I/O and the number of compiler
startups, since not all OSes are alike.

For the memory management issues, a memory manager specifically for the
compiler is the solution nearest at hand. To make it worthwhile to have a
list of zeroed blocks (and have a thread zero big blocks), somehow the
system must know when a zeroed block is needed. For objects this could
maybe be done by creating a new root object and deriving every object from
it (cclasses etc.). But that would still leave dynamic arrays and manually
allocated memory. For manually allocated memory of always the same size
(virtual register map?) a pooling solution could be found.

> It may be a good idea to implement different models, that either read
> entire files or use the current (buffered) access. Depending on disk
> fragmentation it may be faster to read entire (unfragmented) source or
> ppu files, before requests for other files can cause disk seeks and slow
> down continued reading of files from other places. Both models can be
> used concurrently, when an arbitration is possible from certain system
> (load) parameters.

Most OSes already read several tens of kBs in advance. I don't really
think that will bring much. Such approaches are so low-level that the OS
could do it, and probably it will.
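A minimal sketch of such a fixed-size pool (an illustration of the idea
only, single-threaded, chunks never returned to the OS): blocks of one
size are carved from a larger zero-filled chunk and recycled through a
free list, so allocation is a pointer pop.

    type
      PFreeNode = ^TFreeNode;
      TFreeNode = record
        Next: PFreeNode;
      end;

      TFixedPool = object
        BlockSize: SizeInt;
        FreeList: PFreeNode;
        procedure Init(ABlockSize: SizeInt);
        function Alloc: Pointer;
        procedure Release(P: Pointer);
      end;

    procedure TFixedPool.Init(ABlockSize: SizeInt);
    const
      ChunkBlocks = 256;
    var
      Chunk: PByte;
      i: Integer;
    begin
      if ABlockSize < SizeOf(TFreeNode) then
        ABlockSize := SizeOf(TFreeNode);
      BlockSize := ABlockSize;
      FreeList := nil;
      // One big, zero-filled chunk; thread all blocks onto the free list.
      Chunk := AllocMem(ChunkBlocks * BlockSize);
      for i := 0 to ChunkBlocks - 1 do
        Release(Chunk + i * BlockSize);
    end;

    function TFixedPool.Alloc: Pointer;
    begin
      Result := FreeList;  // pop; nil means the pool is exhausted
      if Result <> nil then
      begin
        FreeList := PFreeNode(Result)^.Next;
        PFreeNode(Result)^.Next := nil;  // re-zero the word used for linking
      end;
    end;

    procedure TFixedPool.Release(P: Pointer);
    begin
      PFreeNode(P)^.Next := FreeList;  // push back onto the free list
      FreeList := PFreeNode(P);
    end;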
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> On 07/13/2010 02:49 PM, Hans-Peter Diettrich wrote:
>> But when FPC processes every source unit in a project only once, the
>> file cache is not very helpful.
> Obviously, a sufficiently huge cache can avoid any disk I/O bottleneck
> when doing the 2nd+ build.

Then the system file cache will make it hard to determine reasonable
figures for the first build. And I wonder how often long builds are really
run in sequence?

When we rely on an OS file cache, we can read all files entirely into
memory, instead of using buffered I/O. Or we can design an interface that
allows running the compiler e.g. inside Lazarus, using the already loaded
editor files and directory caches.

BTW, should we switch the thread topic?

DoDi
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> One must keep in mind though that he probably measures on a *nix, and
> there is a reason why the make cycle takes twice the time on Windows.

One of these issues is memory-mapped files, which can speed up file access
a lot (I've been told), perhaps because they map directly to the system
file cache?

> So one can't entirely rule out limiting I/O and the number of compiler
> startups, since not all OSes are alike.

That means optimizing for one platform may slow down the compiler on other
platforms :-(

> For the memory management issues, a memory manager specifically for the
> compiler is the solution nearest at hand. To make it worthwhile to have
> a list of zeroed blocks (and have a thread zero big blocks), somehow the
> system must know when a zeroed block is needed. For objects this could
> maybe be done by creating a new root object and deriving every object
> from it (cclasses etc.). But that would still leave dynamic arrays and
> manually allocated memory.

When zeroing blocks really is an issue, then I suspect that it's more an
issue of memory caches. This would mean that data locality should be
increased, i.e. related pieces of data should reside physically next to
each other (same page). Most list implementations (TList) tend to spread
the list and its entries across the address space.

Special considerations may apply to 64 bit systems, with a (currently)
almost unlimited address space. There it might be a good idea to allocate
lists bigger than really needed, which should do no harm when the unused
elements are never mapped to RAM (thanks to paged memory management). A
TList with buckets would then only be slower on such a system, for no
other gain.

> For manually allocated memory of always the same size (virtual register
> map?) a pooling solution could be found.

Again candidates for huge pre-allocated memory arrays. But when these
elements are not used together, they may occupy one or two memory pages
each, and the remaining RAM in these pages is unused.

> Most OSes already read several tens of kBs in advance. I don't really
> think that will bring much. Such approaches are so low-level that the OS
> could do it, and probably it will.

Every OS with MMF will do so, when only memory-mapped files are used. The
rest IMO is so platform specific that a single optimization strategy may
not be a good solution for other platforms. But I think that such
low-level considerations should be left for later, when the big issues are
fixed and the requirements for exploring the real behaviour of various
strategies have been implemented.

DoDi
Re: [fpc-devel] Blackfin support
On 07/10/2010 12:40 PM, Hans-Peter Diettrich wrote:
> Let me know if you (or somebody else) have more concrete plans on the
> integration of a new CPU.

I remember some discussions about doing a MIPS / PIC32 port recently.

> I just stripped down the machine files for a no_cpu machine (all fakes),
> with some documentation about the required units etc.

Is this based on what we already have for X86, ARM, etc., or does it fork
to another set of arch implementations? If a fork, is it intended / viable
to move the existing implementations into that scheme?

> An implementation of a new CPU, based on that skeleton, would raise the
> priority for further explorations and documentation.

No idea in what state the structure / documentation of the existing fully
supported implementations such as x86 and ARM is.

-Michael
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> I just stripped down the machine files for a no_cpu machine (all
>> fakes), with some documentation about the required units etc.
> Is this based on what we already have for X86, ARM, etc., or does it
> fork to another set of arch implementations? If a fork, is it intended /
> viable to move the existing implementations into that scheme?

The no_cpu skeleton was stripped down from the M68K machine, which in turn
seems to have inherited from the ARM. Due to hard-coded dependencies it
was impossible to remove e.g. registers completely, and also a $define of
some already known machine must be given, else every compilation will fail
immediately with a $fatal error.

That skeleton reflects the units, data structures and procedures that are
referenced (hard-coded) by other parts of the compiler. Every machine
consists of a formal description (registers, instructions...), node
generators for the parse tree, code (tree) optimizers, an assembler, and
output generators for binary code and debug info. A distinct machine
back-end is selected by adding its source folder to the unit search path.
This may be the fastest possible implementation for one (of multiple)
machines, but it doesn't allow supporting multiple machine back-ends in
one program. The same applies to the front-ends, which currently are not
exchangeable at all.

More flexibility would require a plug-in scheme or similar, hard to do
without dynamically loadable packages. But since some abstract links
already exist (class type variables for machine specific descendants),
these links could be exchanged at runtime, not only in the initialization
sections of the machine specific units. Then it would be sufficient to add
all (wanted) front- or back-ends to the compiler, and switch amongst these
at runtime. Whereas switching the target machine at runtime does not make
much sense to me, in contrast to switching front-ends based on the source
file types.

>> An implementation of a new CPU, based on that skeleton, would raise the
>> priority for further explorations and documentation.
> No idea in what state the structure / documentation of the existing
> fully supported implementations such as x86 and ARM is.

I could not find much, and most existing documentation is outdated since
2.0 :-( Some parts, like the parse tree nodes, are somewhat
self-explanatory. The formal machine descriptions (registers, options...)
are almost undocumented. I tried to make the construction of the register
descriptor constants more transparent, by composing them from other sets
of constants. There seem to exist tools that produce e.g. register
descriptors (in include files), but I did not yet dig into the tools
folder.

DoDi
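The "class type variable" linkage mentioned above, as a minimal sketch
(hypothetical names, not the compiler's real hooks): each back-end unit
assigns its own class to a shared metaclass variable in its
initialization section, and the rest of the compiler instantiates through
that variable.

    type
      TCodeGenerator = class
        procedure GenerateProlog; virtual; abstract;
      end;
      TCodeGeneratorClass = class of TCodeGenerator;

    var
      CCodeGenerator: TCodeGeneratorClass;  // the currently linked back-end

    type
      TArmCodeGenerator = class(TCodeGenerator)
        procedure GenerateProlog; override;
      end;

    procedure TArmCodeGenerator.GenerateProlog;
    begin
      { emit the ARM-specific prolog here }
    end;

    { In the ARM back-end unit this assignment sits in the initialization
      section; reassigning the variable at runtime would switch back-ends. }
    initialization
      CCodeGenerator := TArmCodeGenerator;
    end.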
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> But since some abstract links already exist (class type variables for
> machine specific descendants), these links could be exchanged at
> runtime,

One problem is all the constants used to describe the target architecture.
We discussed multiple back-ends in one compiler already in 2002, saw no
advantage in it, so we didn't try to solve it and decided to use the "fpc
-P ..." solution, which makes no difference for the user.

For me, a much higher priority when doing rewrites might be multithreading
of the compiler itself.
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Hans-Peter Diettrich schrieb:
>> But since some abstract links already exist (class type variables for
>> machine specific descendants), these links could be exchanged at
>> runtime,
> One problem is all the constants used to describe the target
> architecture. We discussed multiple back-ends in one compiler already in
> 2002, saw no advantage in it, so we didn't try to solve it and decided
> to use the "fpc -P ..." solution, which makes no difference for the
> user.

Full ACK.

> For me, a much higher priority when doing rewrites might be
> multithreading of the compiler itself.

That's questionable, depending on the real bottlenecks in compiler
operation. I suspect that disk I/O is the narrowest bottleneck, which
cannot be widened by parallel processing. It also requires further
research, e.g. to determine the optimal number of threads, depending on
the currently available resources of a concrete machine. But of course
it's worth a try, to find out more...

DoDi
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> In fact I did some (quite low priority) research on how to port FPC to a
> new CPU such as NIOS and Blackfin and found that it of course is doable
> somehow. While NIOS seems to look more doable, as it's quite similar to
> MIPS (and ARM), Blackfin has a much more complex instruction set with a
> huge potential for low-level optimization. Thus I suppose Blackfin is
> quite hard to do.

Let me know if you (or somebody else) have more concrete plans on the
integration of a new CPU. I just stripped down the machine files for a
no_cpu machine (all fakes), with some documentation about the required
units etc. An implementation of a new CPU, based on that skeleton, would
raise the priority for further explorations and documentation. I already
have more suggestions for easier implementation of new machines...

DoDi
Re: [fpc-devel] Blackfin support
Jeppe Johansen schrieb:
> I would be interested in knowing whether it would be feasible to create
> DSP backends for FPC. I recently had the experience of using a
> TMS320C26, which probably has to be programmed in assembler due to the
> limits of the instruction set. But I hear newer DSPs use instruction
> sets geared a lot more towards high-level compilers.

This IMO would require new language elements for parallel/vector
operations, dedicated libraries, or *very* clever optimizers. A
general-purpose language like Pascal is not suited for coding DSP
operations.

DoDi
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> Jeppe Johansen schrieb:
>> I would be interested in knowing whether it would be feasible to create
>> DSP backends for FPC. I recently had the experience of using a
>> TMS320C26, which probably has to be programmed in assembler due to the
>> limits of the instruction set. But I hear newer DSPs use instruction
>> sets geared a lot more towards high-level compilers.
> This IMO would require new language elements for parallel/vector
> operations, dedicated libraries, or *very* clever optimizers. A
> general-purpose language like Pascal is not suited for coding DSP
> operations.

FPC already has basic MMX support, by allowing operations on appropriate
arrays, so DSP support could be added based on such an approach, or just
using intrinsics.
Re: [fpc-devel] Blackfin support
On 07/09/2010 01:22 PM, ik wrote:
> Is FPC capable of supporting the Blackfin
> http://www.analog.com/en/embedded-processing-dsp/processors/index.html
> (and others in its family) CPU? It seems that more and more embedded
> projects are starting to use it.

No.

In fact I did some (quite low priority) research on how to port FPC to a
new CPU such as NIOS and Blackfin and found that it of course is doable
somehow. While NIOS seems to look more doable, as it's quite similar to
MIPS (and ARM), Blackfin has a much more complex instruction set with a
huge potential for low-level optimization. Thus I suppose Blackfin is
quite hard to do.

OTOH I have the impression that the real winner with embedded projects
will be ARM (especially Cortex), and here Linux is getting interesting for
even lower-size and higher-volume projects. So I am shifting my interests
towards Linux-enabled embedded ARM chips (like the TI AM1x and AM3x Sitara
series that were introduced in 2010 and feature a RISC coprocessor for
hard-realtime / virtual peripheral stuff).

-Michael
Re: [fpc-devel] Blackfin support
Hi Michael,

I too am planning to switch to 'Sitara' AM3517 SBCs from the AT91SAM9263.
Hope there will be an effort to port FPC to this board on linux-uclibc.

regards
Nataraj

On Fri, Jul 9, 2010 at 5:19 PM, Michael Schnell mschn...@lumino.de wrote:
> [...]
Re: [fpc-devel] Blackfin support
On 07/09/2010 01:55 PM, Nataraj S Narayan wrote:
> Hi Michael,
> I too am planning to switch to 'Sitara' AM3517 SBCs from the
> AT91SAM9263. Hope there will be an effort to port FPC to this board on
> linux-uclibc.

I understand that the Cortex-A8 which powers the AM3x features the full 32
bit ARM instruction set, plus the 16 bit Thumb instruction set, plus
several enhancements. Thus code created by FPC/ARM should just run out of
the box.

On the Lazarus mailing list, we lately discussed how to do special stuff
like atomic instructions and futexes without libc binding. Here ARM-Linux
offers a shared userland page with functions that are always provided in a
version optimized for the CPU sub-arch we are running on. Using the same
in the RTL, instead of our own code (which is not sub-arch optimized at
runtime), would be a nice enhancement for the RTL.

-Michael
Re: [fpc-devel] Blackfin support
I would be interested in knowing whether it would be feasible to create
DSP backends for FPC. I recently had the experience of using a TMS320C26,
which probably has to be programmed in assembler due to the limits of the
instruction set. But I hear newer DSPs use instruction sets geared a lot
more towards high-level compilers.

Michael: the Cortex-A8 runs ARMv7-A. All the Interlocked* functions in the
ARM RTL already have implementations for ARMv6 instructions (ldrex/strex),
which is pretty much what the architecture manual gave as example code.

Michael Schnell skrev:
> [...]
Re: [fpc-devel] Blackfin support
On 07/09/2010 02:36 PM, Jeppe Johansen wrote:
> The Cortex-A8 runs ARMv7-A. All the Interlocked* functions in the ARM
> RTL already have implementations for ARMv6 instructions (ldrex/strex),
> which is pretty much what the architecture manual gave as example code.

I do know this. But as the compiler (person) can't know at compile time on
what sub-arch the user (person) will run the program, he can't tell the
compiler (program) for which sub-arch to create the binary. So he will
likely use the default setting, which is ARMv5 and does not use the modern
atomic-supporting instructions.

That is why the Linux kernel community provides us with this common page,
providing interlocked ("atomic" in Linux language) userland functions that
are automatically optimized appropriately. It would be a real shame not to
take advantage of this in a Linux environment. I feel using these
functions is really easy and will not even need ASM (just calling a
function at a fixed address).

-Michael
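A sketch of that fixed-address call (assumptions: the ARM Linux kernel
user helpers as described in the kernel's kernel_user_helpers
documentation, where __kuser_cmpxchg sits at address $ffff0fc0 and returns
0 when the exchange succeeded; the Pascal names here are hypothetical):

    type
      { int __kuser_cmpxchg(int oldval, int newval, volatile int *ptr); }
      TKuserCmpxchg = function(OldVal, NewVal: LongInt;
                               Ptr: PLongInt): LongInt; cdecl;

    function TryCompareExchange(var Target: LongInt;
                                Comparand, NewVal: LongInt): Boolean;
    var
      Cmpxchg: TKuserCmpxchg;
    begin
      // The kernel maps the helper page at a fixed address on ARM Linux,
      // with the implementation chosen to match the running sub-arch.
      Cmpxchg := TKuserCmpxchg(Pointer($ffff0fc0));
      Result := Cmpxchg(Comparand, NewVal, @Target) = 0;
    end;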
Re: [fpc-devel] Blackfin support
BTW: the v5 workaround code not only is less efficient (doing busy
spinlocks), it also creates potential deadlocks in certain cases (realtime
priority), is not 100% secure, and it might be that certain chips even
fail to handle the swap instruction atomically, so the code will not work
at all. Thus, IMHO, basing the userland part of a futex on this is not
recommended. The common Linux functions are guaranteed to work decently
(with kernel support if necessary) and are always as fast as possible.

-Michael
Re: [fpc-devel] Blackfin support
Jeppe Johansen schrieb:
> I would be interested in knowing whether it would be feasible to create
> DSP backends for FPC.

As usual: a matter of time.

> I recently had the experience of using a TMS320C26, which probably has
> to be programmed in assembler due to the limits of the instruction set.
> But I hear newer DSPs use instruction sets geared a lot more towards
> high-level compilers.
>
> Michael: the Cortex-A8 runs ARMv7-A. All the Interlocked* functions in
> the ARM RTL already have implementations for ARMv6 instructions
> (ldrex/strex), which is pretty much what the architecture manual gave as
> example code.

Yes, but the code is not selected dynamically: if one compiles for ARMv5
but runs it on an ARMv6+, the new instructions are not used on the ARMv6.
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> BTW: the v5 workaround code not only is less efficient (doing busy
> spinlocks), it also creates potential deadlocks in certain cases
> (realtime priority), is not 100% secure, and it might be that certain
> chips even fail to handle the swap instruction atomically, so the code
> will not work at all. Thus, IMHO, basing the userland part of a futex on
> this is not recommended. The common Linux functions are guaranteed to
> work decently (with kernel support if necessary) and are always as fast
> as possible.

Maybe you should write a patch before you have to repeat this for the next
ten years.
Re: [fpc-devel] Blackfin support
On 07/09/2010 03:44 PM, Florian Klaempfl wrote:
> Maybe you should write a patch before you have to repeat this for the
> next ten years.

As already said in the other thread, I'll do this as soon as I have the
appropriate equipment to test it. Until that point in time I can only do
research and hope for others to do the implementation. If anybody thinks I
can be of any further help, please let me know.

-Michael
Re: [fpc-devel] Blackfin support
Florian Klaempfl skrev:
>> Michael: the Cortex-A8 runs ARMv7-A. All the Interlocked* functions in
>> the ARM RTL already have implementations for ARMv6 instructions
>> (ldrex/strex), which is pretty much what the architecture manual gave
>> as example code.
> Yes, but the code is not selected dynamically: if one compiles for ARMv5
> but runs it on an ARMv6+, the new instructions are not used on the
> ARMv6.

True, but do you think anyone does that? :) Most people know what end
hardware their programs will run on.

I don't think we can have support for both in the RTL. I don't even think
you can do that, since GNU as won't accept ARMv6 instructions if you
assemble for ARMv5, and throws an error.
Re: [fpc-devel] Blackfin support
Jeppe Johansen schrieb:
>> Yes, but the code is not selected dynamically: if one compiles for
>> ARMv5 but runs it on an ARMv6+, the new instructions are not used on
>> the ARMv6.
> True, but do you think anyone does that? :) Most people know what end
> hardware their programs will run on.
> I don't think we can have support for both in the RTL. I don't even
> think you can do that, since GNU as won't accept ARMv6 instructions if
> you assemble for ARMv5, and throws an error.

Well, you can always assemble for ARMv6 but use only ARMv5 instructions
when running on ARMv5. This is what the RTL does with the PLD instruction
in System.Move: a System.Move procedure with PLD is only used when the CPU
supports it.
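The dispatch pattern being described, as a sketch (the RTL's actual
selection logic differs, and the plain Move stands in for the optimized
bodies here): a procedure variable is bound once, early at startup, based
on a CPU capability check.

    type
      TMoveProc = procedure(const Source; var Dest; Count: SizeInt);

    var
      FastMove: TMoveProc;  // bound once at startup

    procedure MoveGeneric(const Source; var Dest; Count: SizeInt);
    begin
      Move(Source, Dest, Count);  // safe everywhere
    end;

    procedure MoveWithPLD(const Source; var Dest; Count: SizeInt);
    begin
      { A real implementation would prefetch with PLD (ARMv5TE+);
        the plain move stands in for it in this sketch. }
      Move(Source, Dest, Count);
    end;

    procedure BindMove(CpuHasPLD: Boolean);
    begin
      if CpuHasPLD then
        FastMove := @MoveWithPLD
      else
        FastMove := @MoveGeneric;
    end;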
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Well, you can always assemble for ARMv6 but use only ARMv5 instructions
> when running on ARMv5. This is what the RTL does with the PLD
> instruction in System.Move: a System.Move procedure with PLD is only
> used when the CPU supports it.

Of course, this aims more towards generic arm-linux programs than embedded
stuff, where you usually build your own custom RTL anyway.
Re: [fpc-devel] Blackfin support
On 07/09/2010 03:52 PM, Jeppe Johansen wrote:
> True, but do you think anyone does that? :) Most people know what end
> hardware their programs will run on.

In fact, to work decently, the v5 version needs kernel support (if an
interrupt is issued while doing the atomic stuff), which is only possible
by using the Linux-provided functions (only available if you really
compile for Linux, of course). The v6+ code is a pure userland thingy, and
thus the RTL implementation is fine.

-Michael
Re: [fpc-devel] Blackfin support
Florian Klaempfl wrote on Fri, 09 Jul 2010:
>> Well, you can always assemble for ARMv6 but use only ARMv5 instructions
>> when running on ARMv5. This is what the RTL does with the PLD
>> instruction in System.Move: a System.Move procedure with PLD is only
>> used when the CPU supports it.
> Of course, this aims more towards generic arm-linux programs than
> embedded stuff, where you usually build your own custom RTL anyway.

At least if you use the VFP, you have to compile all code for the correct
CPU, because the compiler has to use different versions of the load/store
multiple VFP registers instructions for pre-ARMv6 and for ARMv6 and later
in the function prologs.

Jonas
RE: [fpc-devel] BlackFin
Michael Schnell schrieb:
> Florian, thanks a lot for discussing this!

Big thanks from me too!

>> Coding the compiler part isn't that hard. I can do this, I did the
>> initial ARM port within a few weeks. The more annoying part is doing
>> the debugging and finding the things that are broken.

We have to start a research hardware project at the end of May, and are
also in the middle of choosing between an ARM/FPC route and a
Blackfin/non-FPC route. This discussion opens a new possibility, which I
would greatly favour. We could help to debug a Blackfin port, and I would
also donate some money for a development board, if that helps.

Helmut
--
Helmut dot hartl at firmos.at
RE: [fpc-devel] BlackFin
Big thanks from me too! Coding the compiler part isn't that hard. I can do this; I did the initial ARM port within a few weeks. The more annoying part is doing the debugging and finding the things that are broken. We have to start a research hardware project at the end of May, and are in the middle of choosing between an ARM/FPC approach and a Blackfin/non-FPC approach. This discussion opens a new possibility, which I would greatly favour. We could help to debug a Blackfin port, and I would also donate some money for a development board if that helps. Helmut Sorry, I forgot to mention a valuable source of information: http://www.bluetechnix.at/ -- Helmut dot hartl at firmos.at ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
RE: [fpc-devel] BlackFin
Hi! Sorry, I forgot to mention a valuable source of information: http://www.bluetechnix.at/ Coincidentally, I'm (partly) working for this company and know their staff quite well. If you need any assistance or similar, I'd be happy to help out. Bye Hansi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
We have to start a research hardware project at the end of May, and are in the middle of choosing between an ARM/FPC approach and a Blackfin/non-FPC approach. This discussion opens a new possibility, which I would greatly favour. Great to know that I am not the only one! :-) -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
If you're interested, start a wiki page with some information on where to get docs, tools, and info about calling conventions, etc. Thanks for the encouragement! Next month I'll see an FAE (field application engineer) who has just started supporting the BlackFin line. After that I hope I can see things a bit more clearly. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Florian, Thanks a lot for discussing this! Well, this depends on how good one wants to make such a port ... In this case the (first version of the) port does not need to be very good (in the sense of creating optimized code), but of course the compiler needs to produce correct code that does what was expressed in Pascal. Do you think it would be doable to create a not very optimized compiler? I would join the team, but as I'm new to compiler construction, I don't think I can start such a project by myself. The beauty of the Blackfin is that it is extremely fast and offers an excellent price/performance ratio. The chip I intend to use comes with two CPUs, each clocked at 600 MHz. As the Blackfin can do single-instruction/multiple-data operations (e.g. four 8-bit adds or two 16-bit multiply/adds per cycle per CPU) and provides a zero-overhead looping mechanism, the performance per CPU is (depending on the application) comparable to an 800 to 1500 MHz ARM. And the dual-core chip costs about $20. Of course there are several smaller Blackfin chips. So IMHO this is an excellent processor for embedded use, and thus an interesting target for an FPC port. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. Not really. It's a DSP (the predecessor of the Blackfin was the SHARC DSP line) that has been enhanced with the features necessary for a standard CPU. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
r2 = r1 + r3, r4 = dm(i0,m1); /* addition and memory access */ Yep. In my answer to Florian I forgot that (unlike the ARM) the Blackfin can do a calculation and a memory access in a single instruction cycle. That explains the much better performance even with standard (non-DSP-like) tasks. r3 = r2 * r4, r1 = r2 + r4; /* multiplication and addition */ I didn't know yet that it can do two independent 32-bit calculations, nor that it can do 32-bit multiplications. Anyway, even if only two 32-bit additions can be done in one instruction cycle, this is a big opportunity for optimization. A totally different topic is the inherent parallel processing of a DSP. Usually they can utilize several processing units (+, *) and memories within a single cycle (e.g. see above). Instruction ordering and interleaving to utilize parallelism is tedious to do by hand, and I think also challenging for a compiler. Maybe a first version could skip these great optimization opportunities and just do a single operation per instruction cycle. It should be possible to create a working compiler that way. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
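To make the zero-overhead looping and the parallel issue concrete, here is a rough sketch of the classic dual-MAC inner loop in the Blackfin's algebraic syntax (register choices are illustrative and the code has not been run through an assembler):

    P2 = 32;                               /* iteration count */
    LSETUP (mac_loop, mac_loop) LC0 = P2;  /* hardware loop: no branch or counter overhead */
    mac_loop:
        A1 += R1.H * R2.H, A0 += R1.L * R2.L || R1 = [I0++] || R2 = [I1++];
        /* two 16-bit multiply/accumulates plus two 32-bit loads in one cycle */

A first compiler version could simply emit one operation per line and leave this kind of instruction packing to a later optimizer pass.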
Re: [fpc-devel] BlackFin
Hi! On Monday, 2007-04-16 at 11:57 +0200, Michael Schnell wrote: r2 = r1 + r3, r4 = dm(i0,m1); /* addition and memory access */ Yep. In my answer to Florian I forgot that (unlike the ARM) the Blackfin can do a calculation and a memory access in a single instruction cycle. That explains the much better performance even with standard (non-DSP-like) tasks. r3 = r2 * r4, r1 = r2 + r4; /* multiplication and addition */ I didn't know yet that it can do two independent 32-bit calculations, nor that it can do 32-bit multiplications. Anyway, even if only two 32-bit additions can be done in one instruction cycle, this is a big opportunity for optimization. The above code is based on an example program for some SHARC or TigerSHARC DSP, so it's likely that the BlackFin has different processing units. I wrote the code just as an example of the algebraic style. You have to carefully study the structure of the CPU (i.e. processing units, buses, registers, address calculation, ...) to know what can be done in parallel. In the example I looked at, there was a line with 4 instructions in 1 cycle: f10 = f2 * f4, f12 = f10 + f12, f2 = dm(i1,m2), f4 = pm(i8,m8); (ADSP-2106x). In modern CPUs the parallel utilization of buses and processing units is state of the art. The resource allocation and parallelization are done on the fly during program execution by some smart logic inside the CPU. When a compiler optimizes for a certain CPU, it anticipates this and orders the instructions and registers appropriately to gain a few percent more speed. The beauty of DSPs is that it's in the hands of the compiler (or assembly coder) to do the full optimization. Bye Hansi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Michael Schnell schrieb: Florian, Thanks a lot for discussing this! Well, this depends on how good one wants to make such a port ... In this case the (first version of the) port does not need to be very good (in the sense of creating optimized code), but of course the compiler needs to produce correct code that does what was expressed in Pascal. Do you think it would be doable to create a not very optimized compiler? I would join the team, but as I'm new to compiler construction, I don't think I can start such a project by myself. Coding the compiler part isn't that hard. I can do this; I did the initial ARM port within a few weeks. The more annoying part is doing the debugging and finding the things that are broken. If you're interested, start a wiki page with some information on where to get docs, tools, and info about calling conventions, etc. The beauty of the Blackfin is that it is extremely fast and offers an excellent price/performance ratio. The chip I intend to use comes with two CPUs, each clocked at 600 MHz. As the Blackfin can do single-instruction/multiple-data operations (e.g. four 8-bit adds or two 16-bit multiply/adds per cycle per CPU) and provides a zero-overhead looping mechanism, the performance per CPU is (depending on the application) comparable to an 800 to 1500 MHz ARM. And the dual-core chip costs about $20. Of course there are several smaller Blackfin chips. So IMHO this is an excellent processor for embedded use, and thus an interesting target for an FPC port. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. Not really. It's a DSP (the predecessor of the Blackfin was the SHARC DSP line) that has been enhanced with the features necessary for a standard CPU. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
No. Not without pushing a port to it in some way. Thanks. And I suppose doing a port for that processor would be quite a lot of work, considering that the ASM code is very strange compared to that of 80x86, PPC or ARM. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Michael Schnell schrieb: No. Not without pushing a port to it in some way. Thanks. And I suppose doing a port for that processor would be quite a lot of work, Well, this depends on how good one wants to make such a port ... considering that the ASM code is very strange compared to that of 80x86, PPC or ARM. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Hi! considering that the ASM code is very strange compared to that of 80x86, PPC or ARM. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. The BlackFin and other Analog Devices DSPs have an uncommon assembler syntax. Contrary to the well-known mnemonic style like add eax, ecx, they use an algebraic syntax like r2 = r1 + r3, r4 = dm(i0,m1); /* addition and memory access */ r3 = r2 * r4, r1 = r2 + r4; /* multiplication and addition */ For a compiler implementation this is no fundamental problem, because the compiler has an internal representation of what it wants to do, and this can equally well be expressed in (i.e. transformed to) mnemonic or algebraic syntax. A totally different topic is the inherent parallel processing of a DSP. Usually they can utilize several processing units (+, *) and memories within a single cycle (e.g. see above). Instruction ordering and interleaving to utilize parallelism is tedious to do by hand, and I think also challenging for a compiler. Bye Hansi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel