Re: [fpc-devel] Blackfin support
On Wed, 2010-07-14 at 19:19 +0200, Marco van de Voort wrote:
> Core is not unreasonable (*),
> (*) well, except me obviously, but I won't be reviewing compiler
> submissions, so it is easier for me to say all this.

Sorry for the useless post, but this is just funny. ;)

Joost.
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> I see the biggest benefit in the many possible optimizations in the
> scanner and parser, which can be implemented *only if* an entire file
> resides in memory.

Include files, macro expansion and generics make such optimizations hard.
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> Marco van de Voort schrieb:
>> I don't think we are ever going to give an up-front carte blanche for
>> a massive rewrite to go into trunk. That is simply not sane.
> ACK. I'm more concerned about work that is blacklisted for some reason.

Rewriting the compiler in C ;) Discuss the change on fpc-devel first; if
you want to improve performance, generate numbers for at least: a make
cycle and building Lazarus.

>> A submission will always be judged on performance and maintainability
>> before being admitted. If this bothers you, try to find smart ways to
>> phase the changes, and limit yourself to a few things at a time: don't
>> try to speed-optimize I/O, change the parser, allow multiple front-ends
>> etc. all at the same time.
> Just this is hard to obey, when I see so many details that could be
> improved. Will it do harm if I create more than one branch, e.g. one for
> general optimizations?

No, but try to finish one thing and then start the next one. As time
passes, patches might no longer apply automatically.

> Can other people contribute to such a branch as well?

Yes, just tell me which branch and what login (I'll try to be faster this
time ;)).
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Hans-Peter Diettrich schrieb:
>> I see the biggest benefit in the many possible optimizations in the
>> scanner and parser, which can be implemented *only if* an entire file
>> resides in memory.
> Include files, macro expansion and generics make such optimizations
> hard.

Not necessarily. When all currently used files reside in memory, every
(recorded) token can contain a pointer (or offset) into the file buffer.
This may reduce the number of required string copies (not yet fully
researched).

DoDi
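A minimal sketch of the idea in Pascal (hypothetical names, not FPC's
actual scanner types): a token records only an offset and length into the
in-memory file buffer, and a string copy is made only on demand.

    type
      TTokenKind = (tkIdentifier, tkKeyword, tkNumber, tkString);

      TToken = record
        Kind: TTokenKind;
        BufOfs: SizeInt;  // offset of the lexeme in the file buffer
        Len: Integer;     // lexeme length in bytes
      end;

    { Copy the lexeme text out of the buffer only when it is needed. }
    function TokenText(const Buf: AnsiString; const T: TToken): AnsiString;
    begin
      Result := Copy(Buf, T.BufOfs + 1, T.Len);  // AnsiString is 1-based
    end;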
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich wrote:
> Not necessarily. When all currently used files reside in memory, every
> (recorded) token can contain a pointer (or offset) into the file buffer.
> This may reduce the number of required string copies (not yet fully
> researched).

You normally shouldn't ever need to process every token this way. Language
keywords are encoded with an enumeration type. Everything else is put into
a hashtable, so you typically need only as many string copies as there are
distinct identifiers in the file. Besides, shortstring copies are pretty
cheap compared to AnsiStrings.

Regards,
Sergei
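As an illustration of that interning scheme (a sketch only; the compiler
uses its own hash tables, and a TStringList stands in here for
simplicity): each distinct identifier is copied once, later occurrences
just reuse its index.

    uses Classes;

    { Return the index of Ident, adding a copy only on first sight. }
    function Intern(Names: TStringList; const Ident: string): Integer;
    begin
      Result := Names.IndexOf(Ident);  // case-insensitive by default
      if Result < 0 then
        Result := Names.Add(Ident);
    end;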
Re: [fpc-devel] Blackfin support
Sergei Gorelkin schrieb:
>> Not necessarily. When all currently used files reside in memory, every
>> (recorded) token can contain a pointer (or offset) into the file
>> buffer. This may reduce the number of required string copies (not yet
>> fully researched).
> You normally shouldn't ever need to process every token this way.
> Language keywords are encoded with an enumeration type. Everything else
> is put into a hashtable, so you typically need only as many string
> copies as there are distinct identifiers in the file.

That's okay, in particular when the uniquely cased names have to be
stored.

> Besides, shortstring copies are pretty cheap compared to AnsiStrings.

I'm not sure about possible string operations that force implicit
conversions between AnsiString and ShortString. I observed such
performance hogs in Delphi and other languages; I have no experience yet
with FPC and the concrete compiler code.

DoDi
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> One must keep in mind though that he probably measures on a *nix, and
>> there is a reason why the make cycle takes twice the time on Windows.
> One of these issues is memory-mapped files, which can speed up file
> access a lot (I've been told), perhaps because they map directly to the
> system file cache?

As said, I always had the feeling that it was binary startup time and
directory I/O rather than basic blockwrite/read speed. Most files are tens
of kBs at most, and will probably be read entirely by the system anyway.
Re: [fpc-devel] Blackfin support
On 07/13/2010 11:18 PM, Hans-Peter Diettrich wrote:
> When we rely on an OS file cache, we can read all files entirely into
> memory, instead of using buffered I/O.

Loading the complete file instead of parts of it would do unnecessary
memory copies. In fact I suppose using file mapping instead of read (and
maybe write) should improve speed in many cases.

-Michael
Re: [fpc-devel] Blackfin support
On 07/13/2010 05:19 PM, Hans-Peter Diettrich wrote:
> It may be a good idea to implement different models, that either read
> entire files...

read -> map

-Michael
Re: [fpc-devel] Blackfin support
On 07/14/2010 12:00 AM, Hans-Peter Diettrich wrote:
> One of these issues is memory-mapped files, which can speed up file
> access a lot (I've been told), perhaps because they map directly to the
> system file cache?

AFAIK file mapping is used a lot, and very successfully, with Linux, but
it _is_ available with NTFS. No idea whether the implementation there is
done in a way that makes it really fast.

-Michael
Re: [fpc-devel] Blackfin support
In our previous episode, Michael Schnell said:
> AFAIK file mapping is used a lot, and very successfully, with Linux, but
> it _is_ available with NTFS. No idea whether the implementation there is
> done in a way that makes it really fast.

I tried it long ago on Win2000, and maybe even XP. If you linearly access
the files with a large enough blocksize (8 or 16 kB), the difference was
hardly measurable (+/- 300 MB files). Probably if you go linearly, the
readahead is already near-efficient.

But FPC might not adhere to this scheme; I don't know if FPC currently
loads the whole file or leaves the file open while it processes e.g. an
.inc. If it doesn't load the whole file, opening other files triggers head
movement (if not in cache) that could be avoided.

Mapping does not change that picture (the head still has to move if you
access a previously unread block). Mapping is mainly about
- zero-copy access to file content
- using the VM system to cache _already accessed_ blocks.
The compiler does not do enough I/O to make the first worthwhile, and the
second is irrelevant to the compiler's access pattern. The only way it
could matter is if the memory-mapped file reads more sectors speculatively
after a page access, but I don't know if that is the case; it might just
as well be less (since normal file I/O is more likely to be linear).

So in summary, I think _maybe_ always reading the whole file might win a
bit in file-reading performance. I don't expect memory mapping to do so.
The whole-file hypothesis could easily be tested (if it applies at all) by
increasing the buffer size. But if I understand finput.pas properly, FPC
already uses a 64k buffer size (which is larger than most source files),
so I don't expect much gain here. And, worse, I think that even a gain
there is dwarfed by directory operations (searching files, creating new
files) and binary startup time (of the compiler but also other tools).

(*) empirical time for a core2 to move a large block (source+dest cached).
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> One of these issues is memory-mapped files, which can speed up file
>> access a lot (I've been told), perhaps because they map directly to the
>> system file cache?
> AFAIK file mapping is used a lot and very successfully with Linux, but
> it _is_ available with NTFS.

I've heard it the opposite way, that it has become available for *certain*
Linux distros as well ;-) And Delphi (Windows!) users reported noticeable
performance boosts (factor 3+), even if nobody ever came up with
non-trivial example code, including fallbacks for the restricted (32 bit)
address space.

DoDi
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> I tried it long ago on Win2000, and maybe even XP. If you linearly
> access the files with a large enough blocksize (8 or 16 kB), the
> difference was hardly measurable (+/- 300 MB files). Probably if you go
> linearly, the readahead is already near-efficient.

Windows offers certain file attributes for that purpose, which notify the
OS of intended (strictly) sequential file reads - this would allow it to
read ahead more file content into the system cache.

> Mapping does not change that picture (the head still has to move if you
> access a previously unread block). Mapping is mainly about
> - zero-copy access to file content
> - using the VM system to cache _already accessed_ blocks.

- and backing RAM pages with the original file, so that they never end up
in the swap file.

> The whole-file hypothesis could easily be tested (if it applies at all)
> by increasing the buffer size. But if I understand finput.pas properly,
> FPC already uses a 64k buffer size (which is larger than most source
> files), so I don't expect much gain here.

I see the biggest benefit in the many possible optimizations in the
scanner and parser, which can be implemented *only if* an entire file
resides in memory. When memory management and (string) copies really are
as expensive as some people say, then these *additional* optimizations
should give the really achievable speed gain.

IMO we should give these additional optimizations a try, independent of
the use of MMF. When an entire source file is loaded into memory, we can
measure the time between reading the first token and hitting EOF in the
parser, eliminating all uncertain MMF/file-cache timing. It's only a
matter of the acceptance of such a refactored model, since it's a waste of
time if it never becomes part of the trunk, for already known reasons.

DoDi
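On Windows the hint in question is the FILE_FLAG_SEQUENTIAL_SCAN flag to
CreateFile. A minimal sketch (Windows API only, error handling omitted):

    uses Windows;

    function OpenSequential(const FileName: string): THandle;
    begin
      // FILE_FLAG_SEQUENTIAL_SCAN tells the cache manager to read ahead
      // aggressively and discard pages behind the read position.
      Result := CreateFile(PChar(FileName), GENERIC_READ,
        FILE_SHARE_READ, nil, OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL or FILE_FLAG_SEQUENTIAL_SCAN, 0);
    end;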
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> On 07/13/2010 11:18 PM, Hans-Peter Diettrich wrote:
>> When we rely on an OS file cache, we can read all files entirely into
>> memory, instead of using buffered I/O.
> Loading the complete file instead of parts of it would do unnecessary
> memory copies.

How so? Of course the entire file uses more address space than a smaller
buffer, but when the file is parsed, the same number of bytes must be
copied to local memory in either case. And when the entire file sits in
memory, the scanner and parser operations can be optimized for much higher
speed, e.g. by removing unnecessary address calculations, bounds checks
and string copies.

> In fact I suppose using file mapping instead of read (and maybe write)
> should improve speed in many cases.

Hence my question about a platform independent solution for MMF. At least
we could implement an MMF (source) file class that emulates this feature
on platforms without MMF support. Support also could be restricted to
mapping only entire files, for the compiler - otherwise the management of
mapping windows would degrade the achievable performance.

DoDi
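A sketch of what such a class might look like on Unix (assumptions: the
BaseUnix unit's FpMmap/FpMunmap wrappers; on platforms without MMF support
the constructor would instead read the whole file into an allocated
buffer):

    uses BaseUnix, SysUtils;

    type
      TMappedFile = class
      private
        FData: Pointer;
        FSize: SizeInt;
      public
        constructor Create(const FileName: string);
        destructor Destroy; override;
        property Data: Pointer read FData;
        property Size: SizeInt read FSize;
      end;

    constructor TMappedFile.Create(const FileName: string);
    var
      fd: cint;
    begin
      fd := FpOpen(PChar(FileName), O_RDONLY);
      if fd < 0 then
        raise Exception.CreateFmt('cannot open %s', [FileName]);
      try
        FSize := FpLseek(fd, 0, Seek_End);  // file size via seek-to-end
        FData := FpMmap(nil, FSize, PROT_READ, MAP_PRIVATE, fd, 0);
        if FData = Pointer(-1) then         // MAP_FAILED
          raise Exception.CreateFmt('mmap failed for %s', [FileName]);
      finally
        FpClose(fd);  // the mapping survives closing the descriptor
      end;
    end;

    destructor TMappedFile.Destroy;
    begin
      if (FData <> nil) and (FData <> Pointer(-1)) then
        FpMunmap(FData, FSize);
      inherited Destroy;
    end;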
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> Probably if you go linearly, the readahead is already near-efficient.
> Windows offers certain file attributes for that purpose, which notify
> the OS of intended (strictly) sequential file reads - this would allow
> it to read ahead more file content into the system cache.

I can vaguely remember something like that too. It is a matter of hacking
that into the RTL, and then measuring a make cycle (requires a few reboots
to preclude caching).

>> Mapping does not change that picture (the head still has to move if you
>> access a previously unread block). Mapping is mainly about
>> - zero-copy access to file content
>> - using the VM system to cache _already accessed_ blocks.
> - and backing RAM pages with the original file, so that they never end
> up in the swap file.

If swapping enters the picture, then all these savings are peanuts, so we
assume it is absent.

> I see the biggest benefit in the many possible optimizations in the
> scanner and parser, which can be implemented *only if* an entire file
> resides in memory. When memory management and (string) copies really are
> as expensive as some people say, then these *additional* optimizations
> should give the really achievable speed gain.

That's easily said, but when you get into the details, you often have to
make compromises. And sacrifice speed.

> IMO we should give these additional optimizations a try, independent of
> the use of MMF. When an entire source file is loaded into memory, we can
> measure the time between reading the first token and hitting EOF in the
> parser, eliminating all uncertain MMF/file-cache timing. It's only a
> matter of the acceptance of such a refactored model, since it's a waste
> of time if it never becomes part of the trunk, for already known
> reasons.

I don't think we are ever going to give an up-front carte blanche for a
massive rewrite to go into trunk. That is simply not sane. A submission
will always be judged on performance and maintainability before being
admitted. If this bothers you, try to find smart ways to phase the
changes, and limit yourself to a few things at a time: don't try to
speed-optimize I/O, change the parser, allow multiple front-ends etc. all
at the same time.
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
> And Delphi (Windows!) users reported noticeable performance boosts
> (factor 3+), even if nobody ever came up with non-trivial example code,
> including fallbacks for the restricted (32 bit) address space.

Yeah, and no wonder: most probably benchmarked against plain textfile I/O
with its default 128 byte buffer. One can actually spice FPC/Delphi text
I/O up quite nicely with SetTextBuf and an 8k buffer (the last time I
tested, in P4 3 GHz times, higher values didn't really matter anymore).
The compiler however already has its own 64k buffering system.
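For reference, a minimal example of that SetTextBuf trick (standard RTL
routine; the buffer must stay alive while the file is open):

    var
      f: Text;
      buf: array[0..8191] of Byte;  // 8k replacing the 128-byte default
      line: string;
    begin
      Assign(f, 'somefile.pas');
      SetTextBuf(f, buf);  // install the buffer before any I/O happens
      Reset(f);
      while not Eof(f) do
        ReadLn(f, line);
      Close(f);
    end.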
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> I don't think we are ever going to give an up-front carte blanche for a
> massive rewrite to go into trunk. That is simply not sane.

ACK. I'm more concerned about work that is blacklisted for some reason.

> A submission will always be judged on performance and maintainability
> before being admitted. If this bothers you, try to find smart ways to
> phase the changes, and limit yourself to a few things at a time: don't
> try to speed-optimize I/O, change the parser, allow multiple front-ends
> etc. all at the same time.

Just this is hard to obey, when I see so many details that could be
improved. Will it do harm if I create more than one branch, e.g. one for
general optimizations? Can other people contribute to such a branch as
well?

DoDi
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> I don't think we are ever going to give an up-front carte blanche for
>> a massive rewrite to go into trunk. That is simply not sane.
> ACK. I'm more concerned about work that is blacklisted for some reason.

One reason more to phase and modularize your efforts. It is less all or
nothing. As far as the blacklisting goes: there is only one way to counter
skepticism. Show the goods, and it had better be good. Core is not
unreasonable (*), but it will take more than simply pointing at some
totally out-of-sync, totally overhauled branch and saying "done". Free
Pascal is not a one-man show, and that means cooperation and
communication. People's opinions differ.

>> A submission will always be judged on performance and maintainability
>> before being admitted. If this bothers you, try to find smart ways to
>> phase the changes, and limit yourself to a few things at a time: don't
>> try to speed-optimize I/O, change the parser, allow multiple front-ends
>> etc. all at the same time.
> Just this is hard to obey, when I see so many details that could be
> improved. Will it do harm if I create more than one branch, e.g. one for
> general optimizations? Can other people contribute to such a branch as
> well?

Keep in mind that running many branches long-term will only increase the
amount of management needed to keep them in sync, make it more difficult
to merge the finished results back, etc. Focus your efforts, in as small
phases as possible. And don't ever count on other people helping you in
your planning, since it will nearly always be less than expected.

(*) well, except me obviously, but I won't be reviewing compiler
submissions, so it is easier for me to say all this.
Re: [fpc-devel] Blackfin support
On 07/13/2010 01:46 AM, Hans-Peter Diettrich wrote:
> That's questionable, depending on the real bottlenecks in compiler
> operation. I suspect that disk I/O is the narrowest bottleneck,

I doubt this. The disk cache does a decent job here.

gcc can do this very effectively on a higher layer, as gcc is called
separately by make for each source file. As FPC internally organizes the
unit make sequence, I suppose internal multithreading needs to be
implemented.

-Michael
Re: [fpc-devel] Blackfin support
On 07/12/2010 05:54 PM, Hans-Peter Diettrich wrote:
> M68K machine, which in turn seems to have inherited from the ARM.

I suppose: vice versa :).

> ..., but it doesn't allow supporting multiple machine back-ends in one
> program.

Do you think it would be an advantage to support multiple archs in a
single compiler executable? I feel that recompiling the compiler when
changing the target CPU is not very harmful.

> I could not find much, and most existing documentation is outdated since
> 2.0 :-(

Of course improvement on that issue would be very desirable :).

-Michael
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
>> For me, a much higher priority when doing rewrites might be
>> multithreading of the compiler itself.
> That's questionable, depending on the real bottlenecks in compiler
> operation. I suspect that disk I/O is the narrowest bottleneck, which
> cannot be widened by parallel processing.

Memory throughput is a bottleneck, I/O not really. So multithreading has a
real advantage on NUMA systems and systems where different cores have
dedicated caches. One or two years ago, I did some experiments with
asynchronous assembler calls, and it already improved compilation times
significantly on platforms using an external assembler.

The problem is that the whole compiler is not designed to do so. This
could be solved by an approach we have wanted to implement for years:
split the compilation process into tasks (like "parse unit X", "load unit
Y", "generate code for unit X") with dependencies. This should also solve
the fundamental problems with unit loading/compilation sometimes causing
internal errors. The first step would be to do this without
multithreading; later it could be tried to execute several tasks in
parallel.
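A rough sketch of such a task graph (hypothetical types, not an actual FPC
design): each task names the tasks it depends on, and a scheduler runs any
task whose dependencies are done - sequentially at first, in worker
threads later.

    uses Classes;

    type
      TTaskKind = (tkParseUnit, tkLoadUnit, tkCodegenUnit);

      TTask = class
        Kind: TTaskKind;
        UnitName: string;
        DependsOn: TList;   // TTask instances that must finish first
        Done: Boolean;
        procedure Run; virtual; abstract;
      end;

    { Naive scheduler: repeatedly run any task whose dependencies are
      done; a threaded version would hand ready tasks to workers. }
    procedure RunAll(Tasks: TList);
    var
      i, j: Integer;
      t: TTask;
      ready, progress: Boolean;
    begin
      repeat
        progress := False;
        for i := 0 to Tasks.Count - 1 do
        begin
          t := TTask(Tasks[i]);
          if t.Done then Continue;
          ready := True;
          for j := 0 to t.DependsOn.Count - 1 do
            if not TTask(t.DependsOn[j]).Done then ready := False;
          if ready then
          begin
            t.Run;
            t.Done := True;
            progress := True;
          end;
        end;
      until not progress;  // stops on completion or on a dependency cycle
    end;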
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> For me, a much higher priority when doing rewrites might be
>> multithreading of the compiler itself.
> That's questionable, depending on the real bottlenecks in compiler
> operation. I suspect that disk I/O is the narrowest bottleneck, which
> cannot be widened by parallel processing.

No, that has to be solved by a bigger granularity (compiling more units in
one go). That avoids ppu reloading and limits directory searching (there
is a cache IIRC), freeing up more bandwidth for source loading.

Not only compiling goes in parallel; I assume one could also load a ppu in
parallel (and so overlap the blocking time of the disk I/O with the
parsing of the .ppu contents).
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> No, that has to be solved by a bigger granularity (compiling more units
> in one go). That avoids ppu reloading and limits directory searching
> (there is a cache IIRC), freeing up more bandwidth for source loading.
> Not only compiling goes in parallel; I assume one could also load a ppu
> in parallel?

With compiling I meant all tasks the compiler does, even assembling and
linking.
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> That's questionable, depending on the real bottlenecks in compiler
>> operation. I suspect that disk I/O is the narrowest bottleneck,
> I doubt this. The disk cache does a decent job here.
> gcc can do this very effectively on a higher layer, as gcc is called
> separately by make for each source file. As FPC internally organizes the
> unit make sequence, I suppose internal multithreading needs to be
> implemented.

A C compiler has to access the very same header files over and over again,
so a file cache can reduce disk I/O considerably. But when FPC processes
every source unit in a project only once, the file cache is not very
helpful. Nonetheless it may make sense to process the units in threads, so
that an already read unit can be processed while other threads are still
waiting for disk I/O. I only doubt that this will result in a noticeable
overall speed gain, when the results have to be written back to disk after
compilation. But we will know more only after appropriate tests...

DoDi
Re: [fpc-devel] Blackfin support
On 07/13/2010 02:49 PM, Hans-Peter Diettrich wrote:
> But when FPC processes every source unit in a project only once, the
> file cache is not very helpful.

Obviously, a sufficiently huge cache can avoid any disk I/O bottleneck
when doing the 2nd+ build.

-Michael
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> M68K machine, which in turn seems to have inherited from the ARM.
> I suppose: vice versa :).

At least I found files with comments from/for ARM.

>> ..., but it doesn't allow supporting multiple machine back-ends in one
>> program.
> Do you think it would be an advantage to support multiple archs in a
> single compiler executable? I feel that recompiling the compiler when
> changing the target CPU is not very harmful.

I don't understand the current compilation process yet. How is the target
command line switch handled? Does pp spawn the target-specific compiler?

>> I could not find much, and most existing documentation is outdated
>> since 2.0 :-(
> Of course improvement on that issue would be very desirable :).

What format should it be? Wiki entries are easily extensible, but it's
also easy to lose the overview of the missing pieces. FPDoc is nasty to
format, though it would allow inlining the documentation with the online
help. I'd prefer HTML, or OpenOffice if it allows for embedded links.

DoDi
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Memory throughput is a bottleneck, I/O not really. So multithreading has
> a real advantage on NUMA systems and systems where different cores have
> dedicated caches. One or two years ago, I did some experiments with
> asynchronous assembler calls, and it already improved compilation times
> significantly on platforms using an external assembler.

Good to know :-)

> The problem is that the whole compiler is not designed to do so. This
> could be solved by an approach we have wanted to implement for years:
> split the compilation process into tasks (like "parse unit X", "load
> unit Y", "generate code for unit X") with dependencies. This should also
> solve the fundamental problems with unit loading/compilation sometimes
> causing internal errors. The first step would be to do this without
> multithreading; later it could be tried to execute several tasks in
> parallel.

I should know more about the available threading features (blocking,
synchronization...). IMO compilation should be done in two steps, with the
first step providing the interface for used units, from a .ppu file or by
a new parse. Once this information is available, the using units
(threads) can resume their work. The final code generation can occur in
further threads.

At least I know now what to look for, in my parser redesign. It seems to
be a good idea to reduce the number of global links, so that in a
following compiler redesign multiple threads can do their work
independently.

DoDi
Re: [fpc-devel] Blackfin support
In our previous episode, Hans-Peter Diettrich said:
>> No, that has to be solved by a bigger granularity (compiling more units
>> in one go). That avoids ppu reloading and limits directory searching
>> (there is a cache IIRC), freeing up more bandwidth for source loading.
> ACK. The compiler should process in one go as many units as possible -
> but this is more a matter of the framework (Make, Lazarus...), which
> should pass complete lists of units to the compiler (projects,
> packages).

Not necessarily. One could also strengthen the make capabilities of the
compiler, think about reworking the compiler to be kept resident, etc.

> As a workaround a dedicated server process could hold the least recently
> processed unit objects in RAM, for use in an immediately following
> compilation of other units. But this would only cure the symptoms, not
> the reason for slow compiles :-(

(Some random wild thinking:) Jonas seems to indicate most is due to the
object model (zeroing) and memory management in general. One must keep in
mind though that he probably measures on a *nix, and there is a reason why
the make cycle takes twice the time on Windows. I don't think the CPU or
the cache halves in speed under Windows, so it must be more in the I/O
sphere:
- NTFS is relatively slow in directory operations (seeking).
- Windows is slow starting up binaries.
- AFAIK NTFS caching is optimized for fileserver use, not to strongly
speed up a single application, especially if that app starts/stops
constantly (a model that is foreign to Windows).
So one can't entirely rule out limiting I/O and the number of compiler
startups, since not all OSes are alike.

For the memory management issues, a memory manager specifically for the
compiler is the solution nearest at hand. To make it worthwhile to have a
list of zeroed blocks (and have a thread zero big blocks), somehow the
system must know when a zeroed block is needed. For objects this could
maybe be done by creating a new root object and deriving every object from
it (cclasses etc.). But that would still leave dynamic arrays and manually
allocated memory. For manually allocated memory of always the same size
(virtual register map?) a pooling solution could be found.

> It may be a good idea to implement different models, that either read
> entire files or use the current (buffered) access. Depending on disk
> fragmentation it may be faster to read entire (unfragmented) source or
> ppu files, before requests for other files can cause disk seeks and slow
> down continued reading of files from other places. Both models can be
> used concurrently, when an arbitration is possible from certain system
> (load) parameters.

Most OSes already read several tens of kBs in advance. I don't really
think that will bring much. Such approaches are so low-level that the OS
could do it, and probably it will.
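A minimal sketch of such a fixed-size pool (an illustration of the idea
only, single-threaded, chunks never returned to the OS): blocks of one
size are carved from a larger zero-filled chunk and recycled through a
free list, so allocation is a pointer pop.

    type
      PFreeNode = ^TFreeNode;
      TFreeNode = record
        Next: PFreeNode;
      end;

      TFixedPool = object
        BlockSize: SizeInt;
        FreeList: PFreeNode;
        procedure Init(ABlockSize: SizeInt);
        function Alloc: Pointer;
        procedure Release(P: Pointer);
      end;

    procedure TFixedPool.Init(ABlockSize: SizeInt);
    const
      ChunkBlocks = 256;
    var
      Chunk: PByte;
      i: Integer;
    begin
      if ABlockSize < SizeOf(TFreeNode) then
        ABlockSize := SizeOf(TFreeNode);
      BlockSize := ABlockSize;
      FreeList := nil;
      // One big, zero-filled chunk; thread all blocks onto the free list.
      Chunk := AllocMem(ChunkBlocks * BlockSize);
      for i := 0 to ChunkBlocks - 1 do
        Release(Chunk + i * BlockSize);
    end;

    function TFixedPool.Alloc: Pointer;
    begin
      Result := FreeList;  // pop; nil means the pool is exhausted
      if Result <> nil then
      begin
        FreeList := PFreeNode(Result)^.Next;
        PFreeNode(Result)^.Next := nil;  // re-zero the word used for linking
      end;
    end;

    procedure TFixedPool.Release(P: Pointer);
    begin
      PFreeNode(P)^.Next := FreeList;  // push back onto the free list
      FreeList := PFreeNode(P);
    end;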
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> On 07/13/2010 02:49 PM, Hans-Peter Diettrich wrote:
>> But when FPC processes every source unit in a project only once, the
>> file cache is not very helpful.
> Obviously, a sufficiently huge cache can avoid any disk I/O bottleneck
> when doing the 2nd+ build.

Then the system file cache will make it hard to determine reasonable
figures for the first build. And I wonder how often long builds are really
run in sequence?

When we rely on an OS file cache, we can read all files entirely into
memory, instead of using buffered I/O. Or we can design an interface that
allows running the compiler e.g. inside Lazarus, using the already loaded
editor files and directory caches.

BTW, should we switch the thread topic?

DoDi
Re: [fpc-devel] Blackfin support
Marco van de Voort schrieb:
> One must keep in mind though that he probably measures on a *nix, and
> there is a reason why the make cycle takes twice the time on Windows.

One of these issues is memory-mapped files, which can speed up file access
a lot (I've been told), perhaps because they map directly to the system
file cache?

> So one can't entirely rule out limiting I/O and the number of compiler
> startups, since not all OSes are alike.

That means optimizing for one platform may slow down the compiler on other
platforms :-(

> For the memory management issues, a memory manager specifically for the
> compiler is the solution nearest at hand. To make it worthwhile to have
> a list of zeroed blocks (and have a thread zero big blocks), somehow the
> system must know when a zeroed block is needed. For objects this could
> maybe be done by creating a new root object and deriving every object
> from it (cclasses etc.). But that would still leave dynamic arrays and
> manually allocated memory.

When zeroing blocks really is an issue, then I suspect that it's more an
issue of memory caches. This would mean that data locality should be
increased, i.e. related pieces of data should reside physically next to
each other (same page). Most list implementations (TList) tend to spread
the list and its entries across the address space.

Special considerations may apply to 64 bit systems, with a (currently)
almost unlimited address space. There it might be a good idea to allocate
lists bigger than really needed, which should do no harm when the unused
elements are never mapped to RAM (thanks to paged memory management). A
TList with buckets would then only be slower on such a system, for no
other gain.

> For manually allocated memory of always the same size (virtual register
> map?) a pooling solution could be found.

Again candidates for huge pre-allocated memory arrays. But when these
elements are not used together, they may occupy one or two memory pages
each, and the remaining RAM in these pages is unused.

> Most OSes already read several tens of kBs in advance. I don't really
> think that will bring much. Such approaches are so low-level that the OS
> could do it, and probably it will.

Every OS with MMF will do so, when only memory-mapped files are used. The
rest IMO is so platform specific that a single optimization strategy may
not be a good solution for other platforms. But I think that such
low-level considerations should be left for later, when the big issues are
fixed and the requirements for exploring the real behaviour of various
strategies have been implemented.

DoDi
Re: [fpc-devel] Blackfin support
On 07/10/2010 12:40 PM, Hans-Peter Diettrich wrote:
> Let me know if you (or somebody else) have more concrete plans on the
> integration of a new CPU.

I remember some discussions about doing a MIPS / PIC32 port recently.

> I just stripped down the machine files for a no_cpu machine (all fakes),
> with some documentation about the required units etc.

Is this based on what we already have for X86, ARM, etc., or does it fork
to another set of arch implementations? If a fork, is it intended / viable
to move the existing implementations into that scheme?

> An implementation of a new CPU, based on that skeleton, would raise the
> priority for further explorations and documentation.

No idea in what state the structure / documentation of the existing fully
supported implementations such as x86 and ARM is.

-Michael
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
>> I just stripped down the machine files for a no_cpu machine (all
>> fakes), with some documentation about the required units etc.
> Is this based on what we already have for X86, ARM, etc., or does it
> fork to another set of arch implementations? If a fork, is it intended /
> viable to move the existing implementations into that scheme?

The no_cpu skeleton was stripped down from the M68K machine, which in turn
seems to have inherited from the ARM. Due to hard-coded dependencies it
was impossible to remove e.g. registers completely, and also a $define of
some already known machine must be given, else every compilation will fail
immediately with a $fatal error.

That skeleton reflects the units, data structures and procedures that are
referenced (hard-coded) by other parts of the compiler. Every machine
consists of a formal description (registers, instructions...), node
generators for the parse tree, code (tree) optimizers, an assembler, and
output generators for binary code and debug info. A distinct machine
back-end is selected by adding its source folder to the unit search path.
This may be the fastest possible implementation for one (of multiple)
machines, but it doesn't allow supporting multiple machine back-ends in
one program. The same applies to the front-ends, which currently are not
exchangeable at all.

More flexibility would require a plug-in scheme or similar, hard to do
without dynamically loadable packages. But since some abstract links
already exist (class type variables for machine specific descendants),
these links could be exchanged at runtime, not only in the initialization
sections of the machine specific units. Then it would be sufficient to add
all (wanted) front- or back-ends to the compiler, and switch amongst these
at runtime. Whereas switching the target machine at runtime does not make
much sense to me, in contrast to switching front-ends based on the source
file types.

>> An implementation of a new CPU, based on that skeleton, would raise the
>> priority for further explorations and documentation.
> No idea in what state the structure / documentation of the existing
> fully supported implementations such as x86 and ARM is.

I could not find much, and most existing documentation is outdated since
2.0 :-( Some parts, like the parse tree nodes, are somewhat
self-explanatory. The formal machine descriptions (registers, options...)
are almost undocumented. I tried to make the construction of the register
descriptor constants more transparent, by composing them from other sets
of constants. There seem to exist tools that produce e.g. register
descriptors (in include files), but I did not yet dig into the tools
folder.

DoDi
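The "class type variable" linkage mentioned above, as a minimal sketch
(hypothetical names, not the compiler's real hooks): each back-end unit
assigns its own class to a shared metaclass variable in its
initialization section, and the rest of the compiler instantiates through
that variable.

    type
      TCodeGenerator = class
        procedure GenerateProlog; virtual; abstract;
      end;
      TCodeGeneratorClass = class of TCodeGenerator;

    var
      CCodeGenerator: TCodeGeneratorClass;  // the currently linked back-end

    type
      TArmCodeGenerator = class(TCodeGenerator)
        procedure GenerateProlog; override;
      end;

    procedure TArmCodeGenerator.GenerateProlog;
    begin
      { emit the ARM-specific prolog here }
    end;

    { In the ARM back-end unit this assignment sits in the initialization
      section; reassigning the variable at runtime would switch back-ends. }
    initialization
      CCodeGenerator := TArmCodeGenerator;
    end.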
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> But since some abstract links already exist (class type variables for
> machine specific descendants), these links could be exchanged at
> runtime,

One problem is all the constants used to describe the target architecture.
We discussed multiple back-ends in one compiler already in 2002, saw no
advantage in it, so we didn't try to solve it and decided to use the "fpc
-P ..." solution, which makes no difference for the user.

For me, a much higher priority when doing rewrites might be multithreading
of the compiler itself.
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Hans-Peter Diettrich schrieb:
>> But since some abstract links already exist (class type variables for
>> machine specific descendants), these links could be exchanged at
>> runtime,
> One problem is all the constants used to describe the target
> architecture. We discussed multiple back-ends in one compiler already in
> 2002, saw no advantage in it, so we didn't try to solve it and decided
> to use the "fpc -P ..." solution, which makes no difference for the
> user.

Full ACK.

> For me, a much higher priority when doing rewrites might be
> multithreading of the compiler itself.

That's questionable, depending on the real bottlenecks in compiler
operation. I suspect that disk I/O is the narrowest bottleneck, which
cannot be widened by parallel processing. It also requires further
research, e.g. to determine the optimal number of threads, depending on
the currently available resources of a concrete machine. But of course
it's worth a try, to find out more...

DoDi
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> In fact I did some (quite low priority) research on how to port FPC to a
> new CPU such as NIOS and Blackfin and found that it of course is doable
> somehow. While NIOS seems to look more doable, as it's quite similar to
> MIPS (and ARM), Blackfin has a much more complex instruction set with a
> huge potential for low-level optimization. Thus I suppose Blackfin is
> quite hard to do.

Let me know if you (or somebody else) have more concrete plans on the
integration of a new CPU. I just stripped down the machine files for a
no_cpu machine (all fakes), with some documentation about the required
units etc. An implementation of a new CPU, based on that skeleton, would
raise the priority for further explorations and documentation. I already
have more suggestions for easier implementation of new machines...

DoDi
Re: [fpc-devel] Blackfin support
Jeppe Johansen schrieb:
> I would be interested in knowing whether it would be feasible to create
> DSP backends for FPC. I recently had the experience of using a
> TMS320C26, which probably has to be programmed in assembler due to the
> limits of the instruction set. But I hear newer DSPs use instruction
> sets geared a lot more towards high-level compilers.

This IMO would require new language elements for parallel/vector
operations, dedicated libraries, or *very* clever optimizers. A
general-purpose language like Pascal is not suited for coding DSP
operations.

DoDi
Re: [fpc-devel] Blackfin support
Hans-Peter Diettrich schrieb:
> Jeppe Johansen schrieb:
>> I would be interested in knowing whether it would be feasible to create
>> DSP backends for FPC. I recently had the experience of using a
>> TMS320C26, which probably has to be programmed in assembler due to the
>> limits of the instruction set. But I hear newer DSPs use instruction
>> sets geared a lot more towards high-level compilers.
> This IMO would require new language elements for parallel/vector
> operations, dedicated libraries, or *very* clever optimizers. A
> general-purpose language like Pascal is not suited for coding DSP
> operations.

FPC already has basic MMX support, by allowing operations on appropriate
arrays, so DSP support could be added based on such an approach, or just
using intrinsics.
Re: [fpc-devel] Blackfin support
On 07/09/2010 01:22 PM, ik wrote:
> Is FPC capable of supporting the Blackfin
> http://www.analog.com/en/embedded-processing-dsp/processors/index.html
> (and others in its family) CPU? It seems that more and more embedded
> projects are starting to use it.

No.

In fact I did some (quite low priority) research on how to port FPC to a
new CPU such as NIOS and Blackfin and found that it of course is doable
somehow. While NIOS seems to look more doable, as it's quite similar to
MIPS (and ARM), Blackfin has a much more complex instruction set with a
huge potential for low-level optimization. Thus I suppose Blackfin is
quite hard to do.

OTOH I have the impression that the real winner with embedded projects
will be ARM (especially Cortex), and here Linux is getting interesting for
even lower-size and higher-volume projects. So I am shifting my interests
towards Linux-enabled embedded ARM chips (like the TI AM1x and AM3x Sitara
series that were introduced in 2010 and feature a RISC coprocessor for
hard-realtime / virtual peripheral stuff).

-Michael
Re: [fpc-devel] Blackfin support
Hi Michael,

I too am planning to switch to 'Sitara' AM3517 SBCs from the AT91SAM9263.
Hope there will be an effort to port FPC to this board on linux-uclibc.

regards
Nataraj

On Fri, Jul 9, 2010 at 5:19 PM, Michael Schnell mschn...@lumino.de wrote:
> [...]
Re: [fpc-devel] Blackfin support
On 07/09/2010 01:55 PM, Nataraj S Narayan wrote:
> Hi Michael,
> I too am planning to switch to 'Sitara' AM3517 SBCs from the
> AT91SAM9263. Hope there will be an effort to port FPC to this board on
> linux-uclibc.

I understand that the Cortex-A8 which powers the AM3x features the full 32
bit ARM instruction set, plus the 16 bit Thumb instruction set, plus
several enhancements. Thus code created by FPC/ARM should just run out of
the box.

On the Lazarus mailing list, we lately discussed how to do special stuff
like atomic instructions and futexes without libc binding. Here ARM-Linux
offers a shared userland page with functions that are always provided in a
version optimized for the CPU sub-arch we are running on. Using the same
in the RTL, instead of our own code (which is not sub-arch optimized at
runtime), would be a nice enhancement for the RTL.

-Michael
Re: [fpc-devel] Blackfin support
I would be interested in knowing whether it would be feasible to create
DSP backends for FPC. I recently had the experience of using a TMS320C26,
which probably has to be programmed in assembler due to the limits of the
instruction set. But I hear newer DSPs use instruction sets geared a lot
more towards high-level compilers.

Michael: the Cortex-A8 runs ARMv7-A. All the Interlocked* functions in the
ARM RTL already have implementations for ARMv6 instructions (ldrex/strex),
which is pretty much what the architecture manual gave as example code.

Michael Schnell skrev:
> [...]
Re: [fpc-devel] Blackfin support
On 07/09/2010 02:36 PM, Jeppe Johansen wrote:
> The Cortex-A8 runs ARMv7-A. All the Interlocked* functions in the ARM
> RTL already have implementations for ARMv6 instructions (ldrex/strex),
> which is pretty much what the architecture manual gave as example code.

I do know this. But as the compiler (person) can't know at compile time on
what sub-arch the user (person) will run the program, he can't tell the
compiler (program) for which sub-arch to create the binary. So he will
likely use the default setting, which is ARMv5 and does not use the modern
atomic-supporting instructions.

That is why the Linux kernel community provides us with this common page,
providing interlocked ("atomic" in Linux language) userland functions that
are automatically optimized appropriately. It would be a real shame not to
take advantage of this in a Linux environment. I feel using these
functions is really easy and will not even need ASM (just calling a
function at a fixed address).

-Michael
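A sketch of that fixed-address call (assumptions: the ARM Linux kernel
user helpers as described in the kernel's kernel_user_helpers
documentation, where __kuser_cmpxchg sits at address $ffff0fc0 and returns
0 when the exchange succeeded; the Pascal names here are hypothetical):

    type
      { int __kuser_cmpxchg(int oldval, int newval, volatile int *ptr); }
      TKuserCmpxchg = function(OldVal, NewVal: LongInt;
                               Ptr: PLongInt): LongInt; cdecl;

    function TryCompareExchange(var Target: LongInt;
                                Comparand, NewVal: LongInt): Boolean;
    var
      Cmpxchg: TKuserCmpxchg;
    begin
      // The kernel maps the helper page at a fixed address on ARM Linux,
      // with the implementation chosen to match the running sub-arch.
      Cmpxchg := TKuserCmpxchg(Pointer($ffff0fc0));
      Result := Cmpxchg(Comparand, NewVal, @Target) = 0;
    end;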
Re: [fpc-devel] Blackfin support
BTW: the v5 workaround code not only is less efficient (doing busy
spinlocks), it also creates potential deadlocks in certain cases (realtime
priority), is not 100% secure, and it might be that certain chips even
fail to handle the swap instruction atomically, so the code will not work
at all. Thus, IMHO, basing the userland part of a futex on this is not
recommended. The common Linux functions are guaranteed to work decently
(with kernel support if necessary) and are always as fast as possible.

-Michael
Re: [fpc-devel] Blackfin support
Jeppe Johansen schrieb:
> I would be interested in knowing whether it would be feasible to create
> DSP backends for FPC.

As usual: a matter of time.

> I recently had the experience of using a TMS320C26, which probably has
> to be programmed in assembler due to the limits of the instruction set.
> But I hear newer DSPs use instruction sets geared a lot more towards
> high-level compilers.
>
> Michael: the Cortex-A8 runs ARMv7-A. All the Interlocked* functions in
> the ARM RTL already have implementations for ARMv6 instructions
> (ldrex/strex), which is pretty much what the architecture manual gave as
> example code.

Yes, but the code is not selected dynamically: if one compiles for ARMv5
but runs it on an ARMv6+, the new instructions are not used on the ARMv6.
Re: [fpc-devel] Blackfin support
Michael Schnell schrieb:
> BTW: the v5 workaround code not only is less efficient (doing busy
> spinlocks), it also creates potential deadlocks in certain cases
> (realtime priority), is not 100% secure, and it might be that certain
> chips even fail to handle the swap instruction atomically, so the code
> will not work at all. Thus, IMHO, basing the userland part of a futex on
> this is not recommended. The common Linux functions are guaranteed to
> work decently (with kernel support if necessary) and are always as fast
> as possible.

Maybe you should write a patch before you have to repeat this for the next
ten years.
Re: [fpc-devel] Blackfin support
On 07/09/2010 03:44 PM, Florian Klaempfl wrote:
> Maybe you should write a patch before you have to repeat this for the
> next ten years.

As already said in the other thread, I'll do this as soon as I have the
appropriate equipment to test it. Until that point in time I can only do
research and hope for others to do the implementation. If anybody thinks I
can be of any further help, please let me know.

-Michael
Re: [fpc-devel] Blackfin support
Florian Klaempfl skrev:
>> Michael: the Cortex-A8 runs ARMv7-A. All the Interlocked* functions in
>> the ARM RTL already have implementations for ARMv6 instructions
>> (ldrex/strex), which is pretty much what the architecture manual gave
>> as example code.
> Yes, but the code is not selected dynamically: if one compiles for ARMv5
> but runs it on an ARMv6+, the new instructions are not used on the
> ARMv6.

True, but do you think anyone does that? :) Most people know what end
hardware their programs will run on.

I don't think we can have support for both in the RTL. I don't even think
you can do that, since GNU as won't accept ARMv6 instructions if you
assemble for ARMv5, and throws an error.
Re: [fpc-devel] Blackfin support
Jeppe Johansen schrieb:
>> Yes, but the code is not selected dynamically: if one compiles for
>> ARMv5 but runs it on an ARMv6+, the new instructions are not used on
>> the ARMv6.
> True, but do you think anyone does that? :) Most people know what end
> hardware their programs will run on.
> I don't think we can have support for both in the RTL. I don't even
> think you can do that, since GNU as won't accept ARMv6 instructions if
> you assemble for ARMv5, and throws an error.

Well, you can always assemble for ARMv6 but use only ARMv5 instructions
when running on ARMv5. This is what the RTL does with the PLD instruction
in System.Move: a System.Move procedure with PLD is only used when the CPU
supports it.
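The dispatch pattern being described, as a sketch (the RTL's actual
selection logic differs, and the plain Move stands in for the optimized
bodies here): a procedure variable is bound once, early at startup, based
on a CPU capability check.

    type
      TMoveProc = procedure(const Source; var Dest; Count: SizeInt);

    var
      FastMove: TMoveProc;  // bound once at startup

    procedure MoveGeneric(const Source; var Dest; Count: SizeInt);
    begin
      Move(Source, Dest, Count);  // safe everywhere
    end;

    procedure MoveWithPLD(const Source; var Dest; Count: SizeInt);
    begin
      { A real implementation would prefetch with PLD (ARMv5TE+);
        the plain move stands in for it in this sketch. }
      Move(Source, Dest, Count);
    end;

    procedure BindMove(CpuHasPLD: Boolean);
    begin
      if CpuHasPLD then
        FastMove := @MoveWithPLD
      else
        FastMove := @MoveGeneric;
    end;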
Re: [fpc-devel] Blackfin support
Florian Klaempfl schrieb:
> Well, you can always assemble for ARMv6 but use only ARMv5 instructions
> when running on ARMv5. This is what the RTL does with the PLD
> instruction in System.Move: a System.Move procedure with PLD is only
> used when the CPU supports it.

Of course, this aims more towards generic arm-linux programs than embedded
stuff, where you usually build your own custom RTL anyway.
Re: [fpc-devel] Blackfin support
On 07/09/2010 03:52 PM, Jeppe Johansen wrote:
> True, but do you think anyone does that? :) Most people know what end
> hardware their programs will run on.

In fact, to work decently, the v5 version needs kernel support (if an
interrupt is issued while doing the atomic stuff), which is only possible
by using the Linux-provided functions (only available if you really
compile for Linux, of course). The v6+ code is a pure userland thingy, and
thus the RTL implementation is fine.

-Michael
Re: [fpc-devel] Blackfin support
Florian Klaempfl wrote on Fri, 09 Jul 2010:
>> Well, you can always assemble for ARMv6 but use only ARMv5 instructions
>> when running on ARMv5. This is what the RTL does with the PLD
>> instruction in System.Move: a System.Move procedure with PLD is only
>> used when the CPU supports it.
> Of course, this aims more towards generic arm-linux programs than
> embedded stuff, where you usually build your own custom RTL anyway.

At least if you use the VFP, you have to compile all code for the correct
CPU, because the compiler has to use different versions of the load/store
multiple VFP registers instructions for pre-ARMv6 and for ARMv6 and later
in the function prologs.

Jonas
RE: [fpc-devel] BlackFin
Michael Schnell schrieb:
> Florian, thanks a lot for discussing this!

Big thanks from me too!

>> Coding the compiler part isn't that hard. I can do this, I did the
>> initial ARM port within a few weeks. The more annoying part is doing
>> the debugging and finding the things that are broken.

We have to start a research hardware project at the end of May, and are
also in the middle of choosing between an ARM/FPC route and a
Blackfin/non-FPC route. This discussion opens a new possibility, which I
would greatly favour. We could help to debug a Blackfin port, and I would
also donate some money for a development board, if that helps.

Helmut
--
Helmut dot hartl at firmos.at
RE: [fpc-devel] BlackFin
Big thanks from me too! Coding the compiler part isn't that hard. I can do this; I did the initial ARM port within a few weeks. The more annoying part is doing the debugging and finding the things that are broken. We have to start a research hardware project at the end of May, and are in the middle of choosing between an ARM/FPC approach and a Blackfin/non-FPC approach. This discussion opens a new possibility, which I would greatly favour. We could help to debug a Blackfin port, and I would also donate some money for a development board if that helps. Helmut Sorry, I forgot to mention a valuable source of information: http://www.bluetechnix.at/ -- Helmut dot hartl at firmos.at ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
RE: [fpc-devel] BlackFin
Hi! Sorry, I forgot to mention a valuable source of information: http://www.bluetechnix.at/ Coincidentally, I'm (partly) working for this company and know their staff quite well. If you need any assistance or similar, I'd be happy to help out. Bye Hansi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
We have to start a research hardware project at the end of May, and are in the middle of choosing between an ARM/FPC approach and a Blackfin/non-FPC approach. This discussion opens a new possibility, which I would greatly favour. Great to know that I am not the only one! :-) -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
If you're interested, start a wiki page with some information on where to get docs, tools, and info about calling conventions, etc. Thanks for the encouragement! Next month I'll see an FAE (field application engineer) who has just started supporting the BlackFin line. After that I hope I can see things a bit more clearly. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Florian, Thanks a lot for discussing this! Well, this depends on how good one wants to make such a port ... In this case the (first version of the) port does not need to be very good (in the sense of creating optimized code), but of course the compiler needs to produce correct code that does what was expressed in Pascal. Do you think it would be doable to create a not very optimized compiler? I would join the team, but as I'm new to compiler construction, I don't think I can start such a project by myself. The beauty of the Blackfin is that it is extremely fast and offers an excellent price/performance ratio. The chip I intend to use comes with two CPUs, each clocked at 600 MHz. As the Blackfin can do single-instruction/multiple-data operations (e.g. four 8-bit adds or two 16-bit multiply/adds per cycle per CPU) and provides a zero-overhead looping mechanism, the performance per CPU is (depending on the application) comparable to an 800 to 1500 MHz ARM. And the dual-core chip costs about $20. Of course there are several smaller Blackfin chips. So IMHO this is an excellent processor for embedded use, and thus an interesting target for an FPC port. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. Not really. It's a DSP (the predecessor of the Blackfin was the SHARC DSP line) that has been enhanced with the features necessary for a standard CPU. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
r2 = r1 + r3, r4 = dm(i0,m1); /* addition and memory access */ Yep. In my answer to Florian I forgot that (unlike the ARM) the Blackfin can do a calculation and a memory access in a single instruction cycle. That explains the much better performance even with standard (non-DSP-like) tasks. r3 = r2 * r4, r1 = r2 + r4; /* multiplication and addition */ I didn't know yet that it can do two independent 32-bit calculations, nor that it can do 32-bit multiplications. Anyway, even if only two 32-bit additions can be done in one instruction cycle, this is a big opportunity for optimization. A totally different topic is the inherent parallel processing of a DSP. Usually they can utilize several processing units (+, *) and memories within a single cycle (e.g. see above). Instruction ordering and interleaving to utilize parallelism is tedious to do by hand, and I think also challenging for a compiler. Maybe a first version could skip these great optimization opportunities and just do a single operation per instruction cycle. It should be possible to create a working compiler that way. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
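To make the zero-overhead looping and the parallel issue concrete, here is a rough sketch of the classic dual-MAC inner loop in the Blackfin's algebraic syntax (register choices are illustrative and the code has not been run through an assembler):

    P2 = 32;                               /* iteration count */
    LSETUP (mac_loop, mac_loop) LC0 = P2;  /* hardware loop: no branch or counter overhead */
    mac_loop:
        A1 += R1.H * R2.H, A0 += R1.L * R2.L || R1 = [I0++] || R2 = [I1++];
        /* two 16-bit multiply/accumulates plus two 32-bit loads in one cycle */

A first compiler version could simply emit one operation per line and leave this kind of instruction packing to a later optimizer pass.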
Re: [fpc-devel] BlackFin
Hi! On Monday, 2007-04-16 at 11:57 +0200, Michael Schnell wrote: r2 = r1 + r3, r4 = dm(i0,m1); /* addition and memory access */ Yep. In my answer to Florian I forgot that (unlike the ARM) the Blackfin can do a calculation and a memory access in a single instruction cycle. That explains the much better performance even with standard (non-DSP-like) tasks. r3 = r2 * r4, r1 = r2 + r4; /* multiplication and addition */ I didn't know yet that it can do two independent 32-bit calculations, nor that it can do 32-bit multiplications. Anyway, even if only two 32-bit additions can be done in one instruction cycle, this is a big opportunity for optimization. The above code is based on an example program for some SHARC or TigerSHARC DSP, so it's likely that the BlackFin has different processing units. I wrote the code just as an example of the algebraic style. You have to carefully study the structure of the CPU (i.e. processing units, buses, registers, address calculation, ...) to know what can be done in parallel. In the example I looked at, there was a line with 4 instructions in 1 cycle: f10 = f2 * f4, f12 = f10 + f12, f2 = dm(i1,m2), f4 = pm(i8,m8); (ADSP-2106x). In modern CPUs the parallel utilization of buses and processing units is state of the art. The resource allocation and parallelization are done on the fly during program execution by some smart logic inside the CPU. When a compiler optimizes for a certain CPU, it anticipates this and orders the instructions and registers appropriately to gain a few percent more speed. The beauty of DSPs is that it's in the hands of the compiler (or assembly coder) to do the full optimization. Bye Hansi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Michael Schnell schrieb: Florian, Thanks a lot for discussing this! Well, this depends on how good one wants to make such a port ... In this case the (first version of the) port does not need to be very good (in the sense of creating optimized code), but of course the compiler needs to produce correct code that does what was expressed in Pascal. Do you think it would be doable to create a not very optimized compiler? I would join the team, but as I'm new to compiler construction, I don't think I can start such a project by myself. Coding the compiler part isn't that hard. I can do this; I did the initial ARM port within a few weeks. The more annoying part is doing the debugging and finding the things that are broken. If you're interested, start a wiki page with some information on where to get docs, tools, and info about calling conventions, etc. The beauty of the Blackfin is that it is extremely fast and offers an excellent price/performance ratio. The chip I intend to use comes with two CPUs, each clocked at 600 MHz. As the Blackfin can do single-instruction/multiple-data operations (e.g. four 8-bit adds or two 16-bit multiply/adds per cycle per CPU) and provides a zero-overhead looping mechanism, the performance per CPU is (depending on the application) comparable to an 800 to 1500 MHz ARM. And the dual-core chip costs about $20. Of course there are several smaller Blackfin chips. So IMHO this is an excellent processor for embedded use, and thus an interesting target for an FPC port. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. Not really. It's a DSP (the predecessor of the Blackfin was the SHARC DSP line) that has been enhanced with the features necessary for a standard CPU. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
No. Not without pushing a port to it in some way. Thanks. And I suppose doing a port for that processor would be quite a lot of work, considering that the ASM code is very strange compared to that of 80x86, PPC or ARM. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Michael Schnell schrieb: No. Not without pushing a port to it in some way. Thanks. And I suppose doing a port for that processor would be quite a lot of work, Well, this depends on how good one wants to make such a port ... considering that the ASM code is very strange compared to that of 80x86, PPC or ARM. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] BlackFin
Hi! considering that the ASM code is very strange compared to that of 80x86, PPC or ARM. As far as I have seen the BlackFin has two cores: an ARM-like RISC core and a DSP. The BlackFin and other Analog Devices DSPs have an uncommon assembler syntax. Contrary to the well-known mnemonic style like add eax, ecx, they use an algebraic syntax like r2 = r1 + r3, r4 = dm(i0,m1); /* addition and memory access */ r3 = r2 * r4, r1 = r2 + r4; /* multiplication and addition */ For a compiler implementation this is no fundamental problem, because the compiler has an internal representation of what it wants to do, and this can equally well be expressed in (i.e. transformed to) mnemonic or algebraic syntax. A totally different topic is the inherent parallel processing of a DSP. Usually they can utilize several processing units (+, *) and memories within a single cycle (e.g. see above). Instruction ordering and interleaving to utilize parallelism is tedious to do by hand, and I think also challenging for a compiler. Bye Hansi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel