Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-25 Thread Martin Nowak
On Mon, 18 Jun 2012 19:53:43 +0200, Walter Bright wrote:



On 6/18/2012 6:07 AM, Don Clugston wrote:

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and you've
fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yeah, but I can't escape that lingering feeling that lexing is slow.

I was fairly disappointed that asynchronously reading the source files  
didn't have a measurable effect most of the time.


Lexing is definitely taking a big part of debug compilation time.
I haven't profiled the compiler for some time now but here are some  
thoughts.


- speeding up the identifier hash table
  there was always a profile spike at StringTable::lookup, though it
  has shrunk since you increased the bucket count

- memory mapping the source file saves a copy for UTF-8 sources
  this is by far the fastest way to read a source file

- parallel reading/parsing doesn't help much if most of the source files
  are read during import semantic
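
To illustrate the first point, here is a minimal C++ sketch of an identifier table that interns each spelling once and reserves its buckets up front. It is not DMD's actual StringTable; the class name `IdentifierTable`, the method `intern`, and the sizing parameter are hypothetical, chosen only to show why bucket count matters for lookup cost.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical identifier table (not DMD's StringTable): maps each distinct
// spelling to a small integer id, so later passes can compare ids instead
// of comparing strings.
class IdentifierTable {
public:
    // Reserve buckets up front. With too few buckets, collision chains grow
    // and every lookup walks more entries.
    explicit IdentifierTable(std::size_t expectedIdentifiers) {
        ids_.reserve(expectedIdentifiers);
    }

    // Return the id for `name`, assigning the next free id on first sight.
    int intern(const std::string& name) {
        return ids_.emplace(name, static_cast<int>(ids_.size())).first->second;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, int> ids_;
};
```

Calling `intern("foo")` twice yields the same id both times, so a lexer built this way would pay the hashing cost once per token and nothing more for repeated identifiers.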

I'm regularly hitting other bottlenecks, so I don't think that lexing is #1.
When compiling std.range with unittests, for example, more than 50% of the
compile time is spent checking for existing template instantiations using
O(N^2)/2 compares of template arguments.
If we managed to fix http://d.puremagic.com/issues/show_bug.cgi?id=7469 we
could efficiently use the mangled name as key.
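
The difference between the two lookup strategies can be sketched in C++ as follows. This is an illustration under assumptions, not the DMD implementation: `TemplateArgs`, `mangleArgs`, `LinearCache`, and `KeyedCache` are hypothetical names, and the "mangling" is a toy length-prefixed encoding rather than D's real name mangling.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a template instance: its argument list, with
// each argument already rendered as a string.
using TemplateArgs = std::vector<std::string>;

// Toy mangling (an assumption, not D's scheme): "<length>:<text>" per
// argument, so distinct argument lists always produce distinct keys.
std::string mangleArgs(const TemplateArgs& args) {
    std::string key;
    for (const std::string& a : args) {
        key += std::to_string(a.size());
        key += ':';
        key += a;
    }
    return key;
}

// Linear scan: a new argument list is compared against every stored one.
// Registering N distinct instances costs N*(N-1)/2 compares in total.
struct LinearCache {
    std::vector<TemplateArgs> instances;
    long compares = 0;

    int findOrAdd(const TemplateArgs& args) {
        for (std::size_t i = 0; i < instances.size(); ++i) {
            ++compares;
            if (instances[i] == args) return static_cast<int>(i);
        }
        instances.push_back(args);
        return static_cast<int>(instances.size() - 1);
    }
};

// Keyed lookup: one hash lookup on the mangled string per query.
struct KeyedCache {
    std::unordered_map<std::string, int> byKey;

    int findOrAdd(const TemplateArgs& args) {
        std::string key = mangleArgs(args);
        auto it = byKey.find(key);
        if (it != byKey.end()) return it->second;
        int id = static_cast<int>(byKey.size());
        byKey.emplace(std::move(key), id);
        return id;
    }
};
```

With 100 distinct instances, `LinearCache` performs 4950 argument-list compares while `KeyedCache` performs 100 hash lookups. The keyed version depends on every instance having a unique, computable mangled name.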


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-25 Thread Steven Schveighoffer
On Mon, 18 Jun 2012 13:53:43 -0400, Walter Bright wrote:



On 6/18/2012 6:07 AM, Don Clugston wrote:

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and you've
fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yeah, but I can't escape that lingering feeling that lexing is slow.

I was fairly disappointed that asynchronously reading the source files  
didn't have a measurable effect most of the time.


I have found that my project, which has a huge number of symbols (and
large ones), compiles much slower than I would expect.  Perhaps you have
forgotten about this issue:


http://d.puremagic.com/issues/show_bug.cgi?id=4900

Maybe fixing this still doesn't help parsing, not sure.

-Steve


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread deadalnix

Le 17/06/2012 00:41, Walter Bright a écrit :

On 6/14/2012 11:58 PM, Don Clugston wrote:

And we're well set up for parallel compilation. There's no shortage of
things we
can do to improve compilation time.


The language is carefully designed, so that at least in theory all the
passes could be done in parallel. I've got the file reads in parallel,
but I'd love to have the lexing, parsing, semantic, optimization, and
code gen all done in parallel. Wouldn't that be awesome!


Using di files for speed seems a bit like jettisoning the cargo to
keep the ship
afloat. It works but you only do it when you've got no other options.


.di files don't make a whole lotta sense for small files, but the bigger
they get, the more they are useful. D needs to be scalable to enormous
project sizes.


The key point here is project size. I wouldn't expect file size to
grow significantly.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread deadalnix

Le 18/06/2012 19:53, Walter Bright a écrit :

On 6/18/2012 6:07 AM, Don Clugston wrote:

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and
you've
fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yeah, but I can't escape that lingering feeling that lexing is slow.

I was fairly disappointed that asynchronously reading the source files
didn't have a measurable effect most of the time.


It is kind of religious. We need data.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread Iain Buclaw
On 16 June 2012 22:17, Guillaume Chatelet  wrote:
>> So parsing time has taken quite a hit since I last did any reports on
>> compilation speed of building phobos.
>
> So maybe my post about "keeping import clean" wasn't as irrelevant as I
> thought.
>
> http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890
>
> --
> Guillaume


I think its relevance is limited to projects that compile one file at
a time - i.e. I'd expect all gdc users to be compiling this way, as
whole program compilation using gdc still needs some rigorous testing
first.  If there is a particular large module, or set of large modules,
that is persistently being imported, then you will see a notable
constant slowdown on compilation of each file.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread deadalnix

Le 16/06/2012 11:18, Iain Buclaw a écrit :

On 13 June 2012 12:47, Iain Buclaw  wrote:

On 13 June 2012 12:33, Kagamin  wrote:

On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:


The measurements should be done for modules being imported, not the module
being compiled.
Something like this.
---
import std.algorithm;
import std.stdio;
import std.typecons;
import std.datetime;

int ok;
---



Oh and let it import .d files, not .di


std.datetime is one reason for me to run it again. I can imagine that
*that* module will have an impact on parse times.  But I'm still
persistent that the majority of the compile time in the frontend is
done in the first semantic pass, and not the read/parser stage. :~)




Rebuilt a compile log with latest gdc as of writing on the 2.059
frontend / library.

http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf


Notes about it:
- GCC has 4 new time counters
   -  phase setup  (time spent loading the compile time environment)
   -  phase parsing  (time spent in the frontend)
   -  phase generate (time spent in the backend)
   -  phase finalize  (time spent cleaning up and exiting)

- Of the phase parsing stage, it is broken down into 5 components
   -  Module::parse
   -  Module::semantic
   -  Module::semantic2
   -  Module::semantic3
   -  Module::genobjfile

- Module::read, Module::parse and Module::importAll in the one I did 2
years ago are now counted as part of just the one parsing stage,
rather than separate just to make it a little bit more balanced. :-)


I'll post a tl;dr later on it.



Thank you very much for your work.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread dennis luehring

Am 18.06.2012 19:53, schrieb Walter Bright:

On 6/18/2012 6:07 AM, Don Clugston wrote:

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and you've
fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yeah, but I can't escape that lingering feeling that lexing is slow.

I was fairly disappointed that asynchronously reading the source files didn't
have a measurable effect most of the time.


So you started lexing and parsing in separate threads for each file?
Where was synchronization needed? Have you measured which parts of the
code make it behave like synchronous reading, or is it the file reading itself?






Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread dennis luehring

Am 19.06.2012 09:43, schrieb Kagamin:

On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:

Yeah, but I can't escape that lingering feeling that lexing is
slow.

I was fairly disappointed that asynchronously reading the
source files didn't have a measurable effect most of the time.



I don't even understand all this rage about asynchronicity, if
the program has nothing to do until it reads the data,


The lexing and parsing process can be asynchronous - it will be faster on
multiple cores because there is no dependency between separate
lexing/parsing threads - so why lex/parse in sequence?



asynchronicity won't help you in the slightest. Anyway everything
is stuck while the device performs DMA.


Yes, down at the hardware level - but there are caches etc. out there -
it's not like "multithreaded-file-reading-is-always-as-fast-as-synchronous",
and also not "asynchronous-file-reading-is-always-faster" - more somewhere
in between :)





Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread Kagamin

On Tuesday, 19 June 2012 at 01:47:27 UTC, Timon Gehr wrote:
Parsing is not a huge issue. Depending on how powerful the 
language is, auto-completion may depend on full code analysis.


Yep, pegged runs at compile time.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-19 Thread Kagamin

On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
Yeah, but I can't escape that lingering feeling that lexing is 
slow.


I was fairly disappointed that asynchronously reading the 
source files didn't have a measurable effect most of the time.


I don't even understand all this rage about asynchronicity, if 
the program has nothing to do until it reads the data, 
asynchronicity won't help you in the slightest. Anyway everything 
is stuck while the device performs DMA.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-18 Thread Timon Gehr

On 06/19/2012 02:47 AM, Chris Cain wrote:

On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote:

Same here, I wish there were a standardized pre-lexed-token "binary"
file-format, would benefit all text editors also, as they need to lex
it anyway to perform color syntax highlighting.


If I were to make my own language, I'd forego a human-readable format
and just have the "language" be defined as a big machine-readable AST.


http://de.wikipedia.org/wiki/Lisp ?


You'd have to have an IDE, but it could display the code in just about
any way the person wants (syntax, style, etc).



This could be done even if the language's source code storage format is 
human-readable.



Syntax highlighting would be instantaneous and there would be fewer
errors made by programmers (maybe ...). Plus it'd be unbelievably easy
to implement things like auto-completion.


Parsing is not a huge issue. Depending on how powerful the language is, 
auto-completion may depend on full code analysis.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-18 Thread Chris Cain

On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote:
Same here, I wish there were a standardized pre-lexed-token 
"binary" file-format, would benefit all text editors also, as 
they need to lex it anyway to perform color syntax highlighting.


If I were to make my own language, I'd forego a human-readable 
format and just have the "language" be defined as a big 
machine-readable AST. You'd have to have an IDE, but it could 
display the code in just about any way the person wants (syntax, 
style, etc).


Syntax highlighting would be instantaneous and there would be 
fewer errors made by programmers (maybe ...). Plus it'd be 
unbelievably easy to implement things like auto-completion.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-18 Thread Daniel

On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:

On 6/18/2012 6:07 AM, Don Clugston wrote:

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and you've
fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yeah, but I can't escape that lingering feeling that lexing is 
slow.


I was fairly disappointed that asynchronously reading the 
source files didn't have a measurable effect most of the time.


Same here, I wish there were a standardized pre-lexed-token 
"binary" file-format, would benefit all text editors also, as 
they need to lex it anyway to perform color syntax highlighting.




Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-18 Thread Walter Bright

On 6/18/2012 6:07 AM, Don Clugston wrote:

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and you've
fixed those problems in the design of D.
And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yeah, but I can't escape that lingering feeling that lexing is slow.

I was fairly disappointed that asynchronously reading the source files didn't 
have a measurable effect most of the time.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-18 Thread Don Clugston

On 17/06/12 00:37, Walter Bright wrote:

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.


But you argued in your blog that C++ parsing is inherently slow, and 
you've fixed those problems in the design of D.

And as far as I can tell, you were extremely successful!
Parsing in D is very, very fast.


Yes, it is designed so you could just import a symbol table. It is done
as source code, however, because it's trivial to implement.


It has those nasty side-effects listed under (3) though.


I don't think they're nasty or are side effects.


They are new problems which people ask for solutions for. And they are 
far more difficult to solve than the original problem.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-16 Thread Walter Bright

On 6/14/2012 1:03 AM, Don Clugston wrote:

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on semantic1().
Do you have data that shows otherwise?


Nothing recent, it's mostly from my C++ compiler testing.



Yes, it is designed so you could just import a symbol table. It is done
as source code, however, because it's trivial to implement.


It has those nasty side-effects listed under (3) though.


I don't think they're nasty or are side effects.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-16 Thread Walter Bright

On 6/14/2012 11:58 PM, Don Clugston wrote:

And we're well set up for parallel compilation. There's no shortage of things we
can do to improve compilation time.


The language is carefully designed, so that at least in theory all the passes 
could be done in parallel. I've got the file reads in parallel, but I'd love to 
have the lexing, parsing, semantic, optimization, and code gen all done in 
parallel. Wouldn't that be awesome!



Using di files for speed seems a bit like jettisoning the cargo to keep the ship
afloat. It works but you only do it when you've got no other options.


.di files don't make a whole lotta sense for small files, but the bigger they 
get, the more they are useful. D needs to be scalable to enormous project sizes.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-16 Thread Guillaume Chatelet
> So parsing time has taken quite a hit since I last did any reports on
> compilation speed of building phobos.

So maybe my post about "keeping import clean" wasn't as irrelevant as I
thought.

http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890

--
Guillaume


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-16 Thread Iain Buclaw
On 16 June 2012 10:18, Iain Buclaw  wrote:
> On 13 June 2012 12:47, Iain Buclaw  wrote:
>> On 13 June 2012 12:33, Kagamin  wrote:
>>> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:

>>>> The measurements should be done for modules being imported, not the module
>>>> being compiled.
>>>> Something like this.
>>>> ---
>>>> import std.algorithm;
>>>> import std.stdio;
>>>> import std.typecons;
>>>> import std.datetime;
>>>>
>>>> int ok;
>>>> ---
>>>
>>>
>>> Oh and let it import .d files, not .di
>>
>> std.datetime is one reason for me to run it again. I can imagine that
>> *that* module will have an impact on parse times.  But I'm still
>> persistent that the majority of the compile time in the frontend is
>> done in the first semantic pass, and not the read/parser stage. :~)
>>
>>
>
> Rebuilt a compile log with latest gdc as of writing on the 2.059
> frontend / library.
>
> http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
> http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf
>
>
> Notes about it:
> - GCC has 4 new time counters
>  -  phase setup  (time spent loading the compile time environment)
>  -  phase parsing  (time spent in the frontend)
>  -  phase generate (time spent in the backend)
>  -  phase finalize  (time spent cleaning up and exiting)
>
> - Of the phase parsing stage, it is broken down into 5 components
>  -  Module::parse
>  -  Module::semantic
>  -  Module::semantic2
>  -  Module::semantic3
>  -  Module::genobjfile
>
> - Module::read, Module::parse and Module::importAll in the one I did 2
> years ago are now counted as part of just the one parsing stage,
> rather than separate just to make it a little bit more balanced. :-)
>
>
> I'll post a tl;dr later on it.
>

tl;dr

Total number of source files compiled: 207
Total time to build druntime and phobos:  78.08 seconds
Time spent parsing: 17.15 seconds
Average time spent parsing: 0.08 seconds
Time spent running semantic passes: 10.04 seconds

Time spent generating backend AST: 2.15 seconds
Time spent in backend: 48.62 seconds


So parsing time has taken quite a hit since I last did any reports on
compilation speed of building phobos.  I suspect most of that comes
from the loading of symbols from all imports, and that there have been
some large additions to phobos recently which create a constant
bottleneck if one chooses to compile one source at a time, since the
apparently large amount of time spent parsing sources does not show
up when compiling all at once.

 Module::parse: 0.58 seconds (1%)
 Module::semantic: 0.24 seconds (1%)
 Module::semantic2: 0.01 seconds (0%)
 Module::semantic3: 2.85 seconds (6%)
 Module::genobjfile: 1.24 seconds (3%)
 TOTAL: 47.06 seconds

Considering that the entire phobos library is some 165K lines of code,
I don't see why people aren't laughing about just how quick the
frontend is at parsing. :~)


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-16 Thread Iain Buclaw
On 13 June 2012 12:47, Iain Buclaw  wrote:
> On 13 June 2012 12:33, Kagamin  wrote:
>> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
>>>
>>> The measurements should be done for modules being imported, not the module
>>> being compiled.
>>> Something like this.
>>> ---
>>> import std.algorithm;
>>> import std.stdio;
>>> import std.typecons;
>>> import std.datetime;
>>>
>>> int ok;
>>> ---
>>
>>
>> Oh and let it import .d files, not .di
>
> std.datetime is one reason for me to run it again. I can imagine that
> *that* module will have an impact on parse times.  But I'm still
> persistent that the majority of the compile time in the frontend is
> done in the first semantic pass, and not the read/parser stage. :~)
>
>

Rebuilt a compile log with latest gdc as of writing on the 2.059
frontend / library.

http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf


Notes about it:
- GCC has 4 new time counters
  -  phase setup  (time spent loading the compile time environment)
  -  phase parsing  (time spent in the frontend)
  -  phase generate (time spent in the backend)
  -  phase finalize  (time spent cleaning up and exiting)

- Of the phase parsing stage, it is broken down into 5 components
  -  Module::parse
  -  Module::semantic
  -  Module::semantic2
  -  Module::semantic3
  -  Module::genobjfile

- Module::read, Module::parse and Module::importAll in the one I did 2
years ago are now counted as part of just the one parsing stage,
rather than separate just to make it a little bit more balanced. :-)


I'll post a tl;dr later on it.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-15 Thread Jonathan M Davis
On Friday, June 15, 2012 08:58:55 Don Clugston wrote:
> I don't think Phobos should use .di files at all. I don't think there
> are any cases where we want to conceal code.
> 
> The performance benefit you would get is completely negligible. It
> doesn't even reduce the number of files that need to be loaded, just the
> length of each one.
> 
> I think that, for example, improving the way that array literals are
> dealt with would have at least as much impact on compilation time.
> For the DMD backend, fixing up the treatment of comma expressions would
> have a much bigger impact than getting lexing and parsing time to zero.
> 
> And we're well set up for parallel compilation. There's no shortage of
> things we can do to improve compilation time.
> 
> Using di files for speed seems a bit like jettisoning the cargo to keep
> the ship afloat. It works but you only do it when you've got no other
> options.

On several occasions, Walter has expressed the desire to make Phobos use .di 
files like druntime does, otherwise I probably would never have considered it. 
Personally, I don't want to bother with it unless there's a large benefit from 
it, so if we're sure that the gain is minimal, then I say that we should just 
leave it all as .d files. Most of Phobos would have to have its
implementation left in any .di files anyway so that inlining and CTFE could
work.

- Jonathan M Davis


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-15 Thread Don Clugston

On 14/06/12 10:10, Jonathan M Davis wrote:

On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:

On 13/06/12 16:29, Walter Bright wrote:

On 6/13/2012 1:07 AM, Don Clugston wrote:

On 12/06/12 18:46, Walter Bright wrote:

On 6/12/2012 2:07 AM, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable


(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.


I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation
time? Has
anyone done some solid profiling?


It is for debug builds.


Iain's data indicates that it's only a few % of the time taken on
semantic1().
Do you have data that shows otherwise?

It seems to me, that slow parsing is a C++ problem which D already solved.


If this is the case, is there any value at all to using .di files in druntime
or Phobos other than in cases where we're specifically trying to hide
implementation (e.g. with the GC)? Or do we still end up paying the semantic
cost for importing the .d files such that using .di files would still help with
compilation times?

- Jonathan M Davis


I don't think Phobos should use .di files at all. I don't think there 
are any cases where we want to conceal code.


The performance benefit you would get is completely negligible. It 
doesn't even reduce the number of files that need to be loaded, just the 
length of each one.


I think that, for example, improving the way that array literals are 
dealt with would have at least as much impact on compilation time.
For the DMD backend, fixing up the treatment of comma expressions would 
have a much bigger impact than getting lexing and parsing time to zero.


And we're well set up for parallel compilation. There's no shortage of 
things we can do to improve compilation time.


Using di files for speed seems a bit like jettisoning the cargo to keep 
the ship afloat. It works but you only do it when you've got no other 
options.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-14 Thread Kagamin

On Thursday, 14 June 2012 at 08:11:02 UTC, Jonathan M Davis wrote:

Or do we still end up paying the semantic
cost for importing the .d files such that using .di files would
still help with compilation times?


Oh, right, the module can use mixins and CTFE, so it should be 
semantically checked, but the semantic check may be minimal just 
like in the case of a .di file.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-14 Thread Jonathan M Davis
On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
> On 13/06/12 16:29, Walter Bright wrote:
> > On 6/13/2012 1:07 AM, Don Clugston wrote:
> >> On 12/06/12 18:46, Walter Bright wrote:
> >>> On 6/12/2012 2:07 AM, timotheecour wrote:
> >>>> There's a current pull request to improve di file generation
> >>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
> >>>> suggest further ideas.
> >>>> As far as I understand, di interface files try to achieve these
> >>>> conflicting goals:
> >>>>
> >>>> 1) speed up compilation by avoiding having to reparse large files over
> >>>> and over.
> >>>> 2) hide implementation details for proprietary reasons
> >>>> 3) still maintain source code in some form to allow inlining and CTFE
> >>>> 4) be human readable
> >>> 
> >>> (4) was not a goal.
> >>> 
> >>> A .di file could very well be a binary file, but making it look like D
> >>> source enabled them to be loaded with no additional implementation work
> >>> in the compiler.
> >> 
> >> I don't understand (1) actually.
> >> 
> >> For two reasons:
> >> (a) Is lexing + parsing really a significant part of the compilation
> >> time? Has
> >> anyone done some solid profiling?
> > 
> > It is for debug builds.
> 
> Iain's data indicates that it's only a few % of the time taken on
> semantic1().
> Do you have data that shows otherwise?
> 
> It seems to me, that slow parsing is a C++ problem which D already solved.

If this is the case, is there any value at all to using .di files in druntime 
or Phobos other than in cases where we're specifically trying to hide 
implementation (e.g. with the GC)? Or do we still end up paying the semantic 
cost for importing the .d files such that using .di files would still help with 
compilation times?

- Jonathan M Davis


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-14 Thread Don Clugston

On 13/06/12 16:29, Walter Bright wrote:

On 6/13/2012 1:07 AM, Don Clugston wrote:

On 12/06/12 18:46, Walter Bright wrote:

On 6/12/2012 2:07 AM, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable


(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.


I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation
time? Has
anyone done some solid profiling?


It is for debug builds.


Iain's data indicates that it's only a few % of the time taken on 
semantic1().

Do you have data that shows otherwise?

It seems to me, that slow parsing is a C++ problem which D already solved.




(b) Wasn't one of the goals of D's module system supposed to be that
you could
import a symbol table? Why not just implement that? Seems like that
would be
much faster than .di files can ever be.


Yes, it is designed so you could just import a symbol table. It is done
as source code, however, because it's trivial to implement.


It has those nasty side-effects listed under (3) though.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Jacob Carlborg

On 2012-06-13 13:47, Iain Buclaw wrote:


std.datetime is one reason for me to run it again. I can imagine that
*that* module will have an impact on parse times.  But I'm still
persistent that the majority of the compile time in the frontend is
done in the first semantic pass, and not the read/parser stage. :~)


You should try the Objective-C/D bridge; that took quite a while to
compile. It will probably not compile any more, though, since it hasn't
been updated. I think it was only for D1 as well. I think that was mostly
templates, so I guess that would mean some of the semantic passes.


--
/Jacob Carlborg


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Walter Bright

On 6/13/2012 1:07 AM, Don Clugston wrote:

On 12/06/12 18:46, Walter Bright wrote:

On 6/12/2012 2:07 AM, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable


(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.


I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation time? Has
anyone done some solid profiling?


It is for debug builds.



(b) Wasn't one of the goals of D's module system supposed to be that you could
import a symbol table? Why not just implement that? Seems like that would be
much faster than .di files can ever be.


Yes, it is designed so you could just import a symbol table. It is done as 
source code, however, because it's trivial to implement.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Kagamin

On Wednesday, 13 June 2012 at 11:47:31 UTC, Iain Buclaw wrote:
std.datetime is one reason for me to run it again. I can imagine that
*that* module will have an impact on parse times.  But I'm still
persistent that the majority of the compile time in the frontend is
done in the first semantic pass, and not the read/parser stage. :~)


Probably. Also test with -fsyntax-only, if it works and runs the
semantic passes.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Iain Buclaw
On 13 June 2012 12:33, Kagamin  wrote:
> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
>>
>> The measurements should be done for modules being imported, not the module
>> being compiled.
>> Something like this.
>> ---
>> import std.algorithm;
>> import std.stdio;
>> import std.typecons;
>> import std.datetime;
>>
>> int ok;
>> ---
>
>
> Oh and let it import .d files, not .di

std.datetime is one reason for me to run it again. I can imagine that
*that* module will have an impact on parse times.  But I'm still
persistent that the majority of the compile time in the frontend is
done in the first semantic pass, and not the read/parser stage. :~)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Kagamin

On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
The measurements should be done for modules being imported, not 
the module being compiled.

Something like this.
---
import std.algorithm;
import std.stdio;
import std.typecons;
import std.datetime;

int ok;
---


Oh and let it import .d files, not .di


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Kagamin
The measurements should be done for modules being imported, not 
the module being compiled.

Something like this.
---
import std.algorithm;
import std.stdio;
import std.typecons;
import std.datetime;

int ok;
---
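A rough way to take that measurement, as a sketch only: the D program below times the compiler on a file holding the imports above and on a near-empty file, so the difference approximates what the imported modules cost. The file names bench.d and empty.d, the helper bestMsecs and the run count of 5 are assumptions made for illustration; -o- is dmd's switch for not writing an object file.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.process : execute;
import std.stdio : writefln;

/// Runs `cmd` `runs` times and returns the fastest wall-clock time in msecs.
long bestMsecs(string[] cmd, int runs)
{
    long best = long.max;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        auto result = execute(cmd);
        sw.stop();
        if (result.status != 0)
            throw new Exception("command failed:\n" ~ result.output);
        immutable ms = sw.peek().total!"msecs";
        if (ms < best)
            best = ms;
    }
    return best;
}

void main()
{
    // Assumption: bench.d holds the four imports plus `int ok;`,
    // empty.d holds only `int ok;`. Both names are hypothetical.
    // -o- tells dmd not to write an object file.
    immutable withImports = bestMsecs(["dmd", "-o-", "bench.d"], 5);
    immutable baseline = bestMsecs(["dmd", "-o-", "empty.d"], 5);
    writefln("imports cost roughly %s ms (%s ms vs. %s ms baseline)",
             withImports - baseline, withImports, baseline);
}
```

Taking the fastest of several runs is one simple way to dampen disk-cache and scheduling noise; it does not separate parsing from the semantic passes, which would still need counters inside the compiler.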


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Dmitry Olshansky

On 13.06.2012 14:16, Iain Buclaw wrote:

On 13 June 2012 10:45, Dmitry Olshansky  wrote:

On 13.06.2012 13:37, Iain Buclaw wrote:


On 13 June 2012 09:07, Don Clugston wrote:


On 12/06/12 18:46, Walter Bright wrote:



On 6/12/2012 2:07 AM, timotheecour wrote:



There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable




(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.




I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation
time?
Has anyone done some solid profiling?



Lexing and Parsing are minuscule tasks in comparison to the three
semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago.

http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it.
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


Example: std/xml.d
Module::parse : 0.01 ( 0%)
Module::semantic : 0.50 ( 9%)
Module::semantic2 : 0.02 ( 0%)
Module::semantic3 : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds) -
it spent almost 10% of its time running the first semantic analysis.


But that was the D2 frontend / phobos as of September 2010.  I should
re-run a report on updated times and draw some comparisons. :~)



Is time spent on I/O accounted for in the parse step? And where is the rest
spent :)



It would be, the counter starts before the files are even touched, and
ends after they are closed.


OK, then parsing is indistinguishable from I/O, and together they are
only a tiny fraction of the whole. Great info, thanks.




The rest of the time spent is in the GCC backend, going through
some 60+ code passes and outputting the assembly to file.



Damn, I like DMD :)



--
Dmitry Olshansky


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Iain Buclaw
On 13 June 2012 10:45, Dmitry Olshansky  wrote:
> On 13.06.2012 13:37, Iain Buclaw wrote:
>>
>> On 13 June 2012 09:07, Don Clugston  wrote:
>>>
>>> On 12/06/12 18:46, Walter Bright wrote:
>>>

>>>> On 6/12/2012 2:07 AM, timotheecour wrote:
>>>>>
>>>>>
>>>>> There's a current pull request to improve di file generation
>>>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>>>> suggest
>>>>> further ideas.
>>>>> As far as I understand, di interface files try to achieve these
>>>>> conflicting goals:
>>>>>
>>>>> 1) speed up compilation by avoiding having to reparse large files over
>>>>> and over.
>>>>> 2) hide implementation details for proprietary reasons
>>>>> 3) still maintain source code in some form to allow inlining and CTFE
>>>>> 4) be human readable
>>>>
>>>>
>>>>
>>>> (4) was not a goal.
>>>>
>>>> A .di file could very well be a binary file, but making it look like D
>>>> source enabled them to be loaded with no additional implementation work
>>>> in the compiler.
>>>
>>>
>>>
>>> I don't understand (1) actually.
>>>
>>> For two reasons:
>>> (a) Is lexing + parsing really a significant part of the compilation
>>> time?
>>> Has anyone done some solid profiling?
>>>
>>
>> Lexing and Parsing are minuscule tasks in comparison to the three
>> semantic runs done on the code.
>>
>> I added speed counters into the glue code of GDC some time ago.
>>
>> http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/
>>
>> And here is the relevant report to go with it.
>> http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf
>>
>>
>> Example: std/xml.d
>> Module::parse : 0.01 ( 0%)
>> Module::semantic : 0.50 ( 9%)
>> Module::semantic2 : 0.02 ( 0%)
>> Module::semantic3 : 0.04 ( 1%)
>> Module::genobjfile : 0.10 ( 2%)
>>
>> For the entire time it took to compile the one file (5.22 seconds) -
>> it spent almost 10% of its time running the first semantic analysis.
>>
>>
>> But that was the D2 frontend / phobos as of September 2010.  I should
>> re-run a report on updated times and draw some comparisons. :~)
>>
>
> Is time spent on I/O accounted for in the parse step? And where is the rest
> spent :)
>

It would be, the counter starts before the files are even touched, and
ends after they are closed.

The rest of the time spent is in the GCC backend, going through
some 60+ code passes and outputting the assembly to file.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Dmitry Olshansky

On 13.06.2012 13:37, Iain Buclaw wrote:

On 13 June 2012 09:07, Don Clugston  wrote:

On 12/06/12 18:46, Walter Bright wrote:


On 6/12/2012 2:07 AM, timotheecour wrote:


There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable



(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.



I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation time?
Has anyone done some solid profiling?



Lexing and Parsing are minuscule tasks in comparison to the three
semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago.
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it.
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


Example: std/xml.d
Module::parse : 0.01 ( 0%)
Module::semantic : 0.50 ( 9%)
Module::semantic2 : 0.02 ( 0%)
Module::semantic3 : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds) -
it spent almost 10% of its time running the first semantic analysis.


But that was the D2 frontend / phobos as of September 2010.  I should
re-run a report on updated times and draw some comparisons. :~)



Is time spent on I/O accounted for in the parse step? And where is the 
rest spent :)


--
Dmitry Olshansky


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread deadalnix

Le 13/06/2012 11:37, Iain Buclaw a écrit :

On 13 June 2012 09:07, Don Clugston  wrote:

On 12/06/12 18:46, Walter Bright wrote:


On 6/12/2012 2:07 AM, timotheecour wrote:


There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable



(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.



I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation time?
Has anyone done some solid profiling?



Lexing and Parsing are minuscule tasks in comparison to the three
semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago.
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it.
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


Example: std/xml.d
Module::parse : 0.01 ( 0%)
Module::semantic : 0.50 ( 9%)
Module::semantic2 : 0.02 ( 0%)
Module::semantic3 : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds) -
it spent almost 10% of its time running the first semantic analysis.


But that was the D2 frontend / phobos as of September 2010.  I should
re-run a report on updated times and draw some comparisons. :~)


Regards


Nice numbers! They also show that the slowest part is the backend.

Could you get some numbers for a recent version of D, and for a few
different kinds of D code (template-heavy versus not, for instance,
would be nice to compare)?


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Iain Buclaw
On 13 June 2012 09:07, Don Clugston  wrote:
> On 12/06/12 18:46, Walter Bright wrote:
>>
>> On 6/12/2012 2:07 AM, timotheecour wrote:
>>>
>>> There's a current pull request to improve di file generation
>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
>>> suggest
>>> further ideas.
>>> As far as I understand, di interface files try to achieve these
>>> conflicting goals:
>>>
>>> 1) speed up compilation by avoiding having to reparse large files over
>>> and over.
>>> 2) hide implementation details for proprietary reasons
>>> 3) still maintain source code in some form to allow inlining and CTFE
>>> 4) be human readable
>>
>>
>> (4) was not a goal.
>>
>> A .di file could very well be a binary file, but making it look like D
>> source enabled them to be loaded with no additional implementation work
>> in the compiler.
>
>
> I don't understand (1) actually.
>
> For two reasons:
> (a) Is lexing + parsing really a significant part of the compilation time?
> Has anyone done some solid profiling?
>

Lexing and Parsing are minuscule tasks in comparison to the three
semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago.
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it.
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


Example: std/xml.d
Module::parse : 0.01 ( 0%)
Module::semantic : 0.50 ( 9%)
Module::semantic2 : 0.02 ( 0%)
Module::semantic3 : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds) -
it spent almost 10% of its time running the first semantic analysis.
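
For anyone who wants to redo the arithmetic on the std/xml.d figures above, here is a small D sketch. It only recomputes shares of the quoted 5.22 s total; the helper name percentOf and the "(not listed)" label are made up for this illustration and are not part of the GDC report.

```d
import std.stdio : writefln;

/// Share of `part` in `total`, as a percentage.
double percentOf(double part, double total)
{
    return 100.0 * part / total;
}

void main()
{
    // Figures copied from the std/xml.d report above (seconds).
    enum double total = 5.22;
    static immutable string[] phases = [
        "Module::parse", "Module::semantic", "Module::semantic2",
        "Module::semantic3", "Module::genobjfile",
    ];
    static immutable double[] seconds = [0.01, 0.50, 0.02, 0.04, 0.10];

    double listed = 0;
    foreach (i, phase; phases)
    {
        writefln("%-20s %5.2f s  %5.1f%%", phase, seconds[i],
                 percentOf(seconds[i], total));
        listed += seconds[i];
    }

    // Whatever the five counters above do not cover.
    immutable rest = total - listed;
    writefln("%-20s %5.2f s  %5.1f%%", "(not listed)", rest,
             percentOf(rest, total));
}
```

The five listed counters add up to 0.67 s, so about 4.55 s, roughly 87% of the run, falls outside them.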


But that was the D2 frontend / phobos as of September 2010.  I should
re-run a report on updated times and draw some comparisons. :~)


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Don Clugston

On 12/06/12 18:46, Walter Bright wrote:

On 6/12/2012 2:07 AM, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest
further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable


(4) was not a goal.

A .di file could very well be a binary file, but making it look like D
source enabled them to be loaded with no additional implementation work
in the compiler.


I don't understand (1) actually.

For two reasons:
(a) Is lexing + parsing really a significant part of the compilation 
time? Has anyone done some solid profiling?


(b) Wasn't one of the goals of D's module system supposed to be that you 
could import a symbol table? Why not just implement that? Seems like 
that would be much faster than .di files can ever be.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-13 Thread Paulo Pinto

On Tuesday, 12 June 2012 at 12:23:21 UTC, Dmitry Olshansky wrote:

On 12.06.2012 16:09, foobar wrote:

On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:

On 12/06/12 11:07, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

Is that actually true? My recollection is that the original motivation
was only goal (2), but I was fairly new to D at the time (2005).

Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
in DMD 0.142

Personally I believe that .di files are *totally* the wrong approach
for goal (1). I don't think goal (1) and (2) have anything in common
at all with each other, except that C tried to achieve both of them
using header files. It's an OK solution for (1) in C, it's a failure
in C++, and a complete failure in D.

IMHO: If we want goal (1), we should try to achieve goal (1), and stop
pretending it's in any way related to goal (2).


I absolutely agree with the above and would also add that goal (4) is an
anti-feature. In order to get a human readable version of the API the
programmer should use *documentation*. D claims that one of its goals is
to make it a breeze to provide documentation by bundling a standard tool
- DDoc. There's no need to duplicate this just to provide another format
when DDoc itself is supposed to be format agnostic.

Absolutely. DDoc being built-in didn't sound right to me at first, BUT
it allows us to essentially say that APIs are covered in the DDoc
generated files. Not header files etc.



This is a solved problem since the 80's (E.g. Pascal units).


Right, seeing yet another newbie hit it every day is a clear indication
of a simple fact: people would like to think & work in modules rather
than seeing the guts of old and crappy OBJ file technology. Linking with C
!= using C tools everywhere.




Back in the 90's I only moved 100% away from Turbo Pascal into C
land when I started using Linux at the University, and eventually
spent some time doing C++ as well.

It still baffles me that in 2012 we still need to rely on crappy
C linker tooling, when in the 80's we already had languages with
proper modules.

Now we have many mainstream languages with proper modules, but many
of them live in VM land.

Oberon, Go and Delphi/Free Pascal seem to be the only languages
with native code generation compilers that offer the binary-only
modules solution, while many rely on some form of .di files.






Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Dmitry Olshansky

On 12.06.2012 22:47, Adam Wilson wrote:

On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky
 wrote:


On 12.06.2012 16:09, foobar wrote:

On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:

On 12/06/12 11:07, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons

3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

Is that actually true? My recollection is that the original motivation
was only goal (2), but I was fairly new to D at the time (2005).

Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
in DMD 0.142

Personally I believe that .di files are *totally* the wrong approach
for goal (1). I don't think goal (1) and (2) have anything in common
at all with each other, except that C tried to achieve both of them
using header files. It's an OK solution for (1) in C, it's a failure
in C++, and a complete failure in D.

IMHO: If we want goal (1), we should try to achieve goal (1), and stop
pretending it's in any way related to goal (2).


I absolutely agree with the above and would also add that goal (4) is an
anti-feature. In order to get a human readable version of the API the
programmer should use *documentation*. D claims that one of its goals is
to make it a breeze to provide documentation by bundling a standard tool
- DDoc. There's no need to duplicate this just to provide another format
when DDoc itself is supposed to be format agnostic.


Absolutely. DDoc being built-in didn't sound right to me at first, BUT
it allows us to essentially say that APIs are covered in the DDoc
generated files. Not header files etc.


This is a solved problem since the 80's (E.g. Pascal units).


Right, seeing yet another newbie hit it every day is a clear indication
of a simple fact: people would like to think & work in modules rather
than seeing the guts of old and crappy OBJ file technology. Linking with C
!= using C tools everywhere.



I completely agree with this. The interactions between the D module
system and D toolchain are utterly confusing to newcomers, especially
those from other C-like languages. There are better ways, see .NET
Assemblies and Pascal Units. These problems were solved decades ago. Why
are we still using 40-year-old paradigms?


Per Adam's post, the issue is tied to DMD's use of OMF/optlink which we
all would like to get rid of anyway. Once we're in proper COFF land,
couldn't we just store the required metadata (binary AST?) in special
sections in the object files themselves?


Seconded. At least lexed form could be very compact, I recall early
compressors tried doing the Huffman thing on source code tokens with a
certain success.



I don't see the value of compression. Lexing would already reduce the
size significantly and compression would only add to processing times.
Disk is cheap.


I/O is not. (De)compression on the fly is a more and more interesting
direction these days. The less you read/write, the faster you get.
Knowing the relative frequency distribution of keywords beforehand is a
boon. Yet I agree that it's premature at the moment.
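
To make the "distribution known beforehand" point concrete, here is a minimal D sketch of a static Huffman code over token kinds. Everything specific in it is an assumption for illustration: the token set, the frequencies (invented, not measured on real D code) and the names Node, collect and huffmanCodes. It is not how any D compiler stores tokens.

```d
import std.algorithm.sorting : sort;
import std.stdio : writefln;

/// Huffman tree node: a leaf carries a token, an inner node two children.
final class Node
{
    string token;
    double weight;
    Node left, right;

    this(string token, double weight)
    {
        this.token = token;
        this.weight = weight;
    }

    this(Node left, Node right)
    {
        this.left = left;
        this.right = right;
        this.weight = left.weight + right.weight;
    }
}

/// Collects the bit string of every leaf below `n` into `codes`.
void collect(Node n, string prefix, ref string[string] codes)
{
    if (n.left is null) // leaf
    {
        codes[n.token] = prefix.length ? prefix : "0";
        return;
    }
    collect(n.left, prefix ~ "0", codes);
    collect(n.right, prefix ~ "1", codes);
}

/// Builds a prefix code by repeatedly merging the two lightest nodes.
string[string] huffmanCodes(double[string] freq)
{
    Node[] nodes;
    foreach (token, weight; freq)
        nodes ~= new Node(token, weight);

    while (nodes.length > 1)
    {
        nodes.sort!((a, b) => a.weight < b.weight);
        nodes = [new Node(nodes[0], nodes[1])] ~ nodes[2 .. $];
    }

    string[string] codes;
    if (nodes.length)
        collect(nodes[0], "", codes);
    return codes;
}

void main()
{
    // Hypothetical relative token frequencies, NOT measured on real D code.
    double[string] freq = [
        "identifier": 0.40, ";": 0.15, "(": 0.10, ")": 0.10,
        "=": 0.08, "if": 0.07, "return": 0.06, "template": 0.04,
    ];

    auto codes = huffmanCodes(freq);
    double avgBits = 0;
    foreach (token, code; codes)
    {
        writefln("%-12s %s", token, code);
        avgBits += freq[token] * code.length;
    }
    writefln("average: %.2f bits/token, versus 3 bits for a fixed-width code",
             avgBits);
}
```

Because the table is fixed ahead of time, reader and writer can share it without storing it in each file. With these made-up weights the average works out to about 2.65 bits per token against 3 for a fixed-width code; real savings would depend entirely on measured frequencies.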




Beyond that though, this is absolutely the direction D must head in. In
my mind the DI generation patch was mostly just a stop-gap to bring
DI-gen up-to-date with the current system thereby giving us enough time
to tackle the (admittedly huge) task of building COFF into the backend,
emitting the lexed source into a special section and then giving the
compiler *AND* linker the ability to read out the source. For example,
giving the linker the ability to read out source code essentially
requires a brand-new linker. Although, it is my personal opinion that
the linker should be integrated with the compiler and done as one step,
this way the linker could have intimate knowledge of the source and
would enable some spectacular LTO options. If only DMD were written in
D, then we could really open the compile speed throttles with an MT
build model...




--
Dmitry Olshansky


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Adam Wilson
On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky  
 wrote:



On 12.06.2012 16:09, foobar wrote:

On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:

On 12/06/12 11:07, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons

3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

Is that actually true? My recollection is that the original motivation
was only goal (2), but I was fairly new to D at the time (2005).

Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
in DMD 0.142

Personally I believe that .di files are *totally* the wrong approach
for goal (1). I don't think goal (1) and (2) have anything in common
at all with each other, except that C tried to achieve both of them
using header files. It's an OK solution for (1) in C, it's a failure
in C++, and a complete failure in D.

IMHO: If we want goal (1), we should try to achieve goal (1), and stop
pretending it's in any way related to goal (2).


I absolutely agree with the above and would also add that goal (4) is an
anti-feature. In order to get a human readable version of the API the
programmer should use *documentation*. D claims that one of its goals is
to make it a breeze to provide documentation by bundling a standard tool
- DDoc. There's no need to duplicate this just to provide another format
when DDoc itself is supposed to be format agnostic.

Absolutely. DDoc being built-in didn't sound right to me at first, BUT
it allows us to essentially say that APIs are covered in the DDoc
generated files. Not header files etc.



This is a solved problem since the 80's (E.g. Pascal units).


Right, seeing yet another newbie hit it every day is a clear indication
of a simple fact: people would like to think & work in modules rather
than seeing the guts of old and crappy OBJ file technology. Linking with C
!= using C tools everywhere.




I completely agree with this. The interactions between the D module system  
and D toolchain are utterly confusing to newcomers, especially those from  
other C-like languages. There are better ways, see .NET Assemblies and  
Pascal Units. These problems were solved decades ago. Why are we still  
using 40-year-old paradigms?



Per Adam's post, the issue is tied to DMD's use of OMF/optlink which we
all would like to get rid of anyway. Once we're in proper COFF land,
couldn't we just store the required metadata (binary AST?) in special
sections in the object files themselves?

Seconded. At least lexed form could be very compact, I recall early  
compressors tried doing the Huffman thing on source code tokens with a  
certain success.




I don't see the value of compression. Lexing would already reduce the size  
significantly and compression would only add to processing times. Disk is  
cheap.


Beyond that though, this is absolutely the direction D must head in. In my  
mind the DI generation patch was mostly just a stop-gap to bring DI-gen  
up-to-date with the current system thereby giving us enough time to tackle  
the (admittedly huge) task of building COFF into the backend, emitting the  
lexed source into a special section and then giving the compiler *AND*  
linker the ability to read out the source. For example, giving the
linker the ability to read out source code essentially requires a
brand-new linker. Although, it is my personal opinion that the linker  
should be integrated with the compiler and done as one step, this way the  
linker could have intimate knowledge of the source and would enable some  
spectacular LTO options. If only DMD were written in D, then we could  
really open the compile speed throttles with an MT build model...



Another related question - AFAIK the LLVM folks did/are doing work to
make their implementation less platform-dependent. Could we leverage this
in ldc to store LLVM bit code as D libs which still retain enough info
for the compiler to replace header files?







--
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Walter Bright

On 6/12/2012 2:07 AM, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest
further ideas.
As far as I understand, di interface files try to achieve these conflicting 
goals:

1) speed up compilation by avoiding having to reparse large files over and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable


(4) was not a goal.

A .di file could very well be a binary file, but making it look like D source 
enabled them to be loaded with no additional implementation work in the compiler.





Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Timon Gehr

On 06/12/2012 03:54 PM, deadalnix wrote:

Le 12/06/2012 12:23, Tobias Pankrath a écrit :

Currently .di-files are compiler independent. If this should hold for
dib-files, too, we'll need a standard ast structure, won't we?



We need it anyway at some point.


Plain D code is already a perfectly fine standard AST structure.


 AST macro is another example.



AST macros may refer to AST structures by their representations as D code.


It would also greatly simplify compiler writing if the D interpreter
could be provided as lib (and so run on top of dib file).



I don't think so. Writing the interpreter is a rather straightforward 
part of the compiler implementation. Why would you want to run it on top 
of a '.dib' file anyway? Serializing/deserializing the AST is too much 
overhead.



I want to mention that LLVM IR + metadata can do a really good job here.
In addition, LLVM people are working on a JIT backend, if you know what
I mean ;)


Interpreting manually is not harder than CTFE-compatible LLVM IR code 
generation, but the LLVM JIT could certainly be leveraged to improve 
compilation speeds.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Adam Wilson

On Tue, 12 Jun 2012 06:46:44 -0700, Jacob Carlborg  wrote:


On 2012-06-12 14:09, foobar wrote:


This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
post, the issue is tied to DMD's use of OMF/optlink which we all would
like to get rid of anyway. Once we're in proper COFF land, couldn't we
just store the required metadata (binary AST?) in special sections in
the object files themselves?


Can't the same be done with OMF? I'm not saying I want to keep OMF.



OMF doesn't support Custom Sections and I think a custom section is the  
right way to handle this. I found the Borland OMF docs once a while back  
to verify this.


--
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread deadalnix

Le 12/06/2012 14:39, foobar a écrit :

Another related question - AFAIK the LLVM folks did/are doing work to
make their implementation less platform-dependent. Could we leverage this
in ldc to store LLVM bit code as D libs which still retain enough info
for the compiler to replace header files?



LLVM is definitely something I look at more and more. It is a great
weapon for D IMO.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread deadalnix

Le 12/06/2012 12:23, Tobias Pankrath a écrit :

Currently .di-files are compiler independent. If this should hold for
dib-files, too, we'll need a standard ast structure, won't we?



We need it anyway at some point. AST macro is another example.

It would also greatly simplify compiler writing if the D interpreter 
could be provided as lib (and so run on top of dib file).


I want to mention that LLVM IR + metadata can do a really good job here. 
In addition, LLVM people are working on a JIT backend, if you know what 
I mean ;)


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Jacob Carlborg

On 2012-06-12 14:09, foobar wrote:


This has been a solved problem since the '80s (e.g. Pascal units). Per Adam's
post, the issue is tied to DMD's use of OMF/optlink which we all would
like to get rid of anyway. Once we're in proper COFF land, couldn't we
just store the required metadata (binary AST?) in special sections in
the object files themselves?


Can't the same be done with OMF? I'm not saying I want to keep OMF.

--
/Jacob Carlborg


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread foobar

On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:

On 12/06/12 11:07, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

Is that actually true? My recollection is that the original 
motivation was only goal (2), but I was fairly new to D at the 
time (2005).


Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first
usable in DMD 0.142


Personally I believe that .di files are *totally* the wrong
approach for goal (1). I don't think goal (1) and (2) have 
anything in common at all with each other, except that C tried 
to achieve both of them using header files. It's an OK solution 
for (1) in C, it's a failure in C++, and a complete failure in 
D.


IMHO: If we want goal (1), we should try to achieve goal (1), 
and stop pretending it's in any way related to goal (2).


I absolutely agree with the above and would also add that goal 
(4) is an anti-feature. In order to get a human readable version 
of the API the programmer should use *documentation*. D claims 
that one of its goals is to make it a breeze to provide 
documentation by bundling a standard tool - DDoc. There's no need 
to duplicate this just to provide another format when DDoc itself
is supposed to be format-agnostic.


This has been a solved problem since the '80s (e.g. Pascal units). Per
Adam's post, the issue is tied to DMD's use of OMF/optlink which 
we all would like to get rid of anyway. Once we're in proper COFF 
land, couldn't we just store the required metadata (binary AST?) 
in special sections in the object files themselves?


Another related question - AFAIK the LLVM folks did/are doing
work to make their implementation less platform-dependent. Could
we leverage this in ldc to store LLVM bitcode as D libs which
still retain enough info for the compiler to replace header files?




Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Dmitry Olshansky

On 12.06.2012 16:09, foobar wrote:

On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:

On 12/06/12 11:07, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons

3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

Is that actually true? My recollection is that the original motivation
was only goal (2), but I was fairly new to D at the time (2005).

Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first usable
in DMD 0.142

Personally I believe that .di files are *totally* the wrong approach
for goal (1). I don't think goal (1) and (2) have anything in common
at all with each other, except that C tried to achieve both of them
using header files. It's an OK solution for (1) in C, it's a failure
in C++, and a complete failure in D.

IMHO: If we want goal (1), we should try to achieve goal (1), and stop
pretending it's in any way related to goal (2).


I absolutely agree with the above and would also add that goal (4) is an
anti-feature. In order to get a human readable version of the API the
programmer should use *documentation*. D claims that one of its goals is
to make it a breeze to provide documentation by bundling a standard tool
- DDoc. There's no need to duplicate this just to provide another format
when DDoc itself is supposed to be format-agnostic.

Absolutely. DDoc being built in didn't sound right to me at first, BUT
it essentially allows us to say that APIs are covered in
the DDoc-generated files, not in header files etc.



This has been a solved problem since the '80s (e.g. Pascal units).


Right, seeing yet another newbie hit it every day is a clear indication
of a simple fact: people would like to think & work in modules rather
than see the guts of old and crappy OBJ file technology. Linking with C
!= using C tools everywhere.


Per Adam's
post, the issue is tied to DMD's use of OMF/optlink which we all would
like to get rid of anyway. Once we're in proper COFF land, couldn't we
just store the required metadata (binary AST?) in special sections in
the object files themselves?

Seconded. At least the lexed form could be very compact; I recall early
compressors tried Huffman coding on source code tokens with
some success.
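
As a rough illustration of that idea, here is a minimal sketch of Huffman coding applied to a token stream rather than to raw characters. It is written in Python for brevity and assumes the lexer has already produced a list of token strings; the names huffman_codes, encode and decode are made up for this example and are not part of any D compiler.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(tokens):
    """Build a prefix-free bit string for every distinct token."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct token still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(n, next(tie), {tok: ""}) for tok, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(tokens, codes):
    """Concatenate the code of each token into one bit string."""
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    """Invert encode(); works because the codes are prefix-free."""
    inverse = {c: t for t, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out
```

Frequent tokens such as keywords and punctuation end up with the shortest codes, which is where the compactness would come from. A real format would also have to store the code table and pack the bits into bytes, both of which are left out here.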



Another related question - AFAIK the LLVM folks did/are doing work to
make their implementation less platform-dependent. Could we leverage this
in ldc to store LLVM bitcode as D libs which still retain enough info
for the compiler to replace header files?




--
Dmitry Olshansky


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Don Clugston

On 12/06/12 11:07, timotheecour wrote:

There's a current pull request to improve di file generation
(https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
suggest further ideas.
As far as I understand, di interface files try to achieve these
conflicting goals:

1) speed up compilation by avoiding having to reparse large files over
and over.
2) hide implementation details for proprietary reasons

3) still maintain source code in some form to allow inlining and CTFE
4) be human readable

Is that actually true? My recollection is that the original motivation 
was only goal (2), but I was fairly new to D at the time (2005).


Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in
DMD 0.142


Personally I believe that .di files are *totally* the wrong approach for
goal (1). I don't think goal (1) and (2) have anything in common at all 
with each other, except that C tried to achieve both of them using 
header files. It's an OK solution for (1) in C, it's a failure in C++, 
and a complete failure in D.


IMHO: If we want goal (1), we should try to achieve goal (1), and stop
pretending it's in any way related to goal (2).
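
To make "aim directly at goal (1)" a bit more concrete, here is a minimal sketch of a parse cache: the parsed tree is serialized once and reused for as long as the source bytes are unchanged. It uses Python's own ast module purely as a stand-in for a D front end. The function load_ast, the ".astcache" extension and the cache layout are assumptions made for this illustration; nothing like this exists in DMD.

```python
import ast
import hashlib
import pickle
from pathlib import Path


def load_ast(source_path, cache_dir):
    """Return (tree, cache_hit) for the module at source_path.

    The cache key is the SHA-256 of the source bytes, so any edit to
    the file misses the cache and triggers a fresh parse.
    """
    source = Path(source_path).read_bytes()
    digest = hashlib.sha256(source).hexdigest()

    cache_root = Path(cache_dir)
    cache_root.mkdir(parents=True, exist_ok=True)
    cache_file = cache_root / (digest + ".astcache")

    if cache_file.exists():
        # Cache hit: skip lexing and parsing entirely.
        return pickle.loads(cache_file.read_bytes()), True

    # Cache miss: parse once and store the binary form for next time.
    tree = ast.parse(source, filename=str(source_path))
    cache_file.write_bytes(pickle.dumps(tree))
    return tree, False
```

Calling load_ast twice on an unchanged file should parse on the first call and load the stored tree on the second. Note that nothing in the cached form hides implementation details, so it does nothing for goal (2).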


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Timon Gehr

On 06/12/2012 12:47 PM, Alex Rønne Petersen wrote:

On 12-06-2012 12:23, Tobias Pankrath wrote:

Currently .di files are compiler-independent. If this should hold for
dib files, too, we'll need a standard AST structure, won't we?



Which is a Good Thing (TM). It would /require/ formalization of the
language once and for all.



I do not see how this conclusion could be reached.


Re: AST files instead of DI interface files for faster compilation and easier distribution

2012-06-12 Thread Tobias Pankrath
Currently .di files are compiler-independent. If this should hold
for dib files, too, we'll need a standard AST structure, won't we?