Re: AST files instead of DI interface files for faster compilation and easier distribution
On Mon, 18 Jun 2012 19:53:43 +0200, Walter Bright wrote: On 6/18/2012 6:07 AM, Don Clugston wrote: On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time. Lexing is definitely taking a big part of debug compilation time. I haven't profiled the compiler for some time now, but here are some thoughts:
- speeding up the identifier hash table: there was always a profile spike at StringTable::lookup, though it has gone down since you increased the bucket count
- memory mapping the source file saves a copy for UTF-8 sources; this is by far the fastest way to read a source file
- parallel reading/parsing doesn't help much if most of the source files are read during import semantic
I'm regularly hitting other bottlenecks, so I don't think that lexing is #1. When compiling std.range with unittests, for example, more than 50% of the compile time is spent checking for existing template instantiations using O(N^2)/2 comparisons of template arguments. If we managed to fix http://d.puremagic.com/issues/show_bug.cgi?id=7469 we could efficiently use the mangled name as a key.
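To make the instantiation-lookup point concrete, here is a minimal D sketch contrasting a linear scan over existing instantiations with a hash table keyed by a mangled-name-like string. `Instance`, `findLinear`, `mangleKey` and `InstanceCache` are hypothetical names for illustration, not DMD's internals, and the toy key is not real D name mangling.

```d
import std.array : join;

// Toy model of a template instantiation: template name plus argument
// spellings. Not DMD's real data structures.
struct Instance
{
    string templateName;
    string[] args;
}

/// Linear lookup: compares against every existing instantiation, so adding
/// n instantiations this way costs about n^2/2 argument-list comparisons.
size_t findLinear(const(Instance)[] existing, string name, const(string)[] args)
{
    foreach (i, inst; existing)
        if (inst.templateName == name && inst.args == args)
            return i;
    return size_t.max; // not found
}

/// Hypothetical stand-in for a mangled name: one string encoding the
/// template and its arguments. Real mangling is unambiguous; this toy key
/// is not (an argument containing ',' could collide).
string mangleKey(string name, string[] args)
{
    return name ~ "!(" ~ args.join(",") ~ ")";
}

/// Cache keyed by the mangled-name stand-in: one hash lookup per query.
struct InstanceCache
{
    size_t[string] byKey;  // key -> index into `instances`
    Instance[] instances;

    /// Returns the index of the instantiation, creating it on first use.
    size_t getOrAdd(string name, string[] args)
    {
        immutable key = mangleKey(name, args);
        if (auto p = key in byKey)
            return *p;
        instances ~= Instance(name, args);
        byKey[key] = instances.length - 1;
        return instances.length - 1;
    }
}
```

With the keyed cache, each lookup costs one hash of the key instead of a comparison against every earlier instantiation. This assumes the key is unique and cheap to compute, which is the condition the linked bug is said to block.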
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Mon, 18 Jun 2012 13:53:43 -0400, Walter Bright wrote: On 6/18/2012 6:07 AM, Don Clugston wrote: On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time. I have found that my project, which has a huge number of symbols (and large ones), compiles much slower than I would expect. Perhaps you have forgotten about this issue: http://d.puremagic.com/issues/show_bug.cgi?id=4900 Fixing it may still not help parsing, though; I'm not sure. -Steve
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 17/06/2012 00:41, Walter Bright wrote: On 6/14/2012 11:58 PM, Don Clugston wrote: And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time. The language is carefully designed, so that at least in theory all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome! Using di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works but you only do it when you've got no other options. .di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes. The key point here is project size. I wouldn't expect file size to increase significantly.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 18/06/2012 19:53, Walter Bright wrote: On 6/18/2012 6:07 AM, Don Clugston wrote: On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time. It is kind of religious. We need data.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 16 June 2012 22:17, Guillaume Chatelet wrote: >> So parsing time has taken quite a hit since I last did any reports on >> compilation speed of building phobos. > > So maybe my post about "keeping import clean" wasn't as irrelevant as I > thought. > > http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890 > > -- > Guillaume I think its relevance is limited to projects that compile one file at a time, i.e. I'd expect all gdc users to be compiling this way, as whole-program compilation with gdc still needs some rigorous testing first. If a particularly large module, or set of large modules, is persistently being imported, then you will see a notable constant slowdown when compiling each file. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 16/06/2012 11:18, Iain Buclaw wrote: On 13 June 2012 12:47, Iain Buclaw wrote: On 13 June 2012 12:33, Kagamin wrote: On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote: The measurements should be done for modules being imported, not the module being compiled. Something like this. --- import std.algorithm; import std.stdio; import std.typecons; import std.datetime; int ok; --- Oh and let it import .d files, not .di std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still persistent that the majority of the compile time in the frontend is done in the first semantic pass, and not the read/parser stage. :~) Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library. http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf Notes about it: - GCC has 4 new time counters - phase setup (time spent loading the compile time environment) - phase parsing (time spent in the frontend) - phase generate (time spent in the backend) - phase finalize (time spent cleaning up and exiting) - Of the phase parsing stage, it is broken down into 5 components - Module::parse - Module::semantic - Module::semantic2 - Module::semantic3 - Module::genobjfile - Module::read, Module::parse and Module::importAll in the one I did 2 years ago are now counted as part of just the one parsing stage, rather than separate just to make it a little bit more balanced. :-) I'll post a tl;dr later on it. Thank you very much for your work.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 18.06.2012 19:53, Walter Bright wrote: On 6/18/2012 6:07 AM, Don Clugston wrote: On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time. So you started your lexing and parsing in separate threads for each file? Where was synchronization needed? Have you measured which parts of the code make it behave like synchronous reading, or is it the file reading itself?
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 19.06.2012 09:43, Kagamin wrote:
> On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote: Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
> I don't even understand all this rage about asynchronicity, if the program has nothing to do until it reads the data,

The lexing and parsing can be asynchronous: it will be faster on multiple cores because there is no dependency between separate lexing/parsing threads. Why lex/parse in sequence, then?

> asynchronicity won't help you in the slightest. Anyway everything is stuck while the device performs DMA.

Yes, down at the hardware level, but there are caches etc. out there. It's not that "multithreaded file reading is always as fast as synchronous reading", nor that "asynchronous file reading is always faster"; it's somewhere in between :)
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tuesday, 19 June 2012 at 01:47:27 UTC, Timon Gehr wrote: Parsing is not a huge issue. Depending on how powerful the language is, auto-completion may depend on full code analysis. Yep, Pegged runs at compile time.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote: Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time. I don't even understand all this rage about asynchronicity: if the program has nothing to do until it reads the data, asynchronicity won't help you in the slightest. Anyway, everything is stuck while the device performs DMA.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 06/19/2012 02:47 AM, Chris Cain wrote: On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote: Same here, I wish there were a standardized pre-lexed-token "binary" file-format, would benefit all text editors also, as they need to lex it anyway to perform color syntax highlighting. If I were to make my own language, I'd forego a human-readable format and just have the "language" be defined as a big machine-readable AST. http://de.wikipedia.org/wiki/Lisp ? You'd have to have an IDE, but it could display the code in just about any way the person wants (syntax, style, etc). This could be done even if the language's source code storage format is human-readable. Syntax highlighting would be instantaneous and there would be fewer errors made by programmers (maybe ...). Plus it'd be unbelievably easy to implement things like auto-completion. Parsing is not a huge issue. Depending on how powerful the language is, auto-completion may depend on full code analysis.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote: Same here, I wish there were a standardized pre-lexed-token "binary" file-format, would benefit all text editors also, as they need to lex it anyway to perform color syntax highlighting. If I were to make my own language, I'd forego a human-readable format and just have the "language" be defined as a big machine-readable AST. You'd have to have an IDE, but it could display the code in just about any way the person wants (syntax, style, etc). Syntax highlighting would be instantaneous and there would be fewer errors made by programmers (maybe ...). Plus it'd be unbelievably easy to implement things like auto-completion.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote: On 6/18/2012 6:07 AM, Don Clugston wrote: On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time. Same here. I wish there were a standardized pre-lexed-token "binary" file format; it would also benefit all text editors, since they need to lex the code anyway to perform syntax highlighting.
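A pre-lexed token file like the one wished for here is easy to picture as fixed-size records that point back into the source text. The D sketch below assumes a made-up layout: one kind byte, then a 32-bit little-endian start offset and a 32-bit little-endian length. It is not an existing standard format, and the `Kind` values are hypothetical.

```d
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

/// Hypothetical token kinds for this sketch.
enum Kind : ubyte { identifier, number, punct }

/// A token refers back into the source text instead of copying it.
struct Token
{
    Kind kind;
    uint start;   // byte offset into the source
    uint length;  // length in bytes
}

enum recordSize = 1 + 4 + 4; // kind + start + length, no padding

/// Encodes tokens as fixed-size little-endian records.
ubyte[] encode(const Token[] toks)
{
    ubyte[] buf;
    buf.reserve(toks.length * recordSize);
    foreach (t; toks)
    {
        buf ~= cast(ubyte) t.kind;
        auto s = nativeToLittleEndian(t.start);
        buf ~= s[];
        auto n = nativeToLittleEndian(t.length);
        buf ~= n[];
    }
    return buf;
}

/// Decodes records produced by `encode`; a trailing partial record is ignored.
Token[] decode(const(ubyte)[] buf)
{
    Token[] toks;
    for (size_t i = 0; i + recordSize <= buf.length; i += recordSize)
    {
        ubyte[4] s = buf[i + 1 .. i + 5];
        ubyte[4] n = buf[i + 5 .. i + 9];
        toks ~= Token(cast(Kind) buf[i],
                      littleEndianToNative!uint(s),
                      littleEndianToNative!uint(n));
    }
    return toks;
}
```

Because tokens store only offsets, an editor could colour the original text straight from the records without re-lexing it. The catch is that the file goes stale as soon as the source is edited.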
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 6/18/2012 6:07 AM, Don Clugston wrote: On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 17/06/12 00:37, Walter Bright wrote: On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast. Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement. It has those nasty side-effects listed under (3) though. I don't think they're nasty or are side effects. They are new problems that people ask for solutions to, and they are far more difficult to solve than the original problem.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 6/14/2012 1:03 AM, Don Clugston wrote: It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? Nothing recent, it's mostly from my C++ compiler testing. Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement. It has those nasty side-effects listed under (3) though. I don't think they're nasty or are side effects.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 6/14/2012 11:58 PM, Don Clugston wrote: And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time. The language is carefully designed, so that at least in theory all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome! Using di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works but you only do it when you've got no other options. .di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes.
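Lexing separate files has no cross-file dependencies, so it parallelises easily. Here is a minimal D sketch using the real `std.parallelism.parallel` helper. `countTokens` is a hypothetical toy lexer that is far simpler than D's, and the sources are in-memory strings, so file reading is not involved.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.parallelism : parallel;

/// Toy lexer (hypothetical): counts identifiers, integer literals and
/// single-character punctuation tokens, skipping whitespace.
size_t countTokens(string src)
{
    size_t count, i;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c)) { ++i; continue; }
        ++count;
        if (isAlpha(c) || c == '_')
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
        else if (isDigit(c))
            while (i < src.length && isDigit(src[i])) ++i;
        else
            ++i; // any other byte is a one-character token
    }
    return count;
}

/// Lexes each source independently on the default task pool. Each worker
/// writes only its own slot of `counts`, so no locking is needed.
size_t[] lexAll(string[] sources)
{
    auto counts = new size_t[sources.length];
    foreach (i, src; parallel(sources))
        counts[i] = countTokens(src);
    return counts;
}
```

This only parallelises CPU-bound lexing of text already in memory. As pointed out elsewhere in the thread, it does nothing for time spent waiting on the storage device.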
Re: AST files instead of DI interface files for faster compilation and easier distribution
> So parsing time has taken quite a hit since I last did any reports on > compilation speed of building phobos. So maybe my post about "keeping import clean" wasn't as irrelevant as I thought. http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890 -- Guillaume
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 16 June 2012 10:18, Iain Buclaw wrote: > On 13 June 2012 12:47, Iain Buclaw wrote: >> On 13 June 2012 12:33, Kagamin wrote: >>> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote: The measurements should be done for modules being imported, not the module being compiled. Something like this. --- import std.algorithm; import std.stdio; import std.typecons; import std.datetime; int ok; --- >>> >>> >>> Oh and let it import .d files, not .di >> >> std.datetime is one reason for me to run it again. I can imagine that >> *that* module will have an impact on parse times. But I'm still >> persistent that the majority of the compile time in the frontend is >> done in the first semantic pass, and not the read/parser stage. :~) >> >> > > Rebuilt a compile log with latest gdc as of writing on the 2.059 > frontend / library. > > http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf > http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf > > > Notes about it: > - GCC has 4 new time counters > - phase setup (time spent loading the compile time environment) > - phase parsing (time spent in the frontend) > - phase generate (time spent in the backend) > - phase finalize (time spent cleaning up and exiting) > > - Of the phase parsing stage, it is broken down into 5 components > - Module::parse > - Module::semantic > - Module::semantic2 > - Module::semantic3 > - Module::genobjfile > > - Module::read, Module::parse and Module::importAll in the one I did 2 > years ago are now counted as part of just the one parsing stage, > rather than separate just to make it a little bit more balanced. :-) > > > I'll post a tl;dr later on it. 
tl;dr:

Total number of source files compiled: 207
Total time to build druntime and phobos: 78.08 seconds
Time spent parsing: 17.15 seconds
Average time spent parsing (per file): 0.08 seconds
Time spent running semantic passes: 10.04 seconds
Time spent generating backend AST: 2.15 seconds
Time spent in backend: 48.62 seconds

So parsing time has taken quite a hit since I last did any reports on compilation speed of building phobos. I suspect most of that comes from loading symbols from all imports, and that some large recent additions to phobos become a constant bottleneck if one chooses to compile one source file at a time, since the apparently large amount of time spent parsing sources does not show up when compiling everything at once.

Module::parse: 0.58 seconds (1%)
Module::semantic: 0.24 seconds (1%)
Module::semantic2: 0.01 seconds (0%)
Module::semantic3: 2.85 seconds (6%)
Module::genobjfile: 1.24 seconds (3%)
TOTAL: 47.06 seconds

Considering that the entire phobos library is some 165K lines of code, I don't see why people aren't laughing about just how quick the frontend is at parsing. :~)

Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
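As a quick check on the tl;dr figures, here is each phase's share of the 78.08 s total. This is derived only from the numbers above:

$$
\begin{aligned}
\text{parsing: } & 17.15 / 78.08 \approx 22.0\% \\
\text{semantic passes: } & 10.04 / 78.08 \approx 12.9\% \\
\text{backend AST generation: } & 2.15 / 78.08 \approx 2.8\% \\
\text{backend: } & 48.62 / 78.08 \approx 62.3\% \\
\text{parsing per file: } & 17.15 / 207 \approx 0.083\ \text{s}
\end{aligned}
$$

So even in a file-at-a-time build the backend dominates, and parsing accounts for roughly a fifth of the total. The four phases add up to 77.96 s, leaving about 0.12 s not attributed to any of them.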
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13 June 2012 12:47, Iain Buclaw wrote: > On 13 June 2012 12:33, Kagamin wrote: >> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote: >>> >>> The measurements should be done for modules being imported, not the module >>> being compiled. >>> Something like this. >>> --- >>> import std.algorithm; >>> import std.stdio; >>> import std.typecons; >>> import std.datetime; >>> >>> int ok; >>> --- >> >> >> Oh and let it import .d files, not .di > > std.datetime is one reason for me to run it again. I can imagine that > *that* module will have an impact on parse times. But I'm still > persistent that the majority of the compile time in the frontend is > done in the first semantic pass, and not the read/parser stage. :~) > > Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library. http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf Notes about it: - GCC has 4 new time counters - phase setup (time spent loading the compile time environment) - phase parsing (time spent in the frontend) - phase generate (time spent in the backend) - phase finalize (time spent cleaning up and exiting) - Of the phase parsing stage, it is broken down into 5 components - Module::parse - Module::semantic - Module::semantic2 - Module::semantic3 - Module::genobjfile - Module::read, Module::parse and Module::importAll in the one I did 2 years ago are now counted as part of just the one parsing stage, rather than separate just to make it a little bit more balanced. :-) I'll post a tl;dr later on it. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
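Per-phase counters like the ones described in the notes above amount to accumulating wall-clock time under a phase name. Here is a minimal D sketch of that idea. `PhaseTimer` is a hypothetical name and this is not GDC's implementation. It uses `std.datetime.stopwatch` from current Phobos; in the 2012 releases discussed here, `StopWatch` lived in `std.datetime`.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical per-phase timer in the spirit of the counters described
/// above; it is not GDC's implementation.
struct PhaseTimer
{
    Duration[string] totals; // accumulated wall-clock time per phase
    string[] order;          // phases in first-seen order, for reporting

    /// Runs `work` and adds its elapsed wall-clock time to `phase`.
    void measure(string phase, scope void delegate() work)
    {
        auto sw = StopWatch(AutoStart.yes);
        work();
        sw.stop();
        if (auto p = phase in totals)
            *p += sw.peek();
        else
        {
            totals[phase] = sw.peek();
            order ~= phase;
        }
    }

    /// Fraction of all measured time spent in `phase`; 0 if nothing was
    /// measured. `phase` must have been measured at least once.
    double share(string phase) const
    {
        Duration sum;
        foreach (d; totals)
            sum += d;
        if (sum == Duration.zero)
            return 0;
        return cast(double) totals[phase].total!"hnsecs" / sum.total!"hnsecs";
    }
}
```

Everything that runs inside the delegate is counted. Whether file I/O shows up under "parse" therefore depends on where the read call sits, which is the question Dmitry raises about GDC's counters.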
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Friday, June 15, 2012 08:58:55 Don Clugston wrote: > I don't think Phobos should use .di files at all. I don't think there > are any cases where we want to conceal code. > > The performance benefit you would get is completely negligible. It > doesn't even reduce the number of files that need to be loaded, just the > length of each one. > > I think that, for example, improving the way that array literals are > dealt with would have at least as much impact on compilation time. > For the DMD backend, fixing up the treatment of comma expressions would > have a much bigger impact than getting lexing and parsing time to zero. > > And we're well set up for parallel compilation. There's no shortage of > things we can do to improve compilation time. > > Using di files for speed seems a bit like jettisoning the cargo to keep > the ship afloat. It works but you only do it when you've got no other > options. On several occasions, Walter has expressed the desire to make Phobos use .di files like druntime does, otherwise I probably would never have considered it. Personally, I don't want to bother with it unless there's a large benefit from it, so if we're sure that the gain is minimal, then I say that we should just leave it all as .d files. Most of of Phobos would have to have its implementation left in any .di files anyway so that inlining and CTFE could work. - Jonathan M Davis
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 14/06/12 10:10, Jonathan M Davis wrote: On Thursday, June 14, 2012 10:03:05 Don Clugston wrote: On 13/06/12 16:29, Walter Bright wrote: On 6/13/2012 1:07 AM, Don Clugston wrote: On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me, that slow parsing is a C++ problem which D already solved. If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? - Jonathan M Davis I don't think Phobos should use .di files at all. I don't think there are any cases where we want to conceal code. The performance benefit you would get is completely negligible. It doesn't even reduce the number of files that need to be loaded, just the length of each one. 
I think that, for example, improving the way that array literals are dealt with would have at least as much impact on compilation time. For the DMD backend, fixing up the treatment of comma expressions would have a much bigger impact than getting lexing and parsing time to zero. And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time. Using di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works but you only do it when you've got no other options.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Thursday, 14 June 2012 at 08:11:02 UTC, Jonathan M Davis wrote: Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? Oh, right, the module can use mixins and CTFE, so it should be semantically checked, but the semantic check may be minimal just like in the case of a .di file.
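Here is a small D illustration of why an imported module that uses CTFE and string mixins has to go through semantic analysis. `makeGetter` and `answer` are hypothetical names. The string is built by running an ordinary function at compile time, and the result is then compiled as D code.

```d
/// Builds D source text for a getter. Ordinary runtime code that can also
/// be run at compile time (CTFE).
string makeGetter(string name, string expr)
{
    return "int " ~ name ~ "() { return " ~ expr ~ "; }";
}

// The string is produced by CTFE and then compiled as D code. To do that the
// compiler must run semantic analysis on makeGetter's body, so a body-less
// declaration in an interface file would not be enough here.
mixin(makeGetter("answer", "6 * 7"));

// Checked at compile time: answer() exists and returns 42.
static assert(answer() == 42);
```

This is also why goal (3) in the original list, keeping source around for inlining and CTFE, conflicts with hiding implementation.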
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Thursday, June 14, 2012 10:03:05 Don Clugston wrote: > On 13/06/12 16:29, Walter Bright wrote: > > On 6/13/2012 1:07 AM, Don Clugston wrote: > >> On 12/06/12 18:46, Walter Bright wrote: > >>> On 6/12/2012 2:07 AM, timotheecour wrote: > There's a current pull request to improve di file generation > (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to > suggest > further ideas. > As far as I understand, di interface files try to achieve these > conflicting goals: > > 1) speed up compilation by avoiding having to reparse large files over > and over. > 2) hide implementation details for proprietary reasons > 3) still maintain source code in some form to allow inlining and CTFE > 4) be human readable > >>> > >>> (4) was not a goal. > >>> > >>> A .di file could very well be a binary file, but making it look like D > >>> source enabled them to be loaded with no additional implementation work > >>> in the compiler. > >> > >> I don't understand (1) actually. > >> > >> For two reasons: > >> (a) Is lexing + parsing really a significant part of the compilation > >> time? Has > >> anyone done some solid profiling? > > > > It is for debug builds. > > Iain's data indicates that it's only a few % of the time taken on > semantic1(). > Do you have data that shows otherwise? > > It seems to me, that slow parsing is a C++ problem which D already solved. If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? - Jonathan M Davis
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13/06/12 16:29, Walter Bright wrote: On 6/13/2012 1:07 AM, Don Clugston wrote: On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? It is for debug builds. Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me, that slow parsing is a C++ problem which D already solved. (b) Wasn't one of the goals of D's module system supposed to be that you could import a symbol table? Why not just implement that? Seems like that would be much faster than .di files can ever be. Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement. It has those nasty side-effects listed under (3) though.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 2012-06-13 13:47, Iain Buclaw wrote: std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still persistent that the majority of the compile time in the frontend is done in the first semantic pass, and not the read/parser stage. :~) You should try the Objective-C/D bridge; that took quite a while to compile. It probably won't compile any more, though, since it hasn't been updated, and I think it was D1-only as well. It was mostly templates, so I guess the time went into some of the semantic passes. -- /Jacob Carlborg
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 6/13/2012 1:07 AM, Don Clugston wrote: On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? It is for debug builds. (b) Wasn't one of the goals of D's module system supposed to be that you could import a symbol table? Why not just implement that? Seems like that would be much faster than .di files can ever be. Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement.
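Here is a sketch of what a .di file written as D source can and cannot carry. It is a hypothetical hand-written interface file, not `dmd -H` output, and `checksum` and `twice` are made-up names. Ordinary functions can ship as bare signatures. Templates and anything needed for CTFE must keep their bodies, which is goal (3) in the list above.

```d
// mylib.di: hypothetical hand-written interface file, not dmd -H output.

// Ordinary function: only the signature is shipped. Importers can call it
// and link against the compiled library, but the body cannot be inlined or
// evaluated in CTFE.
int checksum(const(ubyte)[] data);

// Template: the body has to stay, because each instantiation is compiled in
// the importing module and may be evaluated at compile time.
T twice(T)(T x)
{
    return x * 2;
}

// Works at compile time only because the body is visible here.
enum fortyTwo = twice(21);
```

This is the trade-off behind the thread's disagreement. The more of a module is templates or CTFE-able code, the less a .di file can strip, and the smaller its parsing advantage becomes.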
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Wednesday, 13 June 2012 at 11:47:31 UTC, Iain Buclaw wrote: std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still persistent that the majority of the compile time in the frontend is done in the first semantic pass, and not the read/parser stage. :~) Probably. Also test with -fsyntax-only, if it works and runs the semantic passes.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13 June 2012 12:33, Kagamin wrote:
> On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
>>
>> The measurements should be done for modules being imported, not the module being compiled.
>> Something like this.
>> ---
>> import std.algorithm;
>> import std.stdio;
>> import std.typecons;
>> import std.datetime;
>>
>> int ok;
>> ---
>
>
> Oh, and let it import .d files, not .di files.

std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not in the read/parse stage. :~)

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
> The measurements should be done for modules being imported, not the module being compiled. Something like this.
> ---
> import std.algorithm;
> import std.stdio;
> import std.typecons;
> import std.datetime;
>
> int ok;
> ---

Oh, and let it import .d files, not .di files.
Re: AST files instead of DI interface files for faster compilation and easier distribution
The measurements should be done for modules being imported, not the module being compiled. Something like this.
---
import std.algorithm;
import std.stdio;
import std.typecons;
import std.datetime;

int ok;
---
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13.06.2012 14:16, Iain Buclaw wrote: On 13 June 2012 10:45, Dmitry Olshansky wrote: On 13.06.2012 13:37, Iain Buclaw wrote: On 13 June 2012 09:07, Don Clugston wrote: On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? Lexing and parsing are minuscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relevant report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it took to compile the one file (5.22 seconds), it spent almost 10% of its time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~) Is time spent on I/O accounted for in the parse step? 
And where is the rest spent? :) It would be; the counter starts before the files are even touched, and ends after they are closed. OK, then parsing is indistinguishable from I/O, and together they are only a tiny fraction of the whole. Great info, thanks. The rest of the time is spent in the GCC backend, going through some 60+ code passes and outputting the assembly to file. Damn, I like DMD :) -- Dmitry Olshansky
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13 June 2012 10:45, Dmitry Olshansky wrote: > On 13.06.2012 13:37, Iain Buclaw wrote: >> >> On 13 June 2012 09:07, Don Clugston wrote: >>> >>> On 12/06/12 18:46, Walter Bright wrote: >>> On 6/12/2012 2:07 AM, timotheecour wrote: > > > There's a current pull request to improve di file generation > (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to > suggest > further ideas. > As far as I understand, di interface files try to achieve these > conflicting goals: > > 1) speed up compilation by avoiding having to reparse large files over > and over. > 2) hide implementation details for proprietary reasons > 3) still maintain source code in some form to allow inlining and CTFE > 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. >>> >>> >>> >>> I don't understand (1) actually. >>> >>> For two reasons: >>> (a) Is lexing + parsing really a significant part of the compilation >>> time? >>> Has anyone done some solid profiling? >>> >> >> Lexing and parsing are minuscule tasks in comparison to the three >> semantic runs done on the code. >> >> I added speed counters into the glue code of GDC some time ago. >> >> http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ >> >> And here is the relevant report to go with it. >> http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf >> >> >> Example: std/xml.d >> Module::parse : 0.01 ( 0%) >> Module::semantic : 0.50 ( 9%) >> Module::semantic2 : 0.02 ( 0%) >> Module::semantic3 : 0.04 ( 1%) >> Module::genobjfile : 0.10 ( 2%) >> >> For the entire time it took to compile the one file (5.22 seconds), >> it spent almost 10% of its time running the first semantic analysis. >> >> >> But that was the D2 frontend / phobos as of September 2010. I should >> re-run a report on updated times and draw some comparisons. 
:~) >> > > Is time spent on I/O accounted for in the parse step? And where is the rest > spent? :) > It would be; the counter starts before the files are even touched, and ends after they are closed. The rest of the time is spent in the GCC backend, going through some 60+ code passes and outputting the assembly to file. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13.06.2012 13:37, Iain Buclaw wrote: On 13 June 2012 09:07, Don Clugston wrote: On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? Lexing and parsing are minuscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relevant report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it took to compile the one file (5.22 seconds), it spent almost 10% of its time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~) Is time spent on I/O accounted for in the parse step? And where is the rest spent? :) -- Dmitry Olshansky
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13/06/2012 11:37, Iain Buclaw wrote: On 13 June 2012 09:07, Don Clugston wrote: On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? Lexing and parsing are minuscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relevant report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it took to compile the one file (5.22 seconds), it spent almost 10% of its time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~) Regards Nice numbers! They also show that the slowest part is the backend. Can you get some numbers on a recent version of D? It would also be nice to compare different kinds of D code (template-intensive or not, for instance).
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 13 June 2012 09:07, Don Clugston wrote:
> On 12/06/12 18:46, Walter Bright wrote:
>>
>> On 6/12/2012 2:07 AM, timotheecour wrote:
>>>
>>> There's a current pull request to improve di file generation
>>> (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas.
>>> As far as I understand, di interface files try to achieve these conflicting goals:
>>>
>>> 1) speed up compilation by avoiding having to reparse large files over and over.
>>> 2) hide implementation details for proprietary reasons
>>> 3) still maintain source code in some form to allow inlining and CTFE
>>> 4) be human readable
>>
>>
>> (4) was not a goal.
>>
>> A .di file could very well be a binary file, but making it look like D
>> source enabled them to be loaded with no additional implementation work
>> in the compiler.
>
>
> I don't understand (1) actually.
>
> For two reasons:
> (a) Is lexing + parsing really a significant part of the compilation time?
> Has anyone done some solid profiling?
>

Lexing and parsing are minuscule tasks in comparison to the three semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago.
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it.
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf

Example: std/xml.d
Module::parse      : 0.01 ( 0%)
Module::semantic   : 0.50 ( 9%)
Module::semantic2  : 0.02 ( 0%)
Module::semantic3  : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds), it spent almost 10% of its time running the first semantic analysis.

But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~)

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
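To make the arithmetic behind "almost 10%" explicit, here is a small D sketch that recomputes each counter's share of the 5.22-second total from the std/xml.d figures in the report above. The names Phase, sharePercent, unaccounted and printReport are made up for illustration; the only data are the reported numbers, and the "(not covered)" row simply lumps together everything the listed counters do not measure.

```d
import std.stdio : writefln;

/// One row of a per-phase timing report (hypothetical helper type).
struct Phase
{
    string name;
    double seconds;
}

/// Figures from the std/xml.d report; total compile time was 5.22 s.
immutable Phase[] xmlReport = [
    Phase("Module::parse", 0.01),
    Phase("Module::semantic", 0.50),
    Phase("Module::semantic2", 0.02),
    Phase("Module::semantic3", 0.04),
    Phase("Module::genobjfile", 0.10),
];
enum double xmlTotal = 5.22;

/// Percentage of `total` seconds spent in phase `p`.
double sharePercent(const Phase p, double total)
{
    return p.seconds / total * 100.0;
}

/// Seconds not covered by any of the listed counters.
double unaccounted(const Phase[] phases, double total)
{
    double sum = 0;
    foreach (p; phases)
        sum += p.seconds;
    return total - sum;
}

/// Prints each phase with its share, then the uncovered remainder.
void printReport(const Phase[] phases, double total)
{
    foreach (p; phases)
        writefln("%-20s %5.2f s %6.2f%%", p.name, p.seconds, sharePercent(p, total));
    immutable rest = unaccounted(phases, total);
    writefln("%-20s %5.2f s %6.2f%%", "(not covered)", rest, rest / total * 100.0);
}
```

Calling printReport(xmlReport, xmlTotal) shows the first semantic pass at about 9.58% and parsing at about 0.19%, with roughly 4.55 s (about 87%) outside the listed counters, which matches Iain's later remark that the remainder goes to the GCC backend.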
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12/06/12 18:46, Walter Bright wrote: On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler. I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? (b) Wasn't one of the goals of D's module system supposed to be that you could import a symbol table? Why not just implement that? Seems like that would be much faster than .di files can ever be.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tuesday, 12 June 2012 at 12:23:21 UTC, Dmitry Olshansky wrote: On 12.06.2012 16:09, foobar wrote: On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote: On 12/06/12 11:07, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2). I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human-readable version of the API, the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool: DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic. Absolutely. 
DDoc being built-in didn't sound right to me at first, BUT it essentially allows us to say that APIs are covered in the DDoc-generated files, not in header files etc. This has been a solved problem since the 80's (e.g. Pascal units). Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere. Back in the 90's I only moved 100% away from Turbo Pascal into C land when I started using Linux at the university, and eventually spent some time doing C++ as well. It still baffles me that in 2012 we still need to rely on crappy C linker tooling, when in the 80's we already had languages with proper modules. Now we have many mainstream languages with proper modules, but many of them live in VM land. Oberon, Go and Delphi/Free Pascal seem to be the only languages with native code generation compilers that offer the binary-only modules solution, while many rely on some form of .di files.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12.06.2012 22:47, Adam Wilson wrote: On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky wrote: On 12.06.2012 16:09, foobar wrote: On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote: On 12/06/12 11:07, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2). I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human-readable version of the API, the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool: DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic. Absolutely. 
DDoc being built-in didn't sound right to me at first, BUT it essentially allows us to say that APIs are covered in the DDoc-generated files, not in header files etc. This has been a solved problem since the 80's (e.g. Pascal units). Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere. I completely agree with this. The interactions between the D module system and D toolchain are utterly confusing to newcomers, especially those from other C-like languages. There are better ways, see .NET Assemblies and Pascal Units. These problems were solved decades ago. Why are we still using 40-year-old paradigms? Per Adam's post, the issue is tied to DMD's use of OMF/optlink, which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Seconded. At least the lexed form could be very compact; I recall early compressors tried Huffman-coding source code tokens with some success. I don't see the value of compression. Lexing would already reduce the size significantly and compression would only add to processing times. Disk is cheap. I/O is not. (De)compression on the fly is an increasingly interesting direction these days: the less you read/write, the faster you get. Knowing the relative frequencies of keywords beforehand is a boon. Yet I agree that it's premature at the moment. Beyond that though, this is absolutely the direction D must head in. 
In my mind the DI generation patch was mostly just a stop-gap to bring DI-gen up-to-date with the current system, thereby giving us enough time to tackle the (admittedly huge) task of building COFF into the backend, emitting the lexed source into a special section and then giving the compiler *AND* linker the ability to read out the source. For example, giving the linker the ability to read out source code essentially requires a brand-new linker. Although, it is my personal opinion that the linker should be integrated with the compiler and run as one step; this way the linker could have intimate knowledge of the source, which would enable some spectacular LTO options. If only DMD were written in D, then we could really open the compile speed throttles with an MT build model... -- Dmitry Olshansky
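Dmitry's point about Huffman-coding tokens can be illustrated with a minimal D sketch. It assumes the token frequencies are already known (no measurements of real D source are given in the thread) and computes only the total encoded size. It relies on the standard identity that the total length of an optimal prefix code equals the sum of the weights created by each merge. The names huffmanBits and fixedBits are hypothetical, and the sketch does not model decoding cost, which was Adam's concern.

```d
import std.algorithm : sort;

/// Total bits for an optimal (Huffman) prefix code, given how often each
/// distinct token occurs. Each merge of the two lightest weights adds one
/// bit to every token beneath it, so the total equals the sum of merged weights.
ulong huffmanBits(const ulong[] freqs)
{
    ulong[] pool = freqs.dup;
    if (pool.length == 0)
        return 0;
    if (pool.length == 1)
        return pool[0];                 // a lone symbol still needs 1 bit per occurrence
    ulong total = 0;
    while (pool.length > 1)
    {
        sort(pool);                     // lightest two weights first
        immutable merged = pool[0] + pool[1];
        total += merged;
        pool = pool[2 .. $] ~ merged;   // replace them by their sum
    }
    return total;
}

/// Bits for a fixed-width code over the same alphabet (at least 1 bit wide).
ulong fixedBits(const ulong[] freqs)
{
    ulong width = 1;
    while ((1UL << width) < freqs.length)
        ++width;
    ulong count = 0;
    foreach (f; freqs)
        count += f;
    return count * width;
}
```

For instance, six token kinds occurring 45, 13, 12, 16, 9 and 5 times (made-up numbers) give huffmanBits == 224 versus fixedBits == 300 for a 3-bit fixed code. How skewed real D token streams are, and whether the saved I/O outweighs decoding, is left open here, just as it was in the thread.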
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky wrote: On 12.06.2012 16:09, foobar wrote: On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote: On 12/06/12 11:07, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2). I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human-readable version of the API, the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool: DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic. Absolutely. 
DDoc being built-in didn't sound right to me at first, BUT it essentially allows us to say that APIs are covered in the DDoc-generated files, not in header files etc. This has been a solved problem since the 80's (e.g. Pascal units). Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere. I completely agree with this. The interactions between the D module system and D toolchain are utterly confusing to newcomers, especially those from other C-like languages. There are better ways, see .NET Assemblies and Pascal Units. These problems were solved decades ago. Why are we still using 40-year-old paradigms? Per Adam's post, the issue is tied to DMD's use of OMF/optlink, which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Seconded. At least the lexed form could be very compact; I recall early compressors tried Huffman-coding source code tokens with some success. I don't see the value of compression. Lexing would already reduce the size significantly and compression would only add to processing times. Disk is cheap. Beyond that though, this is absolutely the direction D must head in. In my mind the DI generation patch was mostly just a stop-gap to bring DI-gen up-to-date with the current system, thereby giving us enough time to tackle the (admittedly huge) task of building COFF into the backend, emitting the lexed source into a special section and then giving the compiler *AND* linker the ability to read out the source. For example, giving the linker the ability to read out source code essentially requires a brand-new linker. 
Although, it is my personal opinion that the linker should be integrated with the compiler and run as one step; this way the linker could have intimate knowledge of the source, which would enable some spectacular LTO options. If only DMD were written in D, then we could really open the compile speed throttles with an MT build model... Another related question: AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in LDC to store LLVM bitcode as D libs that still retain enough info for the compiler to replace header files? -- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 6/12/2012 2:07 AM, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable (4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 06/12/2012 03:54 PM, deadalnix wrote: On 12/06/2012 12:23, Tobias Pankrath wrote: Currently .di files are compiler-independent. If this should hold for dib files, too, we'll need a standard AST structure, won't we? We need it anyway at some point. Plain D code is already a perfectly fine standard AST structure. AST macros are another example. AST macros may refer to AST structures by their representations as D code. It would also greatly simplify compiler writing if the D interpreter could be provided as a lib (and so run on top of dib files). I don't think so. Writing the interpreter is a rather straightforward part of the compiler implementation. Why would you want to run it on top of a '.dib' file anyway? Serializing/deserializing the AST is too much overhead. I want to mention that LLVM IR + metadata can do a really good job here. In addition, LLVM people are working on a JIT backend, if you know what I mean ;) Interpreting manually is not harder than CTFE-compatible LLVM IR code generation, but the LLVM JIT could certainly be leveraged to improve compilation speeds.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tue, 12 Jun 2012 06:46:44 -0700, Jacob Carlborg wrote: On 2012-06-12 14:09, foobar wrote: This has been a solved problem since the 80's (e.g. Pascal units). Per Adam's post, the issue is tied to DMD's use of OMF/optlink, which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Can't the same be done with OMF? I'm not saying I want to keep OMF. OMF doesn't support custom sections, and I think a custom section is the right way to handle this. I found the Borland OMF docs a while back and verified this. -- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12/06/2012 14:39, foobar wrote: Another related question: AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in LDC to store LLVM bitcode as D libs that still retain enough info for the compiler to replace header files? LLVM is definitely something I'm looking at more and more. It is a great weapon for D IMO.
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12/06/2012 12:23, Tobias Pankrath wrote: Currently .di files are compiler-independent. If this should hold for dib files, too, we'll need a standard AST structure, won't we? We need it anyway at some point. AST macros are another example. It would also greatly simplify compiler writing if the D interpreter could be provided as a lib (and so run on top of dib files). I want to mention that LLVM IR + metadata can do a really good job here. In addition, LLVM people are working on a JIT backend, if you know what I mean ;)
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 2012-06-12 14:09, foobar wrote: This is a solved problem since the 80's (E.g. Pascal units). Per Adam's post, the issue is tied to DMD's use of OMF/optlink which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Can't the same be done with OMF? I'm not saying I want to keep OMF. -- /Jacob Carlborg
Re: AST files instead of DI interface files for faster compilation and easier distribution
On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote: On 12/06/12 11:07, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over. 2) hide implementation details for proprietary reasons 3) still maintain source code in some form to allow inlining and CTFE 4) be human readable Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2). I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human-readable version of the API, the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool: DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic. This has been a solved problem since the 80's (e.g. Pascal units). Per Adam's post, the issue is tied to DMD's use of OMF/optlink, which we all would like to get rid of anyway. 
Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Another related question: AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in LDC to store LLVM bitcode as D libs which still retain enough info for the compiler to replace header files?
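To make the "binary AST in a special section" suggestion a little more concrete, here is a minimal Python sketch of a compact, length-prefixed payload that a compiler could, in principle, write into a named object-file section. Everything here is hypothetical: the `DAST` magic, the record layout and the `encode_section`/`decode_section` names are not part of DMD, COFF or any existing format, and emitting the bytes into an actual section would be the backend's job.

```python
import struct

MAGIC = b"DAST"   # hypothetical section magic, not a real DMD/COFF format
VERSION = 1       # format version so readers can reject unknown layouts

def _pack_str(s):
    # Length-prefixed UTF-8 string: 2-byte little-endian length, then bytes.
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data

def _unpack_str(buf, pos):
    (n,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    return buf[pos:pos + n].decode("utf-8"), pos + n

def encode_section(decls):
    """Serialize (kind, name, signature) tuples into one section payload."""
    out = bytearray(MAGIC)
    out += struct.pack("<HI", VERSION, len(decls))  # 2-byte version, 4-byte count
    for kind, name, sig in decls:
        out += _pack_str(kind) + _pack_str(name) + _pack_str(sig)
    return bytes(out)

def decode_section(buf):
    """Inverse of encode_section; raises ValueError on foreign data."""
    if buf[:4] != MAGIC:
        raise ValueError("not a DAST section")
    version, count = struct.unpack_from("<HI", buf, 4)
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    pos = 10  # 4 bytes magic + 2 bytes version + 4 bytes count
    decls = []
    for _ in range(count):
        kind, pos = _unpack_str(buf, pos)
        name, pos = _unpack_str(buf, pos)
        sig, pos = _unpack_str(buf, pos)
        decls.append((kind, name, sig))
    return decls
```

For example, `decode_section(encode_section([("function", "add", "int(int, int)")]))` returns the same list. This only records signatures; goal (3) would additionally require function bodies (or some lowered form of them) for inlining and CTFE.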
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12.06.2012 16:09, foobar wrote: On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote: On 12/06/12 11:07, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over; 2) hide implementation details for proprietary reasons; 3) still maintain source code in some form to allow inlining and CTFE; 4) be human readable. Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally, I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: if we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2). I absolutely agree with the above and would also add that goal (4) is an anti-feature. To get a human-readable version of the API, the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool, DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format-agnostic. Absolutely. DDoc being built-in didn't sound right to me at first, BUT it essentially allows us to say that APIs are covered in the DDoc-generated files, not in header files etc.
This has been a solved problem since the '80s (e.g. Pascal units). Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere. Per Adam's post, the issue is tied to DMD's use of OMF/optlink, which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Seconded. At the very least, the lexed form could be very compact; I recall early compressors applying Huffman coding to source code tokens with some success. Another related question: AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in LDC to store LLVM bitcode as D libs which still retain enough info for the compiler to replace header files? -- Dmitry Olshansky
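The remark about Huffman-coding source tokens can be illustrated with a small, generic Huffman coder over token kinds. This is a textbook construction written in Python as a sketch, not the scheme any particular early compressor used, and the `huffman_code`/`encoded_bits` names are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code(tokens):
    """Build a prefix-free Huffman code (token -> bit string) from token frequencies."""
    freq = Counter(tokens)
    if len(freq) == 1:
        # Degenerate case: a single distinct token still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {token: code_so_far}).
    # The tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {tok: ""}) for i, (tok, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest subtree
        w2, _, right = heapq.heappop(heap)  # next lightest subtree
        # Prefix 0 onto the first subtree's codes and 1 onto the second's.
        merged = {tok: "0" + c for tok, c in left.items()}
        merged.update({tok: "1" + c for tok, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encoded_bits(tokens, code):
    """Total number of bits needed to encode the token stream with `code`."""
    return sum(len(code[t]) for t in tokens)
```

On a made-up stream such as `"int identifier ( int identifier , int identifier ) { return identifier + identifier ; }".split()` (16 tokens, 10 distinct kinds), a fixed-width code needs 4 bits per token, i.e. 64 bits, while the Huffman code needs 48 because the frequent `identifier` and `int` kinds get 2-bit codes. Real token streams would also need identifier names and literals stored separately, which this sketch ignores.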
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 12/06/12 11:07, timotheecour wrote: There's a current pull request to improve di file generation (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest further ideas. As far as I understand, di interface files try to achieve these conflicting goals: 1) speed up compilation by avoiding having to reparse large files over and over; 2) hide implementation details for proprietary reasons; 3) still maintain source code in some form to allow inlining and CTFE; 4) be human readable. Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally, I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: if we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2).
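One way to read the argument for treating goal (1) on its own: avoiding reparsing is a caching problem, independent of hiding anything. A minimal Python sketch of that idea, assuming parse results can be keyed by a hash of the source text so that an unchanged module is parsed once however often it is imported. The `ParseCache` class and the `toy_parse` stand-in are hypothetical; a real compiler would persist the cache on disk and also key it on compiler version and flags.

```python
import hashlib

class ParseCache:
    """Caches parse results keyed by the SHA-256 of the source text."""

    def __init__(self, parse):
        self._parse = parse      # any callable: source text -> AST
        self._entries = {}
        self.parses = 0          # how many times the real parser actually ran

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._entries:
            self.parses += 1
            self._entries[key] = self._parse(source)
        return self._entries[key]

def toy_parse(source):
    # Hypothetical stand-in "parser": split declarations on ';' and strip whitespace.
    return tuple(d.strip() for d in source.split(";") if d.strip())
```

Calling `cache.get(src)` three times with the same module text runs `toy_parse` only once; changing the text produces a new key and a fresh parse. Note that nothing here touches goal (2): the cached form is as transparent as the source it came from.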
Re: AST files instead of DI interface files for faster compilation and easier distribution
On 06/12/2012 12:47 PM, Alex Rønne Petersen wrote: On 12-06-2012 12:23, Tobias Pankrath wrote: Currently .di files are compiler-independent. If this should hold for dib files, too, we'll need a standard AST structure, won't we? Which is a Good Thing (TM). It would /require/ formalization of the language once and for all. I do not see how this conclusion could be reached.
Re: AST files instead of DI interface files for faster compilation and easier distribution
Currently .di files are compiler-independent. If this should hold for dib files, too, we'll need a standard AST structure, won't we?
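What a "standard AST structure" might mean in practice: all compilers agree on a fixed set of node kinds and field names plus a version number, so a dib file written by one compiler can be validated and read by another. The Python sketch below assumes a JSON encoding; the node kinds, field names and `load_dib` function are purely hypothetical and far smaller than anything D would actually need.

```python
import json

SCHEMA_VERSION = 1
# Hypothetical agreed-upon node kinds and their required fields.
NODE_FIELDS = {
    "Module": ("name", "decls"),
    "FuncDecl": ("name", "params", "returnType"),
    "Param": ("name", "type"),
}

def validate(node):
    """Recursively check that a node uses only agreed kinds and exactly the agreed fields."""
    kind = node.get("kind")
    if kind not in NODE_FIELDS:
        raise ValueError(f"unknown node kind: {kind!r}")
    expected = set(NODE_FIELDS[kind]) | {"kind"}
    if set(node) != expected:
        raise ValueError(f"{kind}: fields {sorted(node)} != {sorted(expected)}")
    for value in node.values():
        # Child nodes appear either directly or inside lists.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                validate(child)

def load_dib(text):
    """Parse a JSON dib document, check its version, and validate the tree."""
    doc = json.loads(text)
    if doc.get("version") != SCHEMA_VERSION:
        raise ValueError("unsupported schema version")
    validate(doc["root"])
    return doc["root"]
```

The design choice worth noting is that the reader rejects unknown kinds and extra fields rather than ignoring them; whether a real format should be that strict, or allow compiler-specific extensions, is exactly the kind of question that formalizing such a structure would force.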