On Saturday, 19 May 2012 at 20:35:10 UTC, Marco Leise wrote:
Am Fri, 11 May 2012 10:01:28 +0200
schrieb "Roman D. Boiko" <r...@d-coding.com>:

There were several discussions about the need for a D compiler library.

I propose my draft implementation of lexer for community review:
https://github.com/roman-d-boiko/dct
(I decided to comment on both my post and your reply.)

I've got *a lot* of feedback from the community, which led to a significant redesign (and a delay of as yet unknown duration). I'll write more on that later. Thanks a lot to everybody!

The lexer is based on Brian Schott's project https://github.com/Hackerpilot/Dscanner, but it has been refactored and extended (and more changes are on the way).
Looks like I'm going to replace this implementation almost completely.

The goal is to have source code loading, lexer, parser and semantic analysis available as parts of Phobos. These libraries should be designed to be usable in multiple scenarios (e.g., refactoring, code analysis, etc.).
See below.

My commitment is to have at least the front end built this year (conforming to the D2 specification unless explicitly stated otherwise for some particular aspect).
By the D specification I mean http://dlang.org/language-reference.html, or (when the specification is not clear to me) one of the following:
* TDPL,
* current DMD implementation
* community feedback.

(Note that I may assume that I understand some aspect, but later revisit it if needed.)

Please post any feedback here. A dedicated project web site will be created later.
It is going to be http://d-coding.com, but it is not usable yet (I don't have much web-design practice). It is hosted at https://github.com/roman-d-boiko/roman-d-boiko.github.com; pull requests are welcome.

A general purpose D front-end library has been discussed several times. So at least for some day-dreaming about better IDE support and code analysis tools it has a huge value. It is good to see that someone finally takes the time to implement one. My only concern is that not enough planning went into the design.
Could you name a few specific concerns? The reason for this thread was to reflect on the design early. OTOH, I didn't spend time documenting my design goals and trade-offs, so the discussion turned into a brainstorm and wasn't always productive. But now that I see how much value I can get even without doing my homework, I'm much more likely to provide better documentation and ask more specific questions.

I think about things like brainstorming and collecting possible use cases from the community, or looking at how some C++ tools do their job and what infrastructure they are built on. Although it is really difficult to tell from other people's code which decision was 'important' and what was just the author's way to do it.
I'm going to pick several use cases and prioritize them according to my judgement. Feel free to suggest any cases that you think are needed (with motivation). Prioritizing is necessary to define what is out of scope and to plan work into milestones, in order to ensure the project is feasible.

Inclusion into Phobos I would not see as a priority. As Jonathan said, there are already some clear visions of what such modules would look like.
Well, seriously *considering* such inclusion would help improve the quality of the project. But it is not necessarily what will happen. Currently I think my goals are close to those of use case 2 from Jonathan's reply. But until the project is reasonably complete, it is not the time to argue over whether to include it (or part of it).

Also if any project seeks to replace the DMD front-end, Walter should be the technical advisor. Like anyone putting a lot of time and effort into a design, he could have strong feelings about certain decisions and implementing them in a seemingly worse way.
Not a goal currently, because that would make the project significantly less likely ever to be completed.

That said, you make the impression of being really dedicated to the project, even giving yourself a timeline, which is a good sign. I wish you all the best for the project and hope that - even without any 'official' recognition - it will see a lot of use as the base of D tools.
Well, I hope that some more people will join the project once it stabilizes and delivers something useful.

By the way, is anybody *potentially* interested to join?

To learn about parsing I wrote a syntax highlighter for the DCPU-16 assembly by Minecraft author Markus Persson (which was a good exercise). Interestingly, I ended up with a design for the Token struct similar to yours:
- separate array for line # lookup
- TokenType/TokenKind enum
- Trie for matching token kinds (why do you expand it into nested switch-case through CTFE mixins though?)
The switch code generation is temporary: it works now and helps me evaluate possible problems with the future design, which is planned to be a compile-time finite automaton (likely deterministic).
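For readers unfamiliar with the technique: the expansion Marco asks about can be done by building the switch statement as a string in a CTFE-able function and mixing it in. Below is a minimal, flat sketch of that idea (the real lexer nests switches per character to form a trie); all names here are illustrative, not DCT's actual API.

```d
// Build the source code of a string switch at compile time.
string genMatcher(immutable(string)[] words)
{
    string code = "switch (s)\n{\n";
    foreach (w; words)
        code ~= `    case "` ~ w ~ `": return true;` ~ "\n";
    code ~= "    default: return false;\n}";
    return code;
}

bool isKeyword(string s)
{
    // The whole switch body below is generated during compilation
    // from the word list and pasted in via mixin.
    static immutable string[] words = ["if", "in", "int", "import"];
    mixin(genMatcher(words));
}
```

The compiler evaluates `genMatcher` at compile time because its result feeds a `mixin`, so the runtime cost is exactly that of a hand-written switch.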

Since assembly code is usually small, I just preallocate an array of sourceCode.length tokens and realloc it to the correct size when I'm done parsing. Nothing pretty, but simple, and I am sure it won't get any faster ;).
I'm sure it will :) (I'm going to elaborate on this some time later).
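The preallocation strategy Marco describes relies on a simple invariant: every token consumes at least one input character, so one slot per byte is a safe upper bound. A small sketch, with a stand-in Token type and a dummy whitespace "lexer" (neither is from DCT or Marco's highlighter):

```d
struct Token { size_t offset; size_t length; }

Token[] lexAll(string source)
{
    // Upper bound: at most source.length tokens can ever be produced.
    auto tokens = new Token[](source.length);
    size_t n;
    size_t start = 0;
    // Dummy lexer: one token per space-separated chunk.
    foreach (i, c; source)
    {
        if (c == ' ')
        {
            if (i > start) tokens[n++] = Token(start, i - start);
            start = i + 1;
        }
    }
    if (source.length > start)
        tokens[n++] = Token(start, source.length - start);
    tokens.length = n;  // shrink to fit; the slice stays in place
    return tokens;
}
```

Shrinking a D array's `.length` never reallocates, so the "realloc to the correct size" step is effectively free here.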

One notable difference is that I only check for isEoF in one location, since I append "\0" to the source code as a stop token (that can also serve as an end-of-line indicator). (The general idea is to move checks outside of loops.)
There are several EoF conditions: \0, \x1A, __EOF__ and physical EOF, and any loop would need to check for all of them. Preallocation could eliminate the check for physical EoF, but it would make it impossible to avoid copying the input string, and would also prevent passing a function a slice of a string to be lexed.
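The trade-off can be made concrete with a sentinel sketch (a simplified illustration, not DCT's implementation): appending '\0' collapses the physical-EOF check into a character comparison, but the append forces a copy whenever the caller hands you a plain slice. __EOF__ would still need separate handling on the identifier path.

```d
// Appending the sentinel costs a copy of the input - exactly the
// price discussed above when the caller passes a borrowed slice.
string withSentinel(string source)
{
    return source ~ '\0';
}

// With the sentinel guaranteed to be present, the hot loop's
// end-of-input test is just character comparisons, no bounds check.
// '\x1A' (Ctrl-Z) is an alternative end-of-file marker in D source.
bool atEnd(const(char)[] src, size_t pos)
{
    return src[pos] == '\0' || src[pos] == '\x1A';
}
```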

** Errors  **
I generally keep the start and end column, in case someone wants that. A real world example:

  ubyte x = ...;
  if (x >= 0 && x <= 8) ...
  Line 2, warning: Comparison is always true.

At first glance you say "No, a byte can become larger than 8.", but the compiler is just complaining about the first part of the expression here, which would be clarified by e.g. some ASCII/Unicode art:

  Line 2, warning: Comparison is always true:
  if (x >= 0 && x <= 8) ...
      ^----^
This functionality has been a priority from the beginning. It is not implemented yet, but the design accounts for it. Column and line are evaluated only on demand, on the assumption that such information is needed rarely (primarily to display messages to the user). My new design also addresses the use cases where it is needed frequently.
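Given stored start and end columns, rendering the underline from Marco's example is a few lines. A sketch, assuming 1-based, inclusive columns (that convention is my assumption, not something either project has fixed):

```d
import std.array : replicate;

// Render the "^----^" underline for a token span.
string caretLine(size_t startCol, size_t endCol)
{
    assert(startCol >= 1 && endCol >= startCol);
    if (endCol == startCol)
        return replicate(" ", startCol - 1) ~ "^";
    return replicate(" ", startCol - 1)
         ~ "^" ~ replicate("-", endCol - startCol - 1) ~ "^";
}
```

For the warning above, the span "x >= 0" occupies columns 5 to 10 of `if (x >= 0 && x <= 8)`, so `caretLine(5, 10)` produces the `    ^----^` line shown.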

** Code highlighting **
Especially Gtk's TextView (which GtkSourceView is based on) isn't made for several thousand tokens. The text view will take a second per 20000 tokens or so. The source view works around that by highlighting in chunks, but you can still feel the delay when you jump to the end of the file in gedit, and even MonoDevelop suffers from the bad performance. If I understood your posts correctly, you are already planning for minimal change sets. It is *very* important to only update changed parts in a syntax highlighting editor. So for now I ended up writing an 'insertText' and a 'deleteText' method for the parser, which take 'start index', 'text' and 'start index', 'end index' respectively and return a list of removed and added tokens. If possible, make sure that your library can be used with GtkSourceView, Scintilla and QSyntaxHighlighter. Unfortunately the bindings for the last two could use an update, but the API help should give you an idea of how to use them most efficiently.
Currently I plan to support GtkSourceView, Scintilla, and a custom editor API which I plan to define so that it is most efficient in this respect. I haven't worked on that thoroughly yet.
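An incremental interface along the lines of Marco's insertText/deleteText methods might be declared roughly as follows. Every name and type here is illustrative only, not a committed DCT or editor API:

```d
struct Token { size_t offset; size_t length; int kind; }

// What an edit gives back: which old tokens died, which new ones
// replace them, so the editor can patch its highlight spans instead
// of rehighlighting the whole buffer.
struct TokenDelta
{
    size_t firstInvalid;   // index of the first invalidated token
    Token[] removed;       // tokens dropped by the edit
    Token[] added;         // tokens lexed for the affected region
}

interface IncrementalLexer
{
    // Relex only the neighborhood of [index, index + text.length).
    TokenDelta insertText(size_t index, string text);
    // Relex only the neighborhood of [startIndex, endIndex).
    TokenDelta deleteText(size_t startIndex, size_t endIndex);
}
```

A GtkSourceView or Scintilla adapter would then translate each TokenDelta into the minimal set of style-range updates for its widget.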

Incremental changes are the key to efficiency, and I'm going to invest a lot of effort into supporting them. Immutability of the data structures will also enable many optimizations.
