On Fri, 11 May 2012 10:01:28 +0200,
"Roman D. Boiko" <r...@d-coding.com> wrote:

> There were several discussions about the need for a D compiler 
> library.
> 
> I propose my draft implementation of lexer for community review:
> https://github.com/roman-d-boiko/dct
> 
> Lexer is based on Brian Schott's project 
> https://github.com/Hackerpilot/Dscanner, but it has been 
> refactored and extended (and more changes are on the way).
> 
> The goal is to have source code loading, lexer, parser and 
> semantic analysis available as parts of Phobos. These libraries 
> should be designed to be usable in multiple scenarios (e.g., 
> refactoring, code analysis, etc.).
> 
> My commitment is to have at least front end built this year (and 
> conforming to the D2 specification unless explicitly stated 
> otherwise for some particular aspect).
> 
> Please post any feed here. A dedicated project web-site will be 
> created later.

A general-purpose D front-end library has been discussed several times, and it
would be hugely valuable, if only for day-dreaming about better IDE support and
code analysis tools. It is good to see that someone is finally taking the time
to implement one. My only concern is that not enough planning went into the
design. I am thinking of things like brainstorming and collecting possible use
cases from the community, or looking at how some C++ tools do their job and
what infrastructure they are built on. Admittedly, it is really difficult to
tell from other people's code which decisions were 'important' and which were
just the author's way of doing it.

I would not see inclusion into Phobos as a priority. As Jonathan said, there
are already some clear visions of what such modules would look like. Also, if
any project seeks to replace the DMD front-end, Walter should be the technical
advisor. Like anyone who has put a lot of time and effort into a design, he may
have strong feelings about certain decisions and about seeing them implemented
in a seemingly worse way.
That said, you give the impression of being really dedicated to the project,
even setting yourself a timeline, which is a good sign. I wish you all the best
for the project and hope that - even without any 'official' recognition - it 
will see a lot of use as the base of D tools.

To learn about parsing I wrote a syntax highlighter for the DCPU-16 assembly of
Minecraft author Markus Persson, which was a good exercise. Interestingly, I
ended up with a design for the Token struct similar to yours:
- separate array for line # lookup
- TokenType/TokenKind enum
- Trie for matching token kinds (why do you expand it into nested switch-case 
through CTFE mixins though?)
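
To illustrate the first point, here is a minimal sketch (in Python, for
brevity) of a separate line lookup array. The names build_line_starts and
line_and_column are made up for this example and come from neither project.
Tokens store only a source offset, and line and column are recovered on demand
with a binary search.

```python
from bisect import bisect_right

def build_line_starts(source):
    """Return the offset at which each line begins (line 1 starts at 0)."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def line_and_column(line_starts, offset):
    """Map a source offset to a 1-based (line, column) pair."""
    # bisect_right counts the line starts that are <= offset,
    # which is exactly the 1-based line number.
    line = bisect_right(line_starts, offset)
    column = offset - line_starts[line - 1] + 1
    return line, column

source = "set a, 1\nadd a, 2\n"
starts = build_line_starts(source)
print(line_and_column(starts, source.index("add")))  # (2, 1)
```

This keeps each token small, at the cost of an O(log n) lookup whenever a
line number is actually needed.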
Since assembly code is usually small I just preallocate an array of 
sourceCode.length tokens and realloc it to the correct size when I'm done 
parsing. Nothing pretty, but simple and I am sure it won't get any faster ;).
One notable difference is that I only check for isEoF in one location, since I 
append "\0" to the source code as a stop token (that can also serve as an 
end-of-line indicator). (The general idea is to move checks outside of loops.)

** Errors **
I generally keep both the start and end column, in case someone wants them. A
real-world example:

  ubyte x = ...;
  if (x >= 0 && x <= 8) ...
  Line 2, warning: Comparison is always true.

At first glance you would say "No, a byte can become larger than 8.", but the
compiler is only complaining about the first part of the expression here, which
would be made clear by e.g. some ASCII/Unicode art:

  Line 2, warning: Comparison is always true:
  if (x >= 0 && x <= 8) ...
      ^----^
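
Given a start and end column, producing that marker line is cheap. A small
sketch in Python follows. The function format_warning is hypothetical, columns
are assumed to be 0-based and inclusive, and the source line is assumed to
contain no tabs (which would break the alignment).

```python
def format_warning(line_no, source_line, start_col, end_col, message):
    """Render a warning with the offending span underlined."""
    width = end_col - start_col + 1
    if width == 1:
        marker = "^"
    else:
        marker = "^" + "-" * (width - 2) + "^"
    return "\n".join([
        "Line %d, warning: %s:" % (line_no, message),
        source_line,
        " " * start_col + marker,
    ])

line = "if (x >= 0 && x <= 8) ..."
start = line.index("x >= 0")
end = start + len("x >= 0") - 1
print(format_warning(2, line, start, end, "Comparison is always true"))
```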

** Code highlighting **
Gtk's TextView in particular (which GtkSourceView is based on) isn't made for
several thousand tokens. The text view will take a second per 20000 tokens or
so. The source view works around that by highlighting in chunks, but you can
still feel the delay when you jump to the end of the file in gedit, and even
MonoDevelop suffers from the bad performance. If I understood your posts
correctly, you are already planning for minimal change sets. It is *very*
important to only update changed parts in a syntax highlighting editor. So for
now I ended up writing 'insertText' and 'deleteText' methods for the parser.
They take ('start index', 'text') and ('start index', 'end index') respectively
and return a list of removed and added tokens.
If possible, make sure that your library can be used with GtkSourceView, 
Scintilla and QSyntaxHighlighter. Unfortunately the bindings for the last two 
could use an update, but the API help should give you an idea of how to use
them most efficiently.
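
To make the shape of such an interface concrete, here is a deliberately naive
Python sketch. It is not my actual implementation: the Highlighter class, the
toy token regex and the (start, text) token tuples are all assumptions made for
this example. It relexes the whole buffer and then trims the unchanged tokens
before and after the edit, whereas a real lexer would restart near the edit
position. The point is only what the editor receives: the tokens that
disappeared and the tokens that replaced them.

```python
import re

# Toy grammar: numbers, identifiers, or any single non-space character.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")

def lex(text):
    """Return a list of (start offset, token text) pairs."""
    return [(m.start(), m.group()) for m in TOKEN_RE.finditer(text)]

class Highlighter:
    def __init__(self, text=""):
        self.text = text
        self.tokens = lex(text)

    def _replace(self, start, end, new_text):
        old = self.tokens
        self.text = self.text[:start] + new_text + self.text[end:]
        new = lex(self.text)
        shift = len(new_text) - (end - start)

        # Unchanged prefix: identical tokens that end before the edit.
        p = 0
        while (p < len(old) and p < len(new) and old[p] == new[p]
               and old[p][0] + len(old[p][1]) <= start):
            p += 1
        # Unchanged suffix: same text after the edit, moved by `shift`.
        s = 0
        while (s < len(old) - p and s < len(new) - p
               and old[-1 - s][0] >= end
               and old[-1 - s][1] == new[-1 - s][1]
               and old[-1 - s][0] + shift == new[-1 - s][0]):
            s += 1

        self.tokens = new
        # (removed, added); removed uses pre-edit offsets.
        return old[p:len(old) - s], new[p:len(new) - s]

    def insert_text(self, start, text):
        return self._replace(start, start, text)

    def delete_text(self, start, end):
        return self._replace(start, end, "")

h = Highlighter("set a, 1")
removed, added = h.insert_text(4, "b")   # buffer becomes "set ba, 1"
print(removed, added)                    # [(4, 'a')] [(4, 'ba')]
```

Only the two returned lists have to be applied to the text view. Tokens after
the edit keep their highlighting and merely shift, which the editor's buffer
already tracks by itself.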

-- 
Marco
