On 2012-08-02 12:28:03 +0000, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

> Regarding the problem at hand, it's becoming painfully obvious to me that the lexer MUST do its own decoding internally.

That's not a great surprise to me. I hit the same issues when writing my XML parser, which is why I invented functions called frontUnit and popFrontUnit. I'm glad you're realizing this.
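
Roughly, the idea is something like the sketch below. The bodies are illustrative only (my actual signatures may differ); the point is raw code-unit access with no auto-decoding:

// Sketch: raw code-unit access for a char slice, bypassing the
// auto-decoding that std.range's front/popFront would perform.
char frontUnit(const(char)[] s)
{
    return s[0]; // one UTF-8 code unit, not a decoded dchar
}

void popFrontUnit(ref const(char)[] s)
{
    s = s[1 .. $]; // advance exactly one code unit
}

You iterate code units by default and only decode when a token can actually contain non-ASCII characters.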

> Hence, a very simple thing to do is have the entire lexer only deal with ranges of ubyte. If someone passes a char[], the lexer's front end can simply call s.representation and obtain the underlying ubyte[].

That's ugly, but it could work (assuming s.representation returns the array reinterpreted in place, not a copy). I still prefer my frontUnit and popFrontUnit approach though.
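
For concreteness, here is roughly what that front end would look like; tokenize is just a placeholder name:

import std.string : representation;

void tokenize(const(char)[] source)
{
    // representation reinterprets the same memory as raw code units;
    // no copy is made, so slices of bytes still alias the input text.
    const(ubyte)[] bytes = source.representation;
    // ... run the ubyte-based lexer core over bytes ...
}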

In fact, any parser for which speed is important will have to bypass std.range's clever handling of UTF characters. Dealing simply with ubytes isn't enough, since in some cases you'll want to fire up the UTF decoder.
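
Concretely, that means an ASCII fast path over raw code units, firing up std.utf.decode only when a multi-byte sequence shows up. A sketch (scan is a placeholder name):

import std.utf : decode;

void scan(const(char)[] src)
{
    size_t i = 0;
    while (i < src.length)
    {
        if (src[i] < 0x80)
        {
            ++i; // ASCII: one code unit, no decoding needed
        }
        else
        {
            dchar c = decode(src, i); // decodes one code point, advances i
            // ... handle the non-ASCII character c ...
        }
    }
}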

The next issue, which I haven't seen discussed here, is that an efficient parser should operate on buffers. You can make it work with arbitrary ranges, but if you don't have a buffer you can slice when you need to preserve a string, you have to build the string character by character, which is not efficient at all. And you can only return slices when the underlying representation matches the output representation, so unless your API has a templated output type, you end up special-casing a lot of things.
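
To illustrate the gap, here is the same token extractor written both ways (a sketch with illustrative names, not code from my parser):

import std.array : appender;
import std.ascii : isAlphaNum;
import std.range.primitives;

bool isIdentChar(dchar c) { return isAlphaNum(c) || c == '_'; }

// Sliceable buffer: the token is a zero-copy slice of the input.
const(char)[] takeIdentifier(const(char)[] buf, ref size_t pos)
{
    immutable start = pos;
    while (pos < buf.length && isIdentChar(buf[pos]))
        ++pos;
    return buf[start .. pos];
}

// Arbitrary character range: the token must be built element by
// element, allocating as it grows.
string takeIdentifier(R)(ref R input)
    if (isInputRange!R)
{
    auto result = appender!string();
    while (!input.empty && isIdentChar(input.front))
    {
        result.put(input.front);
        input.popFront();
    }
    return result.data;
}

The second version is what you're forced into when the input isn't sliceable, and it's also where the output-representation mismatch bites.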

After having attempted an XML parser with ranges, I'm not sure parsing with generic ranges can be made very efficient. Automatic conversion to UTF-32 is a nuisance for performance, and if the output needs to return parts of the input, you end up writing an inefficient special case that allocates many new strings just to get them into the right format.

I wonder how your call with Walter will turn out.

--
Michel Fortin
michel.for...@michelf.ca
http://michelf.ca/
