On 8/2/12 6:07 AM, Walter Bright wrote:
> Why? I've never seen any UTF16 or UTF32 D source in the wild.

Here's a crazy idea that I'll hang on this one remark. No, two crazy ideas.

First, after having read the long Jonathan/Walter back-and-forth in one sitting, it's become obvious to me that you'll never understand each other on this nontrivial matter through this medium. I suggest you set up a Skype/phone call. Once you get past the first 30 seconds of social awkwardness of hearing each other's voice, you'll make fantastic progress in communicating.

Regarding the problem at hand, it's becoming painfully obvious to me that the lexer MUST do its own decoding internally. Hence, a very simple thing to do is have the entire lexer deal only with ranges of ubyte. If someone passes in a char[] s, the lexer's front end can simply call s.representation and obtain the underlying ubyte[].
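Roughly this (a sketch, untested; lexCore is just a placeholder name for the ubyte-only core):

    import std.string : representation;

    // Array front end: reinterpret char[] as ubyte[] at zero cost and
    // hand the bytes to the core, which knows nothing about UTF.
    auto tokenize(const(char)[] source)
    {
        const(ubyte)[] bytes = source.representation;
        return lexCore(bytes); // lexCore: hypothetical core entry point
    }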

If someone passes some range of char, the lexer uses an adapter (e.g. map()) that casts every char to ubyte, which is a zero-cost operation. Then it uses the same core operating on ranges of ubyte.
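Again sketched and untested, with the same placeholder lexCore. Note the constraint doesn't match char[] itself, because Phobos's auto-decoding makes ElementType of a char[] dchar, so this overload and the array one above don't collide:

    import std.algorithm : map;
    import std.range : isInputRange, ElementType;

    // Generic front end: lazily cast each char to ubyte. The cast
    // compiles to nothing, so the adapter adds zero runtime cost.
    auto tokenize(R)(R chars)
        if (isInputRange!R && is(ElementType!R == char))
    {
        return lexCore(chars.map!(c => cast(ubyte) c));
    }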

In the first implementation, the lexer may simply refuse any range of 16-bit or 32-bit elements (wchar[], ranges of wchar, dchar[], ranges of dchar). Later on, the core may be evolved to handle ranges of ushort and ranges of dchar as well. The front end would then again use representation() on wchar[], a casting adapter on ranges of wchar, and would pass dchar[] and ranges of dchar through unchanged.
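Folding the pieces into a single entry point, the first cut could look something like this (hypothetical and untested; a couple of static ifs confined to the front end, which is exactly where they belong):

    import std.algorithm : map;
    import std.range : isInputRange, ElementType;
    import std.string : representation;

    // Single front end: dispatch on the input type and refuse 16- and
    // 32-bit element types until the core learns to handle them.
    auto tokenize(R)(R input)
    {
        static if (is(typeof(input.representation) : const(ubyte)[]))
            return lexCore(input.representation);    // char[], string
        else static if (isInputRange!R && is(ElementType!R == char))
            return lexCore(input.map!(c => cast(ubyte) c));
        else
            static assert(0, "UTF-16/UTF-32 input not supported yet");
    }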

This makes the core simple and efficient (I think Jonathan's use of static if and mixins everywhere, while well-intended, complicates matters unnecessarily).

And with that we have a lexer! One that operates on ranges, with just a simple front end making explicit that the lexer does its own decoding.

Works?


Andrei
