On 8/2/12 2:17 PM, Michel Fortin wrote:
On 2012-08-02 12:28:03 +0000, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> said:
Hence, a very simple thing to do is have the entire lexer only deal
with ranges of ubyte. If someone passes a char[], the lexer's front
end can simply call s.representation and obtain the underlying ubyte[].
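
A minimal sketch of that front-end idea, assuming a lexer core that only accepts ubyte ranges. The names lexBytes and lex are hypothetical and the "lexer" here just counts newlines; std.string.representation is the only library call used.

```d
import std.string : representation;

// Hypothetical lexer core: sees raw UTF-8 code units only, never decodes.
size_t lexBytes(const(ubyte)[] src)
{
    size_t newlines;
    foreach (b; src)
        if (b == '\n')
            ++newlines;
    return newlines;
}

// Hypothetical front end for character arrays: representation reinterprets
// the same memory as ubyte[] without copying, then forwards to the core.
size_t lex(const(char)[] src)
{
    return lexBytes(src.representation);
}
```

With this shape the char[] overload is a thin shim, and byte-oriented callers can skip it entirely.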

That's ugly, but it could work (assuming s.representation returns the
casted range by ref). I still prefer my frontUnit and popFrontUnit
approach though.

I agree frontUnit and popFrontUnit are more generic because they allow other ranges to define them.

In fact, any parser for which speed is important will have to bypass
std.range's clever handling of UTF characters. Dealing simply with
ubytes isn't enough, since in some cases you'll want to fire up the UTF
decoder.
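
One way the frontUnit/popFrontUnit idea could look for narrow strings, as a rough sketch. These two primitives are hypothetical (the names come from this thread, they are not in Phobos); only front and popFront from std.range are real. The loop stays on code units for ASCII and falls back to the decoding popFront only when it meets a non-ASCII lead byte.

```d
import std.range : front, popFront;

// Hypothetical code-unit primitives, defined here for char slices only.
char frontUnit(const(char)[] s) { return s[0]; }
void popFrontUnit(ref const(char)[] s) { s = s[1 .. $]; }

// Counts non-ASCII code points, touching the UTF machinery only when needed.
size_t countNonAscii(const(char)[] s)
{
    size_t n;
    while (s.length)
    {
        if (s.frontUnit < 0x80)
            s.popFrontUnit();   // ASCII fast path: advance one code unit
        else
        {
            s.popFront();       // std.range: skips one whole code point
            ++n;
        }
    }
    return n;
}
```

Other range types could supply their own frontUnit and popFrontUnit, which is what makes the approach more generic than hard-coding ubyte[].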

The next issue, which I haven't seen discussed here, is that for a parser
to be efficient it should operate on buffers. You can make it work with
arbitrary ranges, but if you don't have a buffer you can slice when you
need to preserve a string, you're going to have to build the string
character by character, which is not efficient at all. But then you can
only really return slices if the underlying representation is the same
as the output representation, and unless your API has a templated output
type, you're going to special case a lot of things.
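
To illustrate the slicing point, here is a small sketch that assumes the whole input sits in one contiguous buffer. The function name leadingIdentifier is made up for the example. Because the output type matches the buffer's element type, the token is returned as a slice of the input and nothing is allocated or copied.

```d
import std.ascii : isAlphaNum;

// Returns the identifier at the start of buf as a slice of buf itself.
const(char)[] leadingIdentifier(const(char)[] buf)
{
    size_t i;
    while (i < buf.length && (buf[i] == '_' || isAlphaNum(buf[i])))
        ++i;
    return buf[0 .. i];   // same memory as the input, just a narrower view
}
```

With a generic input range that cannot be sliced, the same function would have to append each character to a freshly allocated string instead.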

I think a BufferedRange could go a long way here.

After having attempted an XML parser with ranges, I'm not sure parsing
using generic ranges can be made very efficient. Automatic conversion to
UTF-32 is a nuisance for performance, and if the output needs to return
parts of the input, you'll need to create an inefficient special case
just to allocate many new strings in the correct format.

I'm not so sure, but I'll measure.

I wonder how your call with Walter will turn out.

What call?


Thanks,

Andrei
