On Saturday, 7 July 2012 at 16:27:00 UTC, Philippe Sigaud wrote:
I added dstrings because
1- at the time (a few months ago), the lists here were awash in
UTF-32 discussions and I thought that'd be the way to go anyway
2- other D parsing libraries seemed to go to UTF32 also (CTPG)
3- I wanted to be able to parse mathematical notation like nabla,
derivatives, etc. which all have UTF32 symbols.
I propose switching the code to use a template parameter S
constrained by if(isSomeString!S) everywhere. Client code would
first determine the source encoding scheme and then instantiate
the parsers with the matching string type. This is not a trivial
change, but I'm willing to help implement it.
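
To make the proposal concrete, here is a minimal sketch of a parser that is generic over the string type. ParseResult and literal are hypothetical names made up for illustration; they are not Pegged's actual API, only one way the isSomeString!S constraint could be applied.

```d
import std.algorithm.searching : startsWith;
import std.traits : isSomeString;

/// Hypothetical result type (not Pegged's API), generic over the string type.
struct ParseResult(S) if (isSomeString!S)
{
    bool success;
    S matched; // slice of the input that was consumed
    S rest;    // remaining, unconsumed input
}

/// Hypothetical literal parser: succeeds if `input` begins with `expected`.
ParseResult!S literal(S)(S input, S expected) if (isSomeString!S)
{
    if (input.startsWith(expected))
    {
        // Both strings share the same code unit type, so slicing by
        // expected.length cuts exactly at the end of the match.
        return ParseResult!S(true,
                             input[0 .. expected.length],
                             input[expected.length .. $]);
    }
    return ParseResult!S(false, null, input);
}

void main()
{
    // The client picks the string type that matches its source encoding.
    auto r8  = literal("∇f(x)", "∇");   // string  (UTF-8)
    auto r16 = literal("∇f(x)"w, "∇"w); // wstring (UTF-16)
    auto r32 = literal("∇f(x)"d, "∇"d); // dstring (UTF-32)

    assert(r8.success  && r8.rest  == "f(x)");
    assert(r16.success && r16.rest == "f(x)"w);
    assert(r32.success && r32.rest == "f(x)"d);
    assert(!literal("abc", "x").success);
}
```

The same template body serves all three encodings, so a client that reads UTF-8 sources would not have to pay for a conversion to dstring first.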
Note that PEG does not require packrat parsing, even though it
was developed with it in mind. I think it's a historical
'accident' that put the two together: Bryan Ford's thesis used
the two together.
Note that many PEG parsers do not rely on packrat (Pegged does
not).
There are a bunch of articles on Bryan Ford's website by a guy
writing a PEG parser for Java, and who found that storing the
last rules was enough to get a slight speed improvement, but
that doing any more storage was detrimental to the parser's
overall efficiency.
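
As an illustration of the idea, here is a minimal sketch of a rule wrapper that remembers only its most recent result instead of a full packrat table. This is one possible reading of "storing the last rules"; Result and LastResultCache are hypothetical names, and the sketch assumes the cache is only ever used with a single input string.

```d
/// Hypothetical outcome of applying a rule at a given position.
struct Result
{
    bool success;
    size_t end; // position just after the match
}

/// Wraps a rule and keeps only its latest (position, result) pair.
/// Assumption: the same input string is used for every call.
struct LastResultCache
{
    Result delegate(string, size_t) rule;
    bool hasEntry;
    size_t lastPos;
    Result lastResult;
    size_t hits; // number of calls answered from the cache

    Result apply(string input, size_t pos)
    {
        if (hasEntry && lastPos == pos)
        {
            ++hits;
            return lastResult;
        }
        lastResult = rule(input, pos);
        lastPos = pos;
        hasEntry = true;
        return lastResult;
    }
}

void main()
{
    size_t calls; // how many times the underlying rule really ran

    LastResultCache digits;
    digits.rule = (string s, size_t p) {
        ++calls;
        size_t i = p;
        while (i < s.length && s[i] >= '0' && s[i] <= '9')
            ++i;
        return Result(i > p, i);
    };

    auto input = "123+45";
    auto first  = digits.apply(input, 0); // runs the rule
    auto second = digits.apply(input, 0); // backtracking retry: cache hit
    auto third  = digits.apply(input, 4); // new position: runs the rule again

    assert(first.success && first.end == 3);
    assert(second == first);
    assert(third.success && third.end == 6);
    assert(calls == 2 && digits.hits == 1);
}
```

A full packrat parser would instead keep one entry per (rule, position) pair, which guarantees linear time but costs memory proportional to the input length times the number of rules.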
That's great! Anyway I want to understand the advantages and
limitations of both Pegged and ANTLR, and probably study some
more techniques. Such research consumes a lot of time but can be
done incrementally along with development.