I have not tried this, but I have thought the same thing, and I do intend
to give it a go. It's very similar to the relational database work I did in
Haskell a while back, where a DSL builds an AST from relational algebra
'combinators' and then a code generator outputs SQL.
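Stripped to its bones, that pipeline looked something like the sketch below
(illustrative names only, not the actual library): combinators build a
relational-algebra AST, and a code generator walks it to emit SQL.

```haskell
import Data.List (intercalate)

-- Toy relational-algebra AST built by DSL 'combinators'
-- (illustrative only; not the real library).
data Rel
  = Table String           -- base relation
  | Project [String] Rel   -- projection (SELECT columns)
  | Restrict String Rel    -- restriction (WHERE condition)

-- Code generator: walk the AST and emit SQL.
sql :: Rel -> String
sql (Table t)      = t
sql (Project cs r) = "SELECT " ++ intercalate ", " cs
                     ++ " FROM (" ++ sql r ++ ")"
sql (Restrict c r) = "SELECT * FROM (" ++ sql r ++ ") WHERE " ++ c

main :: IO ()
main = putStrLn (sql (Project ["name"] (Restrict "age > 30" (Table "people"))))
-- prints: SELECT name FROM (SELECT * FROM (people) WHERE age > 30)
```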
The problem I can see for performance is that you want to statically
generate and resolve the AST, so that a grammar like:

    'a', 'b', 'c' | 'a', 'b', 'd'

is left-factored into a tree with the common prefix shared:

      a
      |
      b
     / \
    c   d

and then compiled down to code like:

    accept('a') && accept('b')
    if (accept('c')) {
        ...
    } else if (accept('d')) {
        ...
    }
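That pipeline can be sketched in a few lines of Haskell (all names here
are invented for illustration): combinators build the grammar AST, a
left-factoring pass resolves the shared prefix statically, and a code
generator emits the accept()-style code rather than interpreting the
tree at runtime.

```haskell
import Data.List (intercalate)

-- Grammar AST built by the 'combinators' (illustrative sketch).
data Gram
  = Tok Char    -- match a single token
  | Seq [Gram]  -- sequence
  | Alt [Gram]  -- alternatives
  deriving (Eq, Show)

-- Statically left-factor a two-way alternative with a shared prefix:
--   'a','b','c' | 'a','b','d'  ==>  a ; b ; (c | d)
leftFactor :: Gram -> Gram
leftFactor (Alt [Seq (x:xs), Seq (y:ys)])
  | x == y = Seq [x, leftFactor (Alt [Seq xs, Seq ys])]
leftFactor g = g

-- Code generator: emit accept()-style code from the resolved AST.
codeGen :: Gram -> String
codeGen (Tok c)  = "accept('" ++ [c] ++ "')"
codeGen (Seq gs) = intercalate " && " (map codeGen gs)
codeGen (Alt gs) = "(" ++ intercalate " || " (map codeGen gs) ++ ")"

main :: IO ()
main = putStrLn (codeGen (leftFactor g))
  where g = Alt [ Seq [Tok 'a', Tok 'b', Tok 'c']
                , Seq [Tok 'a', Tok 'b', Tok 'd'] ]
-- prints: accept('a') && accept('b') && (accept('c') || accept('d'))
```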
What you don't want is to loop over the syntax tree at runtime. You
effectively have to build a compile-time grammar AST and a code-generator
metaprogram. It should be possible in something like C++, but it seems
quite a challenge.
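On the idea below of reifying the Parser type as an AST node: here is a
crude sketch (names invented) of the kind of check that then becomes
possible — flagging choice points whose alternatives can start with the
same token, which is exactly where deterministic choice would silently
mask a production. A real analysis needs proper FIRST/FOLLOW sets with
nullability; this sketch assumes no fragment matches the empty string.

```haskell
-- Reified-parser sketch: the compositors build AST nodes instead of
-- closures, so the finished grammar can be analysed before choosing an
-- implementation strategy.
data G
  = Term Char   -- leaf parser for one token
  | Sq G G      -- sequencing compositor
  | Or G G      -- choice compositor
  deriving Show

-- First tokens a grammar fragment can start with (assumes the fragment
-- cannot match the empty string).
firstSet :: G -> [Char]
firstSet (Term c) = [c]
firstSet (Sq a _) = firstSet a
firstSet (Or a b) = firstSet a ++ firstSet b

-- Report tokens on which some choice point is order-dependent, i.e.
-- where deterministic choice would silently mask an alternative.
overlaps :: G -> [Char]
overlaps (Term _) = []
overlaps (Sq a b) = overlaps a ++ overlaps b
overlaps (Or a b) =
  [c | c <- firstSet a, c `elem` firstSet b] ++ overlaps a ++ overlaps b

main :: IO ()
main = do
  print (overlaps (Sq (Term 'a') (Or (Term 'b') (Term 'c'))))  -- ""
  print (overlaps (Or (Sq (Term 'a') (Term 'b'))
                      (Sq (Term 'a') (Term 'c'))))             -- "a"
```

The second grammar is the 'a b | a c' case: the overlapping first token
'a' is reported, instead of ordered choice quietly resolving it.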
Keean.
On 13 May 2015 at 15:38, Jonathan S. Shapiro <[email protected]> wrote:
> So I've been digging in to FParsec, and actually looking at the API
> somewhat carefully to wrap my head around what it does. I have to say that
> I'm more impressed by combinators (from an engineering perspective) than I
> expected to be. If one assumes a suitable degree of optimization from the
> compiler, it's clearly possible to do relatively high performance parsing
> using a library like this. There are a few places where I think I would
> make different choices than FParsec did, mainly down at the tokenization
> level, but I'm not sure my choices are actually better rather than
> different.
>
> The main thing that still concerns me about combinators is the conflict
> between deterministic choice and potential parse ambiguities. I have the
> same concern about PEG parsers.
>
> By deterministic choice, I mean that the parser conceptually tries
> productions/matching in some particular order specified by the developer.
> This has two very nice consequences:
>
> 1. Pre-existing ambiguities in the underlying language, if any, can be
> resolved by suitable organization of the parser.
> 2. Parsers can be composed with unambiguous results.
>
> But neither of these is a goal when designing a new language. We don't
> need to compose parsers, and we'd like to ensure that parse ambiguities
> don't exist in the first place. That is: we'd like the grammar to be a
> context-free grammar (CFG).
>
> It's not that there's any particular magic to CFGs, nor that there is
> anything inherently wrong with context-dependence that is introduced
> intentionally. The concern is that we don't want to introduce context
> dependence *unintentionally*. Context dependence requires that we specify
> the order of application of the grammar productions in the language
> specification. This is one more thing to specify (and therefore one more
> thing to possibly get wrong). Context dependence that goes unnoticed results in
> specification failure. Finally, there is a concern about language
> evolution. If a new production is introduced, it's very important to
> understand whether it "masks" behavior of old productions.
>
> Because they adopt deterministic choice, PEG parsers and combinator
> parsers suppress discovery of *unintentional* context dependence.
>
> It has occurred to me that this can be resolved, and I'm wondering if
> anyone has already considered this approach somewhere.
>
> The construction of an [F]Parsec parser effectively performs compositions
> of sub-parsers, with leaf parsers for tokens that are implemented using
> what amount to built-in primitives. In the case of FParsec, the Parser
> type is defined as:
>
> type Parser<'TResult, 'TUserState> =
>     CharStream<'TUserState> -> Reply<'TResult>
>
> But there is nothing magical about this definition. While the developer *can*
> introduce new compositing operators, they typically do not do so. If we
> take the list of compositing operators as non-extensible, then it appears
> to me that we could implement an alternative version of the library in
> which (a) the Parser type is defined as an AST node, and (b) the
> compositors combine their input ASTs into a suitably combined output AST.
> Once an AST for the input grammar is in hand, we could test it against
> various parse algorithms to determine whether it is context-free. Once we
> *know* it is context-free, it's perfectly fine to implement it using
> recursive descent or any other sensible implementation.
>
>
> Has anybody actually tried this?
>
>
>
> Jonathan
>
>
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev