On 2014-09-20 02:27PM, Loup Vaillant-David wrote: >In any case, they should come soon. First I'll finish the >recogniser (I have to correct the empty rules bug using Aycock & >Horsepool (that's easy), and optimise right recursive rules using >Leo (haven't figured that out yet).
Cool. I'll look forward to it. >Actually, you don't need the back pointers. Plain Earley items are >enough. Even better, you don't need all the items. You only need the >completed ones. Sure, it's just a classic space/time tradeoff. The recognizer knows how it is deriving each match. So you can either save that information as back pointers, or you can reconstruct the connections later by searching. >This business with back pointers is needlessly complicated, I found. Hmm...did you try drawing out where they point to? For instance, with the example you gave back in June, if you start with the top-level match and follow the pointers, you get this: http://www.arestlessmind.org/2014/09/20/parse-tree.jpg You can see that you don't care about the contents of the left back-pointer -- it's just reversing through the rule. But they form the structure which puts the right sub-trees in order. Actually, you shouldn't even need to check whether matches are complete: if only the completion operation adds back pointers, then you know that the right side is a complete match and the left side isn't. >But there's another, more annoying ambiguity: > > A -> ?? ??? (i) > A -> ?? ??? (j) > >Sometimes, the same rule can span different lengths. That's typical >of repetition rules. So you have two choices: > >1. Just apply the longest match rule. >2. Dig both rules in parallel, search for the "topmost, first" > difference (whatever that means), and apply prioritised choice > there. Oh, right. Now that you mention it, I *have* seen examples of practical grammars with that sort of ambiguity. I don't know that there's any good solution to that. IIRC it tends to be "real" ambiguity where the user (the person writing the source, not the person writing the grammar) could mean either thing. So I suspect you have to just go with the longest match, unless you want to provide for context-sensitive disambiguation or something... That's my wild-ass guess, anyway. :) --Josh _______________________________________________ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc