Hi Mike! I’m not familiar with the code or the design for the parsing, aside from seeing a mention of Parser Combinators, so this may sound really stupid, but… is it worth looking at something like an Earley parser?
That's less efficient than LALR, but only by a constant factor: it's still linear-time on all LR(k) grammars, at least with Leo's right-recursion optimization. Because it's a chart parser, it can also handle ambiguous grammars. It uses the grammar rules directly, so a) production names are available on the stack for error reporting and b) it can show you where in a production the parse broke down. The basic algorithm is only a few dozen lines of code.

Cheers,
-- Russ

> On 5 Jun 2018, at 15:51, Mike Beckerle <[email protected]> wrote:
>
> I branched off Josh Adams' daffodil-trailing-sep branch a while back,
> starting at hash
>
> 9f8173d4a962d9373d29f85d5a92435afe622bba
>
> I have since made substantial changes, and just successfully rebased onto
> the 2.2.0 development branch.
>
> That is, for some definition of "success". I had 44 failures in
> daffodil-test before, and after the rebase it seems to be the same 44
> failures. These are all related to separated sequences in some way.
>
> I did this rebase because I'm doing enough refactoring of the grammar stuff
> that I didn't want to get too far without the changes to the grammar that
> were done for the layering feature.
>
> Unfortunately, it looks like the whole way sequences and separators are
> dealt with has to change.
>
> The grammar package of the Daffodil schema compiler has been "barely
> working" for a while now and is overdue for a revamping. I've tried quite a
> few smaller fixes to it without success.
>
> I'm at the point now where I'm planning to re-implement much of it. There's
> no way the existing grammar stuff was going to be just tweaked and achieve
> all of:
>
> * proper trailing-separator suppression behavior
> * good diagnostic messaging about searches for delimiters
> * performance
> * code clarity - apparent correctness
>
> A naive assumption in the grammar design was the idea that this aspect of
> the schema compiler could, in some way, match the data grammar of the
> Daffodil specification document. The intention here was that the grammar
> code would then be spec-compliant by inspection to some degree. I don't see
> this as feasible anymore at all.
>
> The grammar in the spec was created without an implementation to refer back
> to, and is really pretty poorly structured even for just a grammar in the
> specification document. And of course diagnostic behavior - which I firmly
> believe is half the complexity of any parser - was not a consideration when
> the DFDL spec grammar was formulated, and the projection of that into the
> Daffodil schema compiler's grammar module is similarly challenged.
>
> Anyway, that's the status for those trying to follow it. I am going to try
> to get this put back together for review before I depart for a vacation on
> June 13.
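
P.S. To make the "few dozen lines" claim concrete, here is a rough sketch of a basic Earley recognizer in Python. It is only an illustration, not anything taken from Daffodil: the names (Item, earley_chart, parse_report, SEQ_GRAMMAR) and the toy separated-sequence grammar are made up for this example, and it assumes a grammar with no empty productions, which the plain algorithm below does not handle correctly. It omits Leo's optimization and builds no parse tree; it just shows how the dotted rules left in the chart say where in a production the parse stopped.

```python
from collections import namedtuple

# An Earley item: rule `head -> body` with `dot` symbols matched so far,
# where the match started at input position `origin`.
Item = namedtuple("Item", "head body dot origin")


def next_symbol(item):
    """Symbol just after the dot, or None if the rule is fully matched."""
    return item.body[item.dot] if item.dot < len(item.body) else None


def format_item(item):
    """Render an item as a dotted rule, e.g. 'Seq -> Seq , . Item'."""
    body = list(item.body)
    body.insert(item.dot, ".")
    return f"{item.head} -> {' '.join(body)}"


def earley_chart(grammar, start, tokens):
    """Build the chart. `grammar` maps a nonterminal to a list of bodies
    (tuples of symbols). Assumption: no empty productions."""
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in chart[i]:
            chart[i].append(item)

    for body in grammar[start]:
        add(0, Item(start, body, 0, 0))

    for i in range(len(tokens) + 1):
        for item in chart[i]:  # chart[i] may grow while we iterate over it
            sym = next_symbol(item)
            if sym is None:
                # Completer: advance every item that was waiting for this head.
                for parent in chart[item.origin]:
                    if next_symbol(parent) == item.head:
                        add(i, parent._replace(dot=parent.dot + 1))
            elif sym in grammar:
                # Predictor: start every rule for the expected nonterminal.
                for body in grammar[sym]:
                    add(i, Item(sym, body, 0, i))
            elif i < len(tokens) and tokens[i] == sym:
                # Scanner: the expected terminal matches the next token.
                add(i + 1, item._replace(dot=item.dot + 1))
    return chart


def parse_report(grammar, start, tokens):
    """Return (accepted, failure_position, dotted rules still waiting there)."""
    chart = earley_chart(grammar, start, tokens)
    if any(it.head == start and it.origin == 0 and next_symbol(it) is None
           for it in chart[-1]):
        return True, None, []
    last = max(i for i, items in enumerate(chart) if items)
    stuck = [format_item(it) for it in chart[last] if next_symbol(it) is not None]
    return False, last, stuck


# Hypothetical toy grammar: a comma-separated sequence of "x" items.
SEQ_GRAMMAR = {
    "Seq": [("Item",), ("Seq", ",", "Item")],
    "Item": [("x",)],
}

print(parse_report(SEQ_GRAMMAR, "Seq", ["x", ",", "x"]))
print(parse_report(SEQ_GRAMMAR, "Seq", ["x", ",", ","]))
```

The second call fails at token position 2 and reports the rules that were part-way through at that point, which is the kind of information one would want for "expected an Item after the separator" diagnostics.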
