Re: Parsing Timeline Suggestions

Jeffrey Kegler Tue, 19 Feb 2019 16:54:00 -0800

Thanks for your kind words.  In this, I'll just respond to the PEG points.

That part of my timeline has caused more blowback than any other part, not
unexpectedly.  The quips are aimed, not at people like yourself who, while
PEG optimists, know its difficulties and do not mislead others.
Unfortunately, at the time I wrote that version of the timeline (and still
I suspect) you are the minority of pro-PEG voices out on the Web.


Many presentations of PEG present it as a "throw anything at me" solution.
These less careful voices were the loudest and cause a lot of folks to
waste time pushing a rope.  My quips were aimed to catch the attention of
these folks and they did their job.

I've done this before, with yacc, for the same reason, and caught the same
kind of heat.  People routinely were recommending yacc for "serious
parsing" either in ignorance or reckless disregard of the actual experience
with it.  So I did a "bovicide" series of blog posts.  Nowadays I don't
diss yacc, because it's no longer the danger to parsing newbies that it was.

If you read my entry carefully, and get beyond the quips (as I am sure you
did), you see that I *do* say there are safe uses of PEG, and in the
footnote concede that my description in the text is "negative".  I go on to
point the reader toward the pro-PEG literature -- a literature of which the
less careful PEG advocates are usually unaware.

So to my careless reader, a warning about PEG gets across, and to my
careful reader, I point out that, while I am more optimistic about
Earley/Leo, there is safe use of PEG and a literature worth consulting.

In fact, I think there are 5 "forever" algorithms, parsing algorihms that
are of permanent significance, and PEG is one of them.  (The other 4 are
regular expressions, recursive descent, Earley/Leo and Sakai's algorihm,
aka CYK).

Getting to the merits, I've heard before that some find the PEG syntax more
intuitive than BNF.  Everyone is the ultimate authority on their own
intuitions, so I won't dispute this.  And you are more optimistic about
research into PEG than I am, clearly.  But as Yogi Berra says, "predictions
are difficult, especially about the future".  My timeline does not engage
in predictions, although it certainly reflects my optimism wrt Earley/Leo.
I do claim to be entitled to greater confidence in my predictions on the
grounds, at 65, I am much less likely to be around if they fall flat. :-)

I'll tackle the points about why the timeline includes/excludes what it
does separately.

On Tue, Feb 19, 2019 at 5:47 AM <nors...@gmail.com> wrote:

> (Following up from
> https://twitter.com/jeffreykegler/status/1097628806091816960)
>
> Dear Jeffrey,
>
> I was delighted when I found about your parsing timeline a couple of
> months ago.
> I'm a PhD student researching parsing myself and it's great to see
> resources
> like this bringing awareness to what happened in the field, and it's a
> great
> reference for people in the field as well. It also got me more interested
> in
> Earley and your own work of it, which is quite cool.
>
> That being said, I do have some insatisfactions about the timeline,
> especially
> the end of it. It's always easy to criticize, so I would like to offer my
> help
> instead, to see if I can contribute something useful to the timeline.
>
> According to me, there are two points that could use improvement. First, I
> feel
> top-down recursive-descent formalizations, especially PEG, are being
> unfairly
> dismissed. Second, it seems to me like there are quite a few new
> developments
> that would deserve inclusion at the end of the timeline (especially by
> comparison to the level of details towards the start of the timeline).
>
> I don't pretend to have the objective truth, but the goal is to have a
> productive conversation that can hopefully lead to improved accuracy.
>
> ---
>
> Regarding PEG, I find the whole description minus the first paragraph to be
> rather contestable and (I feel) uncharitable.
>
> It's entirely correct that PEG does not cover BNF. But BNF is essentially a
> notation for the CFG formalism, which PEG is not.
>
> It seems your main beef with PEG is the problem that has been described as
> "language hiding" or "prefix capture": the fact that a rule like `A ::= a
> | aa`
> will not match the input "aa".
>
> This is indeed an important problem of PEGs. But the way I see it, it's a
> dual
> to the problem of ambiguity in CFGs. Both preclude one another: PEGs are
> unambiguous but have prefix capture, CFGs do not have prefix capture but
> are
> ambiguous. The detection of both issue is provably intractable
> (statically) and
> may be more related than we know, as a relatively similar machinery can be
> put
> to work to do partial detection of both ambiguity in CFGs and prefix
> capture in
> PEGs, cf. "Modular Syntax Demands Verification" by Sylvain Schmitz (*).
>
> (*) http://www.i3s.unice.fr/~mh/RR/2006/RR-06.32-S.SCHMITZ.pdf
>
> I have written a fair deal of both PEGs and CFGs and both seem about
> equally
> difficult to write. Non-general CFG formalism (LALR, LL) are much harder,
> and so
> are PEG tools that have no support to build left-associative parse trees.
>
> I also would never advise anyone not to test a grammar, whatever the
> formalism.
> I've certainly never written a big chunk of grammar without making a
> mistake
> whatever the formalism. And in practice, the overwhelming majority of these
> mistakes weren't about either prefix capture nor ambiguity.
>
> If your practical experience differs, I'd be interested to hear about it.
>
> ---
>
> Regarding new developments, here are a couple of ideas of the top of my
> head.
> The first two items of the list are not random, since those were things I
> worked
> on in my research (http://norswap.com/publications/)
>
> - Left-recursion and left-associativity handling in PEG, since it fixes the
>   foremost problem (in my opinion) with PEGs
>
> - Context-sensitive parsing
>
>   You already already mention monadic parsing (but do not emphasize its
>   context-sensitive properties). Another oldie that deserves to be
> mentionned
>   are Prolog-style definite clause grammars (DCGs).
>
>   There are some other works worth mentioning if this is a topic of
> interest for
>   you.
>
>   My own work in the area is essentially pushing an idea which expressed
> in a
>   lovely manner:
>
>   > Recursive descent does have a huge advantage, one which, despite its
> severe
>   > limitations, will save it from obsolescence time and again.
> Hand-written
>   > recursive descent is essentially calling subroutines. Adding custom
>   > modification to recursive descent is very straight-forward.
>
>   But making it so that these subroutines may manipulate state in a way
> that is
>   safe in the presence of backtracking, enabling context-sensitive parsing.
>
> - Packrat parsing, if only because it enables O(1) parsing of PEG, a class
> of
>   languages that may still strictly include the whole of CFG — as far as I
> know,
>   we do not know of a simple language that can be expressed in CFG but not
> in
>   PEG (we do know language expressible in PEG but not in CFGs).
>
>   Note that, even if PEG does indeed include CFG, that does not imply the
>   existance of an algorithm to convert from one to the other.
>
> - Tackling the problem of ambiguity formally,
>   "Disambiguating Grammars with Tree Automata"
>   https://michaeldadams.org/papers/disambiguating-grammars/
>
> - GLL parsing probable deserves a mention for mention for taking a LL-style
>   machinery to the point of finally parsing fully general CFGs through the
> use
>   of Tomita-style graph structured stacks
>   http://www.cs.rhul.ac.uk/research/languages/csle/GLLparsers.html
>
> - Similarly, and while I'm not a fan of the way this work is marketed,
> parsing
>   using the Brzozowski derivative is certainly an interesting direction in
>   parsing: http://matt.might.net/articles/parsing-with-derivatives/
>
> - This might be outside the scope of your timeline, but there is also a
> whole
>   lot of work (some quite old) on "semi-parsing", i.e. fuzzy or partial
> parsing:
>   http://grammarware.net/text/2014/semiparsing.pdf
>
> ---
>
> There you have it. Let me know what you think, and I'm looking forward to
> talking with you! I hope I can help the project in any capacity that I'm
> able.
>
> Cheers,
>
> Nicolas LAURENT
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to marpa-parser+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to marpa-parser+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Parsing Timeline Suggestions

Reply via email to