On Mon, Apr 6, 2020 at 5:18 AM Jeff Allen <ja...@farowl.co.uk> wrote:

> The PEP gives a good exposition of the problem and proposed solution,
> thanks.
>
> If I understand correctly, the proposal is that the PEG grammar should
> become the definitive grammar for Python at some point, probably for Python
> 3.10, so it may evolve without the LL(1) restrictions. I'd like to raise
> some points with respect to that, which perhaps the migration section could
> answer.
>
Thanks, you definitely have a point here.

> When definitive, the grammar would not then just be for CPython, and would
> also appear as user documentation of the language. Whether that change
> leaves Python with a more useful (readable) grammar seems an important test
> of the idea. I'm looking at
> https://github.com/we-like-parsers/cpython/blob/pegen/Grammar/python.gram
> , and assuming that is indicative of a future definitive grammar. That may
> be incorrect, as it has these issues in my view:
>
> 1. It is decorated with actions in C. If a decorated grammar is offered as
> definitive, one with Python actions (operations on the AST) is preferable,
> as implementation neutral, although still hostage to AST changes that are
> not language changes. Maybe one stripped of actions is best.
>
Yes, the plan is to strip actions and a few other embellishments (types,
names, cuts, and probably also lookaheads -- although the latter may be
significant, we only use them for optimization). The parser generator (
https://github.com/we-like-parsers/cpython/tree/pegen/Tools/peg_generator)
prints a stripped representation (though currently preserving lookaheads --
suppressing those would be a simple change to the code).

> 2. It's quite long, and not at first glance more readable than the LL(1)
> grammar. I had understood ugliness in the LL(1) grammar to result from
> skirting limitations that PEG eliminates. The PEG one is twice as long, but
> recognising about half of it is actions, let's just say that as a grammar
> it's no shorter.
>
Indeed. I believe part of this actually comes from the desire to be 100%
compatible with the old parser (an important constraint is that we don't
want to change the AST since we don't want to change the byte code
generator).

Another part of it comes from expressing in the grammar constraints that
the old parser generator cannot express. For example, the old parser
accepts `1 = x` as an assignment, and it is rejected in a later stage. The
new parser expresses this restriction in the grammar. Note that the full
grammar published in the reference manual (
https://docs.python.org/3.8/reference/grammar.html) doesn't say anything
about this; the grammar used later to describe assignment_stmt does (
https://docs.python.org/3.8/reference/simple_stmts.html#grammar-token-assignment-stmt),
but as a result it is not LL(1) -- those grammar sections sprinkled
throughout the reference manual are all written and updated by hand (and
sometimes we forget!).

> 3. There is some manual guidance by means of &-guards, only necessary (I
> think) as a speed-up or to force out meaningful syntax errors. That would
> be noise to the reader. (This goes away if the PEG parser generator
> generate guards from the first set at a simple "no backtracking" marker.)
>
Yeah, see above. We've thought of generating FIRST sets as a future
enhancement of the generator, and then they can go away. At the moment the
lookaheads we have are all carefully aimed at optimizing the time and space
requirements of the parser.

> 4. In some places, expansive alternatives seem to be motivated by the
> difference between actions, for a start, wherever async pops up. Maybe it
> is also why the definition of lambda is so long. That could go away with
> different support code (e.g. is_async as an argument), but if improvements
> to the support change grammar rules, when the language has not changed,
> that's a danger sign too.
>
Yeah, lambda is complicated by the requirement on the generated AST.
Arguably we have gone too far here (and for 'parameters', which solves
almost the same problem for regular function definitions) and we should put
some of the checks back in the support code. But I note that the old
grammar also has some warts in the area of parameter definitions (though
its lambda is definitely simpler).

> All that I think means that the "operational" grammar from which you build
> the parser is going to be quite unlike the one with which you communicate
> the language. At present ~/Grammar/Grammar both generates the parser (I
> thought) and appears as documentation. I take it to be the ideal that we
> use a single, human-readable definition. For example ANTLR 4 has worked
> hard to facilitate a grammar in which actions are implicit, and the
> generation of an AST from the parse tree/events can be elsewhere. (I'm not
> plugging ANTLR specifically as a solution.)
>
Our cheaper solution is to remove the actions from the display grammar. But
I don't think that Grammar/Grammar should be seen as a complete
specification of the language. And I don't think it is terrible if the
specification says

  function_def_raw:
      | ASYNC 'def' NAME '(' parameters? ')' ['->' annotation] ':' block
      | 'def' NAME '(' parameters? ')' ['->' annotation] ':' block

instead of

  function_def_raw: [ASYNC] 'def' NAME '(' parameters? ')' ['->'
annotation] ':' block

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BLCGRETZSNQORIVWCBYOMNV3KNILDHT5/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to