The PEP gives a good exposition of the problem and proposed solution,
thanks.
If I understand correctly, the proposal is that the PEG grammar should
become the definitive grammar for Python at some point, probably for
Python 3.10, so it may evolve without the LL(1) restrictions. I'd like
to raise some points with respect to that, which perhaps the migration
section could answer.
When definitive, the grammar would not then just be for CPython, and
would also appear as user documentation of the language. Whether that
change leaves Python with a more useful (readable) grammar seems an
important test of the idea. I'm looking at
https://github.com/we-like-parsers/cpython/blob/pegen/Grammar/python.gram
and assuming that it is indicative of a future definitive grammar. That
may be incorrect, as it has these issues in my view:
1. It is decorated with actions in C. If a decorated grammar is offered
as definitive, one with Python actions (operations on the AST) would be
preferable, being implementation-neutral, although it would still be
hostage to AST changes that are not language changes. Maybe one stripped
of actions is best.
2. It's quite long, and not at first glance more readable than the LL(1)
grammar. I had understood the ugliness in the LL(1) grammar to result
from skirting limitations that PEG eliminates. The PEG one is twice as
long but, recognising that about half of it is actions, let's just say
that as a grammar it's no shorter.
3. There is some manual guidance by means of &-guards, only necessary (I
think) as a speed-up or to force out meaningful syntax errors. That
would be noise to the reader. (This goes away if the PEG parser
generator generates guards from the first set at a simple "no
backtracking" marker.)
4. In some places, expansive alternatives seem to be motivated by
differences between the actions: for a start, wherever async pops up.
Maybe that is also why the definition of lambda is so long. That could
go away with different support code (e.g. is_async as an argument), but
if improvements to the support code change grammar rules when the
language has not changed, that's a danger sign too.
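To make the "is_async as an argument" remark in point 4 a little more concrete, here is a minimal sketch in Python rather than C. It is only an illustration under my own assumptions: make_function is a hypothetical helper, not something in pegen or CPython, while ast.FunctionDef, ast.AsyncFunctionDef and ast.arguments are the real node classes from the standard ast module. The idea is that one action, given a flag, can serve both the plain and the async form, so the grammar would not need a separate alternative for each.

```python
import ast


def make_function(name, args, body, is_async=False):
    """Hypothetical support helper: build either kind of function node.

    A single grammar action could call this with is_async set from an
    optional 'async' token, instead of the rule being written out twice.
    """
    node_class = ast.AsyncFunctionDef if is_async else ast.FunctionDef
    return node_class(name=name, args=args, body=body, decorator_list=[])


# An empty parameter list, spelled out so every list field is present.
empty_args = ast.arguments(
    posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]
)

sync_node = make_function("f", empty_args, [ast.Pass()])
async_node = make_function("f", empty_args, [ast.Pass()], is_async=True)

print(type(sync_node).__name__, type(async_node).__name__)
```

Whether the real support code could be reshaped this way I can't say; the sketch only shows why the number of alternatives may depend on the helpers rather than on the language.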
All that I think means that the "operational" grammar from which you
build the parser is going to be quite unlike the one with which you
communicate the language. At present ~/Grammar/Grammar both generates
the parser (I thought) and appears as documentation. I take it to be the
ideal that we use a single, human-readable definition. For example ANTLR
4 has worked hard to facilitate a grammar in which actions are implicit,
and the generation of an AST from the parse tree/events can be
elsewhere. (I'm not plugging ANTLR specifically as a solution.)
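As a rough illustration of how a documentation grammar might be derived from the operational one, the sketch below strips brace-delimited actions from grammar text. It rests on assumptions of mine: that actions are written between { and }, that quoted tokens such as '{' must be left alone, and that escaped quotes do not occur. The function strip_actions and the two sample rules are invented for the example and are not taken from python.gram.

```python
def strip_actions(grammar_text: str) -> str:
    """Remove { ... } actions, keeping quoted tokens such as '{' intact.

    Simplification: escaped quote characters are not handled.
    """
    out = []
    depth = 0      # nesting depth inside an action
    quote = None   # the quote character while inside a quoted token
    for ch in grammar_text:
        if quote is not None:
            # Inside quotes: copy only when we are not within an action.
            if depth == 0:
                out.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            if depth == 0:
                out.append(ch)
        elif ch == "{":
            if depth == 0:
                # Drop the blanks that separated the rule from its action.
                while out and out[-1] in " \t":
                    out.pop()
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


# Invented rules in the general style of a grammar decorated with actions.
sample = (
    "pair[expr_ty]: a=NAME ':' b=NAME { make_pair(a, b) }\n"
    "block: '{' s=stmts '}' { s } | s=stmt { single(s) }\n"
)
print(strip_actions(sample))
```

The result keeps the rule structure (and here the bracketed result types) but drops the implementation detail. It does nothing about guards or about alternatives that exist only because the actions differ, which is why I doubt stripping alone gives the grammar one would want to publish.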
Jeff Allen
On 02/04/2020 19:10, Guido van Rossum wrote:
Since last fall's core sprint in London, Pablo Galindo Salgado,
Lysandros Nikolaou and myself have been working on a new parser for
CPython. We are now far enough along that we present a PEP we've written:
https://www.python.org/dev/peps/pep-0617/
Hopefully the PEP speaks for itself. We are hoping for a speedy
resolution so we can land the code we've written before 3.9 beta 1.
If people insist I can post a copy of the entire PEP here on the list,
but since a lot of it is just background information on the old LL(1)
and the new PEG parsing algorithms, I figure I'd spare everyone the
need of reading through that. Below is a copy of the most relevant
section from the PEP. I'd also like to point out the section on
performance (which you can find through the above link) -- basically
performance is on a par with that of the old parser.
==============
Migration plan
==============
This section describes the migration plan when porting to the new
PEG-based parser if this PEP is accepted. The migration will be executed
in a series of steps that allow initially to fallback to the previous
parser if needed:
1. Before Python 3.9 beta 1, include the new PEG-based parser machinery
   in CPython with a command-line flag and environment variable that
   allows switching between the new and the old parsers together with
   explicit APIs that allow invoking the new and the old parsers
   independently. At this step, all Python APIs like ``ast.parse`` and
   ``compile`` will use the parser set by the flags or the environment
   variable and the default parser will be the current parser.
2. After Python 3.9 Beta 1 the default parser will be the new parser.
3. Between Python 3.9 and Python 3.10, the old parser and related code
   (like the "parser" module) will be kept until a new Python release
   happens (Python 3.10). In the meanwhile and until the old parser is
   removed, **no new Python Grammar addition will be added that requires
   the peg parser**. This means that the grammar will be kept LL(1)
   until the old parser is removed.
4. In Python 3.10, remove the old parser, the command-line flag, the
   environment variable and the "parser" module and related code.
--
--Guido van Rossum (python.org/~guido <http://python.org/~guido>)
/Pronouns: he/him //(why is my pronoun here?)/
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HOZ2RI3FXUEMAT4XAX4UHFN4PKG5J5GR/
Code of Conduct: http://python.org/psf/codeofconduct/