New submission from Łukasz Langa <luk...@langa.pl>:

Python includes a set of batteries that enable parsing of Python code.  This
includes its own AST (provided in the standard library under the `ast` module),
as well as a pure Python tokenizer (provided in the standard library under
`tokenize` and `token`).  It also provides an undocumented CST under lib2to3,
which contains its own outdated and patched copies of `tokenize` and `token`.
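
For illustration, here is a minimal sketch of what the lib2to3 CST offers today: a lossless round trip of source text, comments and whitespace included. The helper name `roundtrip` and the sample source are made up for this example. lib2to3 is undocumented and may be missing from a given Python install, so the sketch returns `None` in that case rather than failing.

```python
SOURCE = "x = 1  # answer\n"


def roundtrip(source):
    """Parse `source` with lib2to3 and render the tree back to text.

    Returns None when lib2to3 is not importable on this interpreter.
    """
    try:
        from lib2to3 import pygram, pytree
        from lib2to3.pgen2 import driver
    except ImportError:
        return None
    # Use the grammar variant without the Python 2 print statement.
    d = driver.Driver(
        pygram.python_grammar_no_print_statement,
        convert=pytree.convert,
    )
    tree = d.parse_string(source)
    # str() on a lib2to3 tree reproduces the original text, because
    # comments and whitespace are stored as prefixes on the leaves.
    return str(tree)


result = roundtrip(SOURCE)
```

When lib2to3 is available, `result` should equal `SOURCE` exactly, which is the property code-modification tools rely on.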

This situation causes the following issues for users of Python:
- the built-in AST does not preserve comments or whitespace;
- the built-in AST increasingly modifies the tree before presenting it to user
  code (constant folding moved to the AST in Python 3.7);
- the built-in tokenize.py can only be used to tokenize Python 3.7+ code;
- the version in lib2to3 is partially customized and partially outdated,
  leaving bits of new grammar unsupported; new bits of grammar very often get
  overlooked in lib2to3;
- lib2to3 is not documented.
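
The first point can be seen with a small sketch like the one below. It parses a one-line sample (made up for this example) with `ast` and then tokenizes the same text with `tokenize`. The comment is absent from the AST dump but shows up as a COMMENT token.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # answer\n"

# The AST keeps the assignment but has no node for the comment.
tree = ast.parse(SOURCE)
dumped = ast.dump(tree)

# The tokenizer, in contrast, reports comments as COMMENT tokens.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]

print("comment in AST dump:", "answer" in dumped)
print("comment tokens:", comments)
```

So a tool that wants to rewrite code and keep comments cannot work from the AST alone; it has to fall back to tokens or to a CST.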

So if users want to write tools that manipulate Python code, the standard
library doesn't provide them with great options.

I suggest the following plan:

1. Bring Lib/lib2to3/pgen2/tokenize.py to the same state as Lib/tokenize.py
   (leaving the bits that allow for parsing of Python 3.6 and older files).

2. Merge the two tokenizers in Python 3.8 so that Lib/tokenize.py now
   officially supports tokenizing Python 2.7 - 3.7 code.

3. Update Lib/lib2to3/pgen2 and move it under Lib/pgen.  Document it as the
   built-in CST provided by Python for use in applications which require code
   modification.  Make it still officially support parsing of Python 2.7 - 3.7
   code.

All three changes would be made in a backwards-compatible fashion; existing
code should NOT break.  That being said, the parser under Lib/pgen might grow
some new behavior compared to the compatibility mode for lib2to3.  I
specifically seek to improve handling of comments and error recovery.

----------
components: Library (Lib)
messages: 315638
nosy: benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Provide a supported Concrete Syntax Tree implementation in the standard library
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33337>
_______________________________________