New submission from Łukasz Langa <luk...@langa.pl>:

Python includes a set of batteries that enable parsing of Python code. This includes its own AST (provided in the standard library under the `ast` module), as well as a pure Python tokenizer (provided in the standard library under `tokenize` and `token`). It also provides an undocumented CST under lib2to3, which contains its own outdated and patched copies of `tokenize` and `token`.

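As a minimal sketch of the two documented entry points mentioned above, the snippet below runs the same one-line source through `ast.parse` and `tokenize.generate_tokens`. The sample source and the variable names are illustrative assumptions, not part of the proposal. The comment survives in the token stream as a COMMENT token but is not represented in the AST.

```python
import ast
import io
import tokenize

# Illustrative input (an assumption for this sketch): one statement plus a comment.
SOURCE = "x = 1  # the answer\n"

# ast.parse builds the abstract tree; comments are not represented in it.
tree = ast.parse(SOURCE)
ast_dump = ast.dump(tree)

# tokenize yields COMMENT tokens, so the comment text is still available.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

print("the answer" in ast_dump)  # False: the comment is absent from the AST
print(comments)                  # ['# the answer']
```

A tool that needs to rewrite source while keeping comments therefore cannot work from the `ast` output alone; it has to start from the token stream or from a tree that retains it.
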
This situation causes the following issues for users of Python:
- the built-in AST does not preserve comments or whitespace;
- the built-in AST increasingly modifies the tree before presenting it to user code (constant folding moved to the AST in Python 3.7);
- the built-in tokenize.py can only be used to parse Python 3.7+ code;
- the version in lib2to3 is partially customized and partially outdated, leaving bits of new grammar unsupported; new bits of grammar very often get overlooked in lib2to3;
- lib2to3 is not documented.

So if users want to write tools that manipulate Python code, the standard library doesn't provide them with great options. I suggest the following plan:

1. Bring Lib/lib2to3/pgen2/tokenize.py to the same state as Lib/tokenize.py (leaving the bits that allow for parsing of Python 3.6 and older files).
2. Merge the two tokenizers in Python 3.8 so that Lib/tokenize.py now officially supports tokenizing Python 2.7 - 3.7 code.
3. Update Lib/lib2to3/pgen2 and move it under Lib/pgen. Document it as the built-in CST provided by Python for use in applications which require code modification. Make it still officially support parsing of Python 2.7 - 3.7 code.

All three changes are made in a backwards-compatible fashion; existing code should NOT break. That being said, the parser under Lib/pgen might grow some new behavior compared to the compatibility mode for lib2to3. I specifically seek to improve handling of comments and error recovery.

----------
components: Library (Lib)
messages: 315638
nosy: benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Provide a supported Concrete Syntax Tree implementation in the standard library
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33337>
_______________________________________