[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2021-08-10 Thread Łukasz Langa

Change by Łukasz Langa :


--
Removed message: https://bugs.python.org/msg374253

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2020-09-07 Thread Batuhan Taskaya


Change by Batuhan Taskaya :


--
nosy: +BTaskaya




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2020-08-28 Thread Lysandros Nikolaou


Change by Lysandros Nikolaou :


--
nosy: +lys.nikolaou




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2020-07-25 Thread Edward K Ream


Edward K Ream  added the comment:

Hello all,

This is a "sideways" response to this issue. I have been dithering about 
whether to give you a heads up. I hope you won't mind...

I have just announced leoAst.py on python-announce-list. You can read the 
announcement here: 

https://github.com/leo-editor/leo-editor/issues/1565#issuecomment-654904747

Imo, leoAst.py solves many of the concerns mentioned in the first comment of 
this thread. leoAst.py is certainly a different approach.

Also imo, the TOG and TOT classes in leoAst.py plug significant holes in python's ast 
and tokenize modules. These classes might be candidates for python's ast 
module. If you're interested, I will be willing to do further work. If not, I 
completely understand.

As shown in the project's history, a significant amount of invention and 
discovery was required. The root of much of my initial confusion and 
difficulties was the notion that "real programmers don't use tokens". In fact, 
I discovered that the reverse is true. Tokens contain the ground truth. In many 
cases, the parse tree doesn't.

I would be interested in your reactions.

--
nosy: +edreamleo




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2020-03-13 Thread Jimmy Lai

Jimmy Lai  added the comment:

Just found that Guido mentioned LibCST. Here is a quick overview:
1. LibCST is an open source Python concrete syntax tree parser. It provides a 
CST that looks and feels like an AST.

2. It's built by Instagram for linters and refactoring tools (exactly the use 
cases Łukasz mentioned). We have a linter framework built on top of LibCST 
which allows a lint rule to automatically fix the issue (autofixer) for 
developers. We're working on open sourcing it to help developers write better 
code easily. CC Tim
We also found a couple of other linter-related open source tools that use LibCST.

3. It's based on parso (which is based on pgen2) and currently supports Python 
3.5 to 3.8. Tim is working on adding support back to 3.0 now and potentially 
2.7 later.

4. It provides various patterns for traversing and modifying a CST easily, 
including the AST visitor/transformer pattern, the matchers pattern, various 
helpers for finding/replacing nodes in a tree, and high-level transform helpers 
(e.g. add needed imports, remove unused imports).

5. It also provides metadata for tree nodes from static analysis, e.g. 
line/column position, qualified name, scope analysis, inferred type annotation 
(through Pyre). That information is useful for building advanced linters or 
refactoring tools.

There are more features available in LibCST. We continue to develop it to make 
automated refactoring even easier. We welcome your feedback and PRs!

--
nosy: +jimmylai




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2020-03-12 Thread Guido van Rossum


Guido van Rossum  added the comment:

If people are looking for a concrete CST that works now, maybe LibCST will 
work? https://github.com/Instagram/LibCST

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2019-01-12 Thread kernc


Change by kernc :


--
nosy: +kernc




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-12-03 Thread Niklas Rosenstein


Niklas Rosenstein  added the comment:

Lukasz, have you created a 3rd party package branching off lib2to3? I'm working 
on a project that is based on it (in a similar usecase as YAPF and Black) and 
was hoping that there may be some version maintained distinctly from the Python 
release schedule.

--
nosy: +n_rosenstein




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-05-03 Thread Ivan Levkivskyi

Change by Ivan Levkivskyi :


--
nosy: +levkivskyi




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-27 Thread Jakub Wilk

Change by Jakub Wilk :


--
nosy: +jwilk




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-25 Thread Guido van Rossum

Guido van Rossum  added the comment:

I think merging the tokenizers still makes sense. We can then document
top-level tokenize.py (in 3.8 and later) as guaranteed to be able to
tokenize anything going back to Python 2.7. And since lib2to3/pgen2 is
undocumented I presume removing lib2to3/pgen2/tokenize.py isn't going to
break anything -- but if we worry about that it could be made into a
trivial wrapper for top-level tokenize.py.

Still, the improvements you're planning to lib2to3 (no matter how
compatible) will benefit more people sooner if you extract it into its own
PyPI package. Not everybody can upgrade to 3.7 as soon as Instagram. :-)

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-25 Thread Łukasz Langa

Łukasz Langa  added the comment:

[njs]
> "there's a bug in handling this python 2 code, so black won't be able to 
> reformat it until the next major python release"

Nah, we're still allowed to fix bugs in micro releases.  We should have more of 
those instead of sitting on fixed bugs for months.  That's a discussion for a 
different venue though.


[gutworth]
> The stdlib is a bad place for anything that needs to evolve at a non-glacial 
> pace.

The syntax tree only needs to evolve to keep up with current Python 
development.  That's why I think it makes sense to tie the two.


[gvr]
> please seriously consider moving to a 3rd party package

Does that also invalidate the idea to merge the tokenizers?

And if so, does that also invalidate the idea to update lib2to3's tokenizer 
(BPO-8)?

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-25 Thread Ethan Smith

Change by Ethan Smith :


--
nosy: +Ethan Smith




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-24 Thread Zsolt Dollenstein

Change by Zsolt Dollenstein :


--
nosy: +zsol




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-24 Thread Guido van Rossum

Guido van Rossum  added the comment:

Lukasz, please seriously consider moving to a 3rd party package. Even
pgen2.

On Mon, Apr 23, 2018, 21:12 Benjamin Peterson 
wrote:

>
> Benjamin Peterson  added the comment:
>
> The stdlib is a bad place for anything that needs to evolve at a
> non-glacial pace. For example, even when 2to3 had not yet fallen out of
> favor, there were effectively 3 versions of it: one 2.7 and two in
> maintained 3.x branches. That was a large pain. 2to3 also could only be
> updated as quickly as Python is released.
>

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Benjamin Peterson

Benjamin Peterson  added the comment:

The stdlib is a bad place for anything that needs to evolve at a non-glacial 
pace. For example, even when 2to3 had not yet fallen out of favor, there were 
effectively 3 versions of it: one 2.7 and two in maintained 3.x branches. That 
was a large pain. 2to3 also could only be updated as quickly as Python is 
released.

--
stage: patch review -> 




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Łukasz Langa

Change by Łukasz Langa :


--
keywords: +patch
pull_requests: +6285
stage:  -> patch review




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

It does seem like it'd be unfortunate to end up in a situation like "sorry, 
there's a bug in handling this python 2 code, so black won't be able to 
reformat it until the next major python release". And I assume this issue is 
motivated by running into limitations of the current version; waiting for 3.8 
before you can fix those seems unfortunate too?

Another option to think about: make the library something that's maintained by 
python-dev, but released separately on PyPI.

--
nosy: +njs




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Łukasz Langa

Łukasz Langa  added the comment:

> But lib2to3 is proof that the stdlib is just as much subject to stalling.

The issue here is internal visibility. "lib2to3" is a library that supports 
"2to3" which is rather neglected internally since we started promoting `six` as 
a better migration strategy to Python 3.

Most core devs don't even *know* new syntax is supposed to be added to lib2to3. 
 Case in point: somehow Lib/tokenize.py was updated just in time for f-strings 
to be released but not Lib/lib2to3/pgen2/tokenize.py.

By unifying the tokenizers and moving the CST out of lib2to3's guts (and 
documenting it as a supported feature!), I'm pretty sure we can eliminate the 
danger of forgetting to update it in the future.

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Guido van Rossum

Guido van Rossum  added the comment:

But lib2to3 is proof that the stdlib is just as much subject to stalling.
Maybe lib2to3 and pgen2 would have a livelier future if they weren't
limited to updates in sync with Python releases.

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Łukasz Langa

Łukasz Langa  added the comment:

> The important part is that it is maintained and kept up to date with future 
> language grammar changes while maintaining "backwards grammar compatibility".

Yes, which is why I have trouble believing this can be effectively outsourced.  
Existing third-party libraries always stalled at some point in this regard.

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Gregory P. Smith

Gregory P. Smith  added the comment:

+1 in general to this work. Łukasz is effectively preaching to the choir by 
looping me in here. :)

It is a big challenge to practically support Python in that we have no good 
ability to parse and understand all language syntax versions via a single API 
that does not depend on the version of the language your own tool's process is 
running under.

lib2to3.pgen2 is the closest thing we've got and it is used by a notable crop of 
python refactoring tools today because there really wasn't another available 
choice.  All they know is that they've got a ".py" file; they can't know which 
specific language versions it may be intended for.  Nor should they ever need 
to run _on_ that language version.  That situation is a nightmare (ex: pylint 
uses ast and must run on the version of the language it is to analyze).

I'd love to see a ponycorn module that everything could use to run on top of 
Python 3.recent yet be able to meaningfully process 2.7 and 3.4-3.7 code.  This 
is an area where the language versions we support parsing and analyzing should 
_not_ be limited to the current CPython org still supported releases.

Does this need to go in the CPython project and integrate with its internals 
such as pgen.c or pgen2?  I don't know.  From my perspective this could be a 
PyPI project.  Even if it seems odd that we have stdlib ast and lib2to3.pgen2 
modules and pgen internal to CPython; at some point those could be seen as 
implementation details and made private in favor of tool application code using 
a canonical ponycorn thing on PyPI.  The important part is that it is 
maintained and kept up to date with future language grammar changes while 
maintaining "backwards grammar compatibility".

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Łukasz Langa

Łukasz Langa  added the comment:

> These modifications are applied only before bytecode generation. The AST 
> presented to user is not modified.

This bit me when implementing PEP 563 but I was then on the compile path, 
right.  Still, the latest docstring folding would qualify as an example here, 
too, no?


> Is this a problem? 2.7 is a dead end, its support will be ended in less than 
> 2 years. Even 3.6 will be moved to a security only fixes stage short time 
> after releasing 3.8.

Yes, it is a problem.  We will support Python 2 until 2020 but people will be 
running Python 2 code for a decade *at least*.  We need to provide those people 
a way to move their code forward.  Static analysis tools like formatters, 
linters, type checkers, or 2to3-style translators, are all soon going to run on 
Python 3.  It would be a shame if those programs were barred from helping users 
that are still struggling on Python 2.

A closer example is async/await.  It would be a shame if running on Python 3.7 
meant you can't write a tool that renames (or even just *detects*) invalid uses 
of async/await.  I firmly believe that the version of the runtime should be 
independent of the version it's able to analyze.


> I'm in favor of updating Lib/lib2to3/pgen2/tokenize.py, but I don't 
> understand why Lib/tokenize.py should parse 2.7.

Hopefully I sufficiently explained that above.


> I'm in favor of reimplementing pgen in Python if this will simplify the code 
> and the building process. Python code is simpler than C code, this code is 
> not performance critical, and in any case we need an external Python when 
> modifying the grammar or bytecode.

Well, I didn't think about abandoning pgen.  I admit that's mostly because my 
knee-jerk reaction was that it would be too slow.  But you're right that this 
is not performance critical because every `pip install` runs `compileall`.

I guess we could parse in "strict" mode for Python itself but allow for 
multiple grammars for standard library use (as I explained in the reply to 
Guido).  And this would most likely give us opportunity to iterate on grammar 
improvements in the future.

And yet, I'm cautious here.  Even ignoring performance, that sounds like a more 
ambitious task than what I'm attempting.  Unless I find partners in crime for 
this, I wouldn't attempt that.  And I would need thumbs up from the BDFL and 
performance-wary contributors.


> For what purposes the CST is needed besides 2to3?

Anywhere where you need the full view of the code which includes non-semantic 
pieces.  Those include:
- whitespace;
- comments;
- parentheses;
- commas;
- string prefixes.

The main use case is linters and refactoring tools.  For example mypy is using 
a modified AST to support type comments.  YAPF and Black are based on lib2to3 
because as formatters they can't lose comments, string prefixes, and 
organizational parentheses either.  JEDI is using Parso, a lib2to3 fork, for 
similar reasons.
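
As a rough illustration of the list above, the sketch below uses only the 
standard `tokenize` module to show that comments, parentheses, and string 
prefixes are still visible at the token level.  The sample source line and the 
helper name `token_pairs` are made up for this example.

```python
import io
import tokenize

# Sample input chosen for illustration: redundant parentheses, a string
# prefix, and a trailing comment.
SOURCE = "x = (rb'raw')  # keep me\n"


def token_pairs(source):
    """Return (token name, token text) pairs from the stdlib tokenizer."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


tokens = token_pairs(SOURCE)
for name, text in tokens:
    print(f"{name:10} {text!r}")
```

The comment arrives as its own COMMENT token, the parentheses as OP tokens, 
and the `rb` prefix stays inside the STRING token text, which is the kind of 
detail a formatter has to keep.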

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Łukasz Langa

Łukasz Langa  added the comment:

> I'm in favor of unifying the tokenizers and of updating and moving pgen2 
> (though I don't have time to do the work).

I'm willing to do all the work as long as I have somebody to review it. Case in 
point: BPO-8.



> Also I think you may have to make a distinction between the parser generator 
> and its data structures, and between the generated parser for Python vs. the 
> parser for other LL(1) grammars one might feed into it.

Technically pgen2 has the ability to parse any LL(1) grammar but so far the 
plumbing is tightly tied to the tokenizer.  We'd need to enable plugging that 
in, too.



> And I don't think you're proposing to replace Parser/pgen.c with Lib/pgen/, 
> right?

No, I'm not.



> Nor to replace the CST actually used by CPython's parser with the data 
> structures used by pgen2's driver.

No, I'm not.



> So the relationship between the CST you propose to document and CPython 
> internals wouldn't be quite the same as that between the AST used by CPython 
> and the ast module (since those *do* actually use the same code).

Right.  Once we unify the standard library tokenizers (note: *not* tokenizer.c 
which will stay), there wouldn't be much extra documentation to write for 
Lib/tokenize.py.  For Lib/pgen/ itself, we'd need to provide both an API 
rundown and an intro to the high-level functionality (how to create trees from 
files, string, etc.; how to visit trees and edit them; and so on).


> I'm not sure if it's technically possible to give tokenize.py the ability to 
> tokenize Python 2.7 and up without some version-selection flag -- have you 
> researched this part yet?

There's two schools. This is going to take a while to explain :)

One school is to force the caller to declare what Python version they want to 
parse.  This is algorithmically cleaner because we can then literally take 
Grammar/Grammar from various versions of Python and have the user worry about 
picking the right one.

The other school is what lib2to3 does currently, which is to try to implement 
as much of a superset of Python versions as possible.  This is way easier to 
use because the grammar is very forgiving.  However, this has limitations.  
There are three major incompatibilities that we need to deal with, in increasing 
order of severity:
- async/await;
- print statements;
- exec statements.

Async and await became proper keywords in 3.7 and thus broke usage of those as 
names.  It's relatively easy to work around this one seamlessly by keeping the 
grammar trickery we've had in place for 3.5 and 3.6.  This is what lib2to3 does 
today already.

The print statement is fundamentally incompatible with the print function.  
lib2to3 has two grammar variants and most users by default choose the one 
without the print statement.  Why?  Because it cannot be reliably sniffed 
anymore.  Python 3-only code will not use the __future__ import.  In fact, 2to3 
also doesn't do auto-detection and relies on the user running `2to3 -p` to 
indicate they mean the grammar with the print function.

The exec statement is even worse because there isn't even a __future__ import.  
It's annoying because it creates a third combination. 

So now the driver has to attempt three grammars (in this order):
- the almost compatible combined Python 2 + Python 3 one (that assumes exec is 
a function and print is a function);
- the one that assumes exec is a *statement* but print is still a function 
(because __future__ import);
- the one that exposes the legacy exec and print statements.

This approach has one annoying wart.  Imagine you have a file like this:

  print('msg', file=sys.stderr)
  if

Now the driver will attempt all three grammars and fail, and will report that 
the parse error is on the print line.  This can be overcome by comparing syntax 
errors from each grammar and showing the one on the furthest line (which is the 
most likely to be the real culprit).  But it's still annoying and will 
sometimes not do what the user wanted.
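
A minimal sketch of that "report the furthest error" idea follows.  It is not 
lib2to3's driver: `parse_with_fallback` and `failing_parser` are hypothetical 
names, and the three "grammars" are stand-ins that simply fail on a fixed line 
so the selection logic can be seen in isolation.

```python
from typing import Any, Callable, Sequence

# Hypothetical: one callable per grammar variant, raising SyntaxError on failure.
Parser = Callable[[str], Any]


def parse_with_fallback(source: str, parsers: Sequence[Parser]) -> Any:
    """Try each grammar in order; if all fail, re-raise the error that got
    furthest into the file instead of the last one seen."""
    errors = []
    for parse in parsers:
        try:
            return parse(source)
        except SyntaxError as exc:
            errors.append(exc)
    if not errors:
        raise ValueError("no parsers supplied")
    raise max(errors, key=lambda exc: (exc.lineno or 0, exc.offset or 0))


def failing_parser(lineno: int) -> Parser:
    """Stand-in for a real grammar: always reports an error on `lineno`."""
    def parse(source: str) -> Any:
        text = source.splitlines()[lineno - 1]
        raise SyntaxError("invalid syntax", ("<example>", lineno, 1, text))
    return parse


SOURCE = "print('msg', file=sys.stderr)\nif\n"

# The two print-function grammars fail on the bare `if` (line 2); the legacy
# print-statement grammar fails earlier, on the print line (line 1).
try:
    parse_with_fallback(
        SOURCE, [failing_parser(2), failing_parser(2), failing_parser(1)]
    )
except SyntaxError as exc:
    reported_line = exc.lineno
    print(f"syntax error reported on line {reported_line}")
```

Picking the maximum by (line, column) is only a heuristic, as noted above: it 
will still sometimes point at a line the user did not consider the real 
problem.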


-- OK, OK. So which to choose?

And now, while this sounds like more work and is harder to get right, I still 
think the combined grammar with minimal incompatibilities is the better 
approach.  Why?  Two reasons.

1. Nobody ever knows what Python version *exactly* a given file is.  Most files 
aren't even considering compatibility that fine-grained.  And having to attempt 
to parse not three but potentially 8 grammars (3.7 - 3.2, 2.7, 2.6) would be 
prohibitively slow.

2. My tool maybe wants to actually *modify* the compatibility level by, say, 
rewriting ''.format() with f-strings or putting trailing commas where old 
Pythons didn't accept them.  So it would be awkward if the grammar I used to 
read the file wasn't compatible with my later changes.

Unless I'm swayed otherwise, I'd continue on what lib2to3 did, with the 
exception that we need to add a grammar variant without the `exec` statement, 
and the driver needs to attempt 

[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-23 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> - the built-in AST increasingly modifies the tree before presenting it to user
>   code (constant folding moved to the AST in Python 3.7);

These modifications are applied only before bytecode generation. The AST 
presented to user is not modified.
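
A small check of that distinction, assuming CPython 3.7 or later (where 
constant folding is done by the compiler), might look like the following.  The 
sample expression is made up for the example, and the exact contents of 
`co_consts` are an implementation detail that can differ between versions.

```python
import ast

SOURCE = "x = 1000 + 2000\n"

# The tree handed to user code still contains the unfolded addition.
tree = ast.parse(SOURCE)
value = tree.body[0].value
print(type(value).__name__)

# The compiled code object, by contrast, carries the folded constant.
code = compile(SOURCE, "<example>", "exec")
print(code.co_consts)
```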

> - the built-in tokenize.py can only be used to parse Python 3.7+ code;

Is this a problem? 2.7 is a dead end, its support will be ended in less than 
2 years. Even 3.6 will be moved to a security only fixes stage short time 
after releasing 3.8.

I'm in favor of updating Lib/lib2to3/pgen2/tokenize.py, but I don't understand 
why Lib/tokenize.py should parse 2.7.

I'm in favor of reimplementing pgen in Python if this will simplify the code 
and the building process. Python code is simpler than C code, this code is not 
performance critical, and in any case we need an external Python when modifying 
the grammar or bytecode.

See also issue30455 where I try to get rid of duplications by generating all 
tokens-related data and code from a single source (token.py or external text 
file).

For what purposes the CST is needed besides 2to3? I know only that it could 
help to determine the correct position in docstrings in doctests and similar 
tools which need to process docstrings and report errors. This is not possible 
with AST due to inlined '\n', escaped newlines, and string literals 
concatenation. Changes in 3.7 made this even worse (see issue32911).
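
To make the docstring-position problem concrete, here is a small sketch using 
only the standard `ast` and `tokenize` modules.  The sample source is made up: 
it contains an escaped newline and two implicitly concatenated literals.

```python
import ast
import io
import tokenize

# One logical string written as two literals, the first containing the
# two-character escape sequence backslash + n.
SOURCE = r'doc = "first\n" "second"' + "\n"

# The AST exposes a single constant with the escapes already resolved, so the
# boundaries between the original literals are gone.
node = ast.parse(SOURCE).body[0].value
print(repr(node.value))

# The token stream keeps each literal, with its exact start position.
pieces = [
    tok
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.STRING
]
for tok in pieces:
    print(tok.start, tok.string)
```

Mapping an offset inside `node.value` back to a source column requires the 
per-literal positions and raw text that only the token (or CST) level 
provides.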

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-22 Thread Guido van Rossum

Guido van Rossum  added the comment:

I'm glad you've rediscovered pgen2!

I'm in favor of unifying the tokenizers and of updating and moving pgen2 
(though I don't have time to do the work).

I'm not sure if it's technically possible to give tokenize.py the ability to 
tokenize Python 2.7 and up without some version-selection flag -- have you 
researched this part yet?

Also I think you may have to make a distinction between the parser generator 
and its data structures, and between the generated parser for Python vs. the 
parser for other LL(1) grammars one might feed into it.

And I don't think you're proposing to replace Parser/pgen.c with Lib/pgen/, 
right? Nor to replace the CST actually used by CPython's parser with the data 
structures used by pgen2's driver. So the relationship between the CST you 
propose to document and CPython internals wouldn't be quite the same as that 
between the AST used by CPython and the ast module (since those *do* actually 
use the same code).

--




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-22 Thread Łukasz Langa

Change by Łukasz Langa :


--
pull_requests:  -6270




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-22 Thread Łukasz Langa

Change by Łukasz Langa :


--
keywords:  -patch




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-22 Thread Łukasz Langa

Łukasz Langa  added the comment:

See BPO-8 for an implementation of Step 1.

--
stage: patch review -> 




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-22 Thread Łukasz Langa

Change by Łukasz Langa :


--
keywords: +patch
pull_requests: +6270
stage:  -> patch review




[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2018-04-22 Thread Łukasz Langa

New submission from Łukasz Langa :

Python includes a set of batteries that enable parsing of Python code.  This
includes its own AST (provided in the standard library under the `ast` module),
as well as a pure Python tokenizer (provided in the standard library under
`tokenize` and `token`).  It also provides an undocumented CST under lib2to3,
which contains its own outdated and patched copies of `tokenize` and `token`.

This situation causes the following issues for users of Python:
- the built-in AST does not preserve comments or whitespace;
- the built-in AST increasingly modifies the tree before presenting it to user
  code (constant folding moved to the AST in Python 3.7);
- the built-in tokenize.py can only be used to parse Python 3.7+ code;
- the version in lib2to3 is partially customized and partially outdated,
  leaving bits of new grammar not supported; new bits of grammar very often get
  overlooked in lib2to3;
- lib2to3 is not documented.

So if users want to write tools that manipulate Python code, the standard
library doesn't provide them with great options.
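
The first point can be seen with a short round trip through the `ast` module.  
This is only a sketch: it assumes Python 3.9 or later for `ast.unparse`, and 
the sample line is made up for the example.

```python
import ast

# A line with redundant parentheses and a trailing comment.
SOURCE = "total = (1 + 2)  # running total\n"

# Parse to an AST and turn it back into source text.
round_tripped = ast.unparse(ast.parse(SOURCE))
print(round_tripped)
```

The regenerated text has the same meaning, but the comment and the author's 
parentheses are gone, which is why formatters and refactoring tools cannot 
build on `ast` alone.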

I suggest the following plan:

1. Bring Lib/lib2to3/pgen2/tokenize.py to the same state as Lib/tokenize.py
   (leaving the bits that allow for parsing of Python 3.6 and older files).

2. Merge the two tokenizers in Python 3.8 so that Lib/tokenize.py now
   officially supports tokenizing Python 2.7 - 3.7 code.

3. Update Lib/lib2to3/pgen2 and move it under Lib/pgen.  Document it as the
   built-in CST provided by Python for use in applications which require code
   modification.  Make it still officially support parsing of Python 2.7 - 3.7
   code.

All three changes are made in a backwards-compatible fashion; existing code
should NOT break.  That being said, the parser under Lib/pgen might grow some
new behavior compared to the compatibility mode for lib2to3.  I specifically
seek to improve handling of comments and error recovery.

--
components: Library (Lib)
messages: 315638
nosy: benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, 
serhiy.storchaka
priority: normal
severity: normal
status: open
title: Provide a supported Concrete Syntax Tree implementation in the standard 
library
versions: Python 3.8
