[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: Problems that you are going to find: * The C tokenizer throws syntax errors while the tokenize module does not. For example: ❯ python -c "1_" File "<string>", line 1 1_ ^ SyntaxError: invalid decimal literal ❯ python -m tokenize <<< "1_" 1,0-1,1:
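The difference described in this comment can be checked from Python itself. A minimal sketch, assuming a CPython 3 interpreter; the helper names `compiler_accepts` and `tokenize_outcome` are made up for this illustration. The result of the second helper depends on the interpreter version: at the time of this comment the pure-Python `tokenize` module produced tokens for `1_`, while later releases may report an error instead.

```python
import io
import tokenize


def compiler_accepts(source):
    """Return True if the built-in compiler (C tokenizer + parser) accepts source."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


def tokenize_outcome(source):
    """Run the tokenize module over source.

    Returns ("tokens", [(type_name, string), ...]) on success, or
    ("error", message) if this interpreter's tokenize rejects the input.
    """
    readline = io.StringIO(source).readline
    try:
        tokens = [
            (tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
        ]
    except (tokenize.TokenError, SyntaxError) as exc:
        return ("error", str(exc))
    return ("tokens", tokens)


if __name__ == "__main__":
    # The compiler rejects "1_"; what tokenize does is version-dependent.
    print(compiler_accepts("1_\n"))
    print(tokenize_outcome("1_\n"))
```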

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I have built a draft of the changes required to do what you describe, in case you want to finish them: https://github.com/pablogsal/cpython/tree/tokenizer_mod

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> It might also make sense to build new tokenize.py APIs avoiding the `readline()` API -- I always found it painful to work with
Then we would need to maintain the old Python APIs + the new ones using the module? What you are proposing seems more

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Anthony Sottile
Anthony Sottile added the comment: I haven't looked into or thought about that yet; it might not be possible. It might also make sense to build new tokenize.py APIs avoiding the `readline()` API -- I always found it painful to work with
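For context, the `readline()`-based API under discussion takes a callable that returns one line per call, so tokenizing an in-memory string means wrapping it first. A minimal sketch; `tokenize_string` is a hypothetical convenience wrapper written for this illustration, not an existing or proposed `tokenize` function.

```python
import io
import tokenize


def tokenize_string(source):
    """Hypothetical helper: tokenize a str without exposing readline to the caller."""
    # generate_tokens() expects a callable returning one line of text per call.
    readline = io.StringIO(source).readline
    return list(tokenize.generate_tokens(readline))


if __name__ == "__main__":
    for tok in tokenize_string("x = 1\n"):
        print(tokenize.tok_name[tok.type], repr(tok.string))
```

The bytes-oriented `tokenize.tokenize()` follows the same shape, with `io.BytesIO(data).readline` in place of the `StringIO` wrapper.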

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> Either works for me, would you be able to point me to the starting bits as to how `_ast` becomes builtin?
https://github.com/python/cpython/blob/master/Python/Python-ast.c#L10075-L10079 and

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Anthony Sottile
Anthony Sottile added the comment: Either works for me, would you be able to point me to the starting bits as to how `_ast` becomes builtin?

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> private api sounds fine too -- I thought it was necessary to implement the module (as it needs external linkage) but if it isn't then even better
We can make it builtin the same way we do for the _ast module, or we can have a new module under

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Anthony Sottile
Anthony Sottile added the comment: private api sounds fine too -- I thought it was necessary to implement the module (as it needs external linkage) but if it isn't then even better

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: For reimplementing Lib/tokenize.py we don't need to publicly expose anything in the C-API. We can have a private _tokenize module which uses whatever you need, and then you use that _tokenize module in the tokenize.py file to reimplement the exact
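The layering described here is a pattern several standard-library modules already use: a public Python module keeps its API and delegates to a private implementation module when one is available. A rough sketch of that shape only; `_example_fast_tokenizer` and its `iter_tokens` function are hypothetical names invented for this illustration, not the module proposed in the thread, and the fallback simply uses the existing `tokenize` module.

```python
import io
import tokenize

try:
    # Hypothetical private accelerator; this name is invented for the sketch
    # and is not expected to exist.
    import _example_fast_tokenizer as _impl
except ImportError:
    _impl = None


def generate_tokens(source):
    """Public entry point whose results do not depend on the backend in use."""
    if _impl is not None:
        # Assumed interface: yields (type, string, start, end, line) tuples.
        for fields in _impl.iter_tokens(source):
            yield tokenize.TokenInfo(*fields)
        return
    # Pure-Python fallback: the existing tokenize module.
    yield from tokenize.generate_tokens(io.StringIO(source).readline)
```

Callers only ever see `generate_tokens`, so the private module can change or disappear without breaking them, which is the property the comment is after.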

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Anthony Sottile
Anthony Sottile added the comment: You already have that right now because the `tokenize` module is exposed (except that every change to the tokenization has to be implemented once in C and once in Python). It's much more frustrating when the two differ as well. I don't think all

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> I assumed, but I don't feel comfortable exposing the built-in one.
As an example of the situation I want to avoid: every time we change anything in the AST because of internal details, we have many complaints and pressure from tool authors because

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> I'm interested in it because the `tokenize` module is painfully slow
I assumed, but I don't feel comfortable exposing the built-in one.

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Anthony Sottile
Anthony Sottile added the comment: I'm interested in it because the `tokenize` module is painfully slow

[issue3353] make built-in tokenizer available via Python C API

2021-01-27 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I am -1 on exposing the C-API of the tokenizer. For the new parser, several modifications of the C tokenizer had to be made, and some of them modify existing behaviour slightly. I don't want to corner ourselves in a place where we cannot make improvements

[issue3353] make built-in tokenizer available via Python C API

2021-01-26 Thread Anthony Sottile
Anthony Sottile added the comment: Serhiy Storchaka, is this still blocked? It's been a few years on either this or the linked issue and I'm reaching for this one :) -- nosy: +Anthony Sottile

[issue3353] make built-in tokenizer available via Python C API

2017-03-14 Thread Jim Fasarakis-Hilliard
Jim Fasarakis-Hilliard added the comment: That makes sense to me, I'll wait around until the dependency is resolved.

[issue3353] make built-in tokenizer available via Python C API

2017-03-14 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I am working on the other issue (the recent patch is still not published). Sorry, but the two issues modify the same code and conflict. Since I believe that this issue makes fewer semantic changes, I think it would be easier to rebase it after finishing

[issue3353] make built-in tokenizer available via Python C API

2017-03-14 Thread Jim Fasarakis-Hilliard
Jim Fasarakis-Hilliard added the comment: Thanks for linking the dependency, Serhiy :-) Is there anybody currently working on the other issue? Also, shouldn't both issues now get retagged to Python 3.7?

[issue3353] make built-in tokenizer available via Python C API

2017-03-14 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Please hold this until issue25643 is finished. -- nosy: +serhiy.storchaka

[issue3353] make built-in tokenizer available via Python C API

2017-03-14 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- dependencies: +Python tokenizer rewriting

[issue3353] make built-in tokenizer available via Python C API

2017-03-14 Thread Jim Fasarakis-Hilliard
Jim Fasarakis-Hilliard added the comment: Okay, I'll take a look at it over the next few days and try to submit a PR after fixing any issues that might be present.

[issue3353] make built-in tokenizer available via Python C API

2017-03-13 Thread Dustin J. Mitchell
Dustin J. Mitchell added the comment: If the patch still applies cleanly, I have no issues with you or anyone opening a PR. I picked this up several years ago at the PyCon sprints, and don't remember a thing about it, nor have I touched any other bit of the CPython source since then. So any

[issue3353] make built-in tokenizer available via Python C API

2017-03-13 Thread Jim Fasarakis-Hilliard
Jim Fasarakis-Hilliard added the comment: Could you submit a PR for this? I haven't seen any objections to this change; a PR will expose this to more people, and a clear decision on whether this change is warranted can finally be made (I hope). -- nosy: +Jim Fasarakis-Hilliard

[issue3353] make built-in tokenizer available via Python C API

2015-11-14 Thread Berker Peksag
Changes by Berker Peksag: -- nosy: +berker.peksag versions: +Python 3.6 -Python 3.5

[issue3353] make built-in tokenizer available via Python C API

2015-11-05 Thread Rose Ames
Changes by Rose Ames: -- nosy: +superluser

[issue3353] make built-in tokenizer available via Python C API

2015-06-29 Thread Dustin J. Mitchell
Dustin J. Mitchell added the comment: This seems to have stalled out after the PyCon sprints. Any chance the final patch can be reviewed? -- Python tracker http://bugs.python.org/issue3353

[issue3353] make built-in tokenizer available via Python C API

2015-04-14 Thread Dustin J. Mitchell
Dustin J. Mitchell added the comment: From my read of this bug, there are two distinct tasks mentioned:
1. make PyTokenizer_* part of the Python-level API
2. re-implement 'tokenize' in terms of that Python-level API
#1 is largely complete in Andrew's latest patch, but that will likely need:

[issue3353] make built-in tokenizer available via Python C API

2015-04-14 Thread Dustin J. Mitchell
Dustin J. Mitchell added the comment: New:
- rename token symbols in token.h with a PYTOK_ prefix
- include an example of using the PyTokenizer functions
- address minor review comments
-- Added file: http://bugs.python.org/file38999/issue3353-2.patch

[issue3353] make built-in tokenizer available via Python C API

2015-04-14 Thread Dustin J. Mitchell
Dustin J. Mitchell added the comment: Here's an updated patch for #1:
Existing Patch:
- move tokenizer.h from Parser/ to Include/
- Add PyAPI_FUNC to export tokenizer functions
New:
- Removed unused, undefined PyTokenizer_RestoreEncoding
- Include PyTokenizer_State with limited ABI

[issue3353] make built-in tokenizer available via Python C API

2015-04-14 Thread Ned Deily
Changes by Ned Deily: -- stage: test needed -> patch review

[issue3353] make built-in tokenizer available via Python C API

2014-06-22 Thread Andrew C
Andrew C added the comment: The previously posted patch has become outdated due to signature changes starting with revision 89f4293 on Nov 12, 2009. Attached is an updated patch. Can it also be confirmed what the outstanding items are for this patch to be applied? Based on the previous logs

[issue3353] make built-in tokenizer available via Python C API

2014-06-22 Thread Andrew C
Changes by Andrew C: Added file: http://bugs.python.org/file35730/82706ea73ada.diff

[issue3353] make built-in tokenizer available via Python C API

2014-06-20 Thread Zachary Ware
Changes by Zachary Ware: -- versions: +Python 3.5 -Python 3.2

[issue3353] make built-in tokenizer available via Python C API

2011-09-07 Thread Meador Inge
Meador Inge added the comment: It would be nice if this same C API was used to implement the 'tokenize' module. Issues like issue2180 will potentially require bug fixes in two places :-/ -- nosy: +meadori

[issue3353] make built-in tokenizer available via Python C API

2010-08-09 Thread Terry J. Reedy
Changes by Terry J. Reedy: -- versions: -Python 2.7

[issue3353] make built-in tokenizer available via Python C API

2009-05-16 Thread Daniel Diniz
Changes by Daniel Diniz: -- priority: -> normal stage: -> test needed versions: +Python 3.2 -Python 2.6, Python 3.0

[issue3353] make built-in tokenizer available via Python C API

2008-07-26 Thread Andy
Andy added the comment: Did that and it builds fine. So my test procedure was:
- checkout clean source
- apply patch as per guidelines
- remove the file Parser/tokenizer.h (*)
- ./configure
- make
- ./python setup.py install
Build platform: Ubuntu, gcc 4.2.3 All

[issue3353] make built-in tokenizer available via Python C API

2008-07-24 Thread Fredrik Lundh
Fredrik Lundh added the comment: That should be all that's needed to expose the existing API, as is. If you want to verify the build, you can grab the pytoken.c and setup.py files from this directory, and try building the module.

[issue3353] make built-in tokenizer available via Python C API

2008-07-23 Thread Andy
Andy added the comment: Sorry for the terribly dumb question about this. Do you mean that, at this stage, all that is required is:
1. the application of the PyAPI_FUNC macro
2. move the file to the Include directory
3. update Makefile.pre.in to point to the new

[issue3353] make built-in tokenizer available via Python C API

2008-07-21 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: IMO the struct tok_state should not be part of the API; it contains too many implementation details. Or maybe as an opaque structure. -- nosy: +amaury.forgeotdarc

[issue3353] make built-in tokenizer available via Python C API

2008-07-21 Thread Fredrik Lundh
Fredrik Lundh added the comment: There are a few things in the struct that need to be public, but that's nothing that cannot be handled by documentation. No need to complicate the API just in case.

[issue3353] make built-in tokenizer available via Python C API

2008-07-14 Thread Fredrik Lundh
New submission from Fredrik Lundh: CPython provides a Python-level API to the parser, but not to the tokenizer itself. Somewhat annoyingly, it does provide a nice C API, but that's not properly exposed for external modules. To fix this, the tokenizer.h file should be moved