[issue3353] make built-in tokenizer available via Python C API

Pablo Galindo Salgado Wed, 27 Jan 2021 13:14:38 -0800

Pablo Galindo Salgado <pablog...@gmail.com> added the comment:

Problems that you are going to find:


* The C tokenizer raises syntax errors while the tokenize module does not. For 
example:

❯ python -c "1_"
  File "<string>", line 1
    1_
     ^
SyntaxError: invalid decimal literal

❯ python -m tokenize <<< "1_"
1,0-1,1:            NUMBER         '1'
1,1-1,2:            NAME           '_'
1,2-1,3:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

* The encoding cannot be specified up front. You need to thread it through many 
places.

* The readline() function can now return any object, or be any object, and that 
needs to be handled (better) in the C tokenizer so it does not crash.

* str versus bytes handling in the C tokenizer.

* In some cases the C tokenizer does not have the full line, or the full line 
is tricky to get.
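
As a rough illustration of the first point, here is a minimal sketch that
feeds the same source to compile() (which goes through the C tokenizer) and
to tokenize.generate_tokens(). The helper names compiler_accepts and
tokenize_leniently are made up for this example. Depending on the Python
version, the tokenize module may also reject the input, so the sketch catches
that case instead of assuming one behaviour.

```python
import io
import tokenize


def compiler_accepts(source):
    """Return True if compile() (and so the C tokenizer) accepts source."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


def tokenize_leniently(source):
    """Run the tokenize module over source.

    Returns (tokens, error): tokens is a list of (type name, string) pairs
    collected before any failure, error is the exception raised or None.
    """
    tokens = []
    error = None
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append((tokenize.tok_name[tok.type], tok.string))
    except (tokenize.TokenError, SyntaxError) as exc:
        # Some versions report the bad literal here as well.
        error = exc
    return tokens, error


if __name__ == "__main__":
    source = "1_\n"
    print("compile() accepts it:", compiler_accepts(source))
    tokens, error = tokenize_leniently(source)
    for name, string in tokens:
        print(name, repr(string))
    if error is not None:
        print("tokenize error:", error)
```

Any code that swaps the C tokenizer in behind the tokenize interface would
have to decide which of the two behaviours callers see for input like this.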

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue3353>
_______________________________________