Pablo Galindo Salgado <pablog...@gmail.com> added the comment:

Problems that you are going to find:
* The C tokenizer throws syntax errors while the tokenize module does not. For example:

❯ python -c "1_"
  File "<string>", line 1
    1_
    ^
SyntaxError: invalid decimal literal

❯ python -m tokenize <<< "1_"
1,0-1,1:            NUMBER         '1'
1,1-1,2:            NAME           '_'
1,2-1,3:            NEWLINE        '\n'
2,0-2,0:            ENDMARKER      ''

* The encoding cannot be specified up front. You need to thread it through many places.

* The readline() function can now return anything, or be anything. The C tokenizer needs to handle that (better) so that it does not crash.

* str/bytes handling in the C tokenizer.

* In some cases the C tokenizer does not get the full line, or getting the full line is tricky.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue3353>
_______________________________________
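
The first point in the list can also be reproduced from within Python. This is only a minimal sketch: `compile_outcome` and `tokenize_outcome` are hypothetical helper names introduced here, not part of any patch on this issue. It assumes `compile()` goes through the C tokenizer. On interpreters where the tokenize module is itself backed by the C tokenizer, the second helper may report an error instead of returning the lenient token stream shown in the example.

```python
import io
import tokenize


def compile_outcome(source):
    """Return the SyntaxError raised by compile(), or None if it compiles."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError as exc:
        return exc
    return None


def tokenize_outcome(source):
    """Return (tokens, error) produced by the tokenize module for a str source."""
    tokens = []
    try:
        # generate_tokens() expects a readline callable that returns str.
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append((tokenize.tok_name[tok.type], tok.string))
    except (SyntaxError, tokenize.TokenError) as exc:
        # Keep whatever was tokenized before the failure, plus the error.
        return tokens, exc
    return tokens, None


source = "1_\n"
compiler_error = compile_outcome(source)
tokens, tokenizer_error = tokenize_outcome(source)

print("compile():", repr(compiler_error))
print("tokenize: ", tokens, repr(tokenizer_error))
```

Running both helpers over the same inputs is a cheap way to collect the cases where the two tokenizers disagree before trying to unify them.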