Terry J. Reedy added the comment:

The no-encoding issue was mentioned in #12691, but needed to be opened as a 
separate issue, which is this one. The doc, as opposed to the docstring, says 
"Converts tokens back into Python source code". Python 3.3 source code is 
defined in the reference manual as a sequence of unicode characters. The doc 
also says "The reconstructed script is returned as a single string." In 3.x, 
that also means unicode, not bytes. On the other hand, tokenize does not 
currently accept actual Python code (unicode) but only encoded code. I think 
that should change, but that is a different issue (literally).
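
As a minimal illustration of the current input restriction (the names and 
the 'utf-8' choice are just for this demo, not anything from the patches):

    import io
    import tokenize

    source = "x = 1\n"

    # tokenize.tokenize() wants a readline that returns bytes; feeding it
    # str lines raises, so actual (unicode) source must be encoded first.
    readline = io.BytesIO(source.encode("utf-8")).readline
    for tok in tokenize.tokenize(readline):
        print(tok)
    # The first token printed is the ENCODING token ('utf-8').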

For this issue, I think the doc and docstring should change to match the 
current behavior: output a string unless the tokens (which contain unicode 
strings, not bytes) start with a non-empty ENCODING token, in which case the 
output is bytes encoded with that encoding. Changing the behavior would break 
code that trusts the code and doc (as opposed to the docstring).
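
A small sketch of that current behavior as I read it (generate_tokens() 
takes str lines and emits no ENCODING token, while tokenize() does):

    import io
    import tokenize

    source = "x = 1\n"

    # The stream from tokenize.tokenize() begins with an ENCODING token,
    # so untokenize() encodes its result and hands back bytes.
    byte_tokens = list(tokenize.tokenize(
        io.BytesIO(source.encode("utf-8")).readline))
    print(type(tokenize.untokenize(byte_tokens)))   # <class 'bytes'>

    # generate_tokens() emits no ENCODING token, so untokenize()
    # returns a str, matching the doc's wording.
    str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    print(type(tokenize.untokenize(str_tokens)))    # <class 'str'>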

Since tokenize will only emit ENCODING as the first token, I would be 
inclined to ignore any ENCODING token thereafter, but that might be seen as 
an impermissible change in behavior.
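
If that change were made, a hypothetical wrapper along these lines shows the 
intent (untokenize_ignoring_late_encoding is my name, not stdlib API):

    import tokenize

    def untokenize_ignoring_late_encoding(tokens):
        # Keep a leading ENCODING token (it selects bytes output) but
        # silently drop any ENCODING token appearing later in the stream,
        # then delegate to the real untokenize().
        kept = [tok for i, tok in enumerate(tokens)
                if i == 0 or tok[0] != tokenize.ENCODING]
        return tokenize.untokenize(kept)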

--
The dropped-token issue is the subject of #8478, with patch1. It was 
mentioned again in #12691, among several other issues, and is again the 
subject of duplicate issue #16224 (now closed), with patch2.

The actual bug is that the first token of iterator input gets dropped, but 
not the first token of list input. The fix is covered in #8478, so the 
dropped token is not part of this issue.
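
For reference, the asymmetry as I understand it can be reproduced like this 
on affected versions (the 2-tuples exercise untokenize()'s compat path):

    import io
    import tokenize

    source = "1 + 2\n"
    pairs = [(tok[0], tok[1]) for tok in
             tokenize.generate_tokens(io.StringIO(source).readline)]

    print(repr(tokenize.untokenize(pairs)))        # list: all tokens kept
    print(repr(tokenize.untokenize(iter(pairs))))  # iterator: first token
                                                   # dropped (the bug)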

----------
assignee: eric.snow -> terry.reedy
nosy: +terry.reedy
versions:  -Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16223>
_______________________________________