Gareth Rees <g...@garethrees.org> added the comment:

Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no 
ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an 
iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it failed 
with an AssertionError).
* In full mode, untokenize() successfully processes tokens that were separated 
by a backslashed newline in the original source (previously it ran these tokens 
together); see the sketch after this list.

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it tested only 
compatibility mode); a sketch of the idea follows this list.
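Roughly, the updated roundtrip() check works like this (a simplified sketch 
of the idea, not the actual test code; the real tests are in the attached 
patch):

    import io
    import tokenize

    def roundtrip(source):
        # Tokenize once, then untokenize in both modes and check that
        # each result tokenizes back to an equivalent (type, string)
        # stream.
        tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
        full = tokenize.untokenize(tokens)                    # 5-tuples
        compat = tokenize.untokenize(t[:2] for t in tokens)   # 2-tuples
        for result in (full, compat):
            rt = list(tokenize.tokenize(io.BytesIO(result).readline))
            assert [t[:2] for t in rt] == [t[:2] for t in tokens]

    roundtrip(b'x = 1 + \\\n    2\n')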

and updated the documentation:

* Update the docstring for untokenize() to better describe its actual 
behaviour, and remove the false claim "Untokenized source will match input 
source exactly". (We can restore this claim if we ever fix tokenize/untokenize 
so that it's true.)
* Update the documentation for untokenize() in tokenize.rst to match the 
docstring.

I welcome review: this is my first proper patch to Python.

----------
keywords: +patch
Added file: http://bugs.python.org/file22842/Issue12691.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12691>
_______________________________________