Brian Bossé <[email protected]> added the comment:
No idea if I'm getting the patch format right here, but tally ho!
This is keyed from release27-maint
Index: Lib/tokenize.py
===================================================================
--- Lib/tokenize.py (revision 85136)
+++ Lib/tokenize.py (working copy)
@@ -184,8 +184,13 @@
     def add_whitespace(self, start):
         row, col = start
-        assert row <= self.prev_row
         col_offset = col - self.prev_col
+        # Nearly all newlines are handled by the NL and NEWLINE tokens,
+        # but explicit line continuations are not, so they're handled here.
+        if row > self.prev_row:
+            row_offset = row - self.prev_row
+            self.tokens.append("\\\n" * row_offset)
+            col_offset = col  # Recalculate the column offset from the start of our new line
         if col_offset:
             self.tokens.append(" " * col_offset)
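For context (not part of the patch), here is a minimal sketch of the root cause using the standard tokenize module: a backslash continuation advances the physical row without emitting an NL token, which is exactly what trips the old assert.

```python
import io
import tokenize

# Source containing an explicit backslash line continuation.
src = "x = 1 + \\\n2\n"

toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
for tok in toks:
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

# No NL token separates the two physical lines: the NUMBER '2' starts on
# row 2 while the preceding '+' ended on row 1, so an untokenizer that
# asserts `row <= self.prev_row` fails on this input.
names = [tokenize.tok_name[t.type] for t in toks]
assert "NL" not in names
```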
Two issues remain with this fix, both stemming from replacing the assert with
something functional rather than an exact reproduction of the original text:
1) Whitespace leading up to a line continuation is not recreated. The
information required to do this is not present in the tokenized data.
2) If EOF happens at the end of a line, the untokenized version will have a
stray line continuation at the end, because the ENDMARKER token is reported on
a line which does not exist in the original source.
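Issue (1) can be seen directly with a round trip, at least on a Python 3 tokenize where a fix of this shape landed (a sketch for illustration, not a claim about the 2.7 patch itself): the exact spacing around the continuation is lost, but the token stream survives.

```python
import io
import tokenize

src = "x = 1 + \\\n2\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
out = tokenize.untokenize(toks)

# The whitespace that preceded the backslash is not in the token data,
# so the reconstructed text typically differs from the original...
print(repr(src))
print(repr(out))

# ...but retokenizing the output yields an equivalent token stream,
# which is all untokenize() promises.
toks2 = list(tokenize.generate_tokens(io.StringIO(out).readline))
assert [(t.type, t.string) for t in toks] == [(t.type, t.string) for t in toks2]
```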
I spent some time trying to write a unit test that demonstrates the original
bug, but it seems that doctest (which test_tokenize uses) cannot represent a
'\' character properly. The existing unit tests involving line continuations
pass only because the '\' characters are interpreted as ERRORTOKEN, which is
not how they are tokenized when read from a file or the interactive prompt.
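One way around the doctest limitation (a sketch, not a patch to test_tokenize): a plain unittest case, where the backslash lives in an ordinary string literal rather than having to survive inside a docstring.

```python
import io
import tokenize
import unittest

class BackslashContinuationTest(unittest.TestCase):
    # A round-trip check that doctest cannot express cleanly, since the
    # backslash characters would have to be escaped inside a docstring.
    def test_continuation_roundtrip(self):
        src = "x = 1 + \\\n2\n"
        toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
        out = tokenize.untokenize(toks)
        toks2 = list(tokenize.generate_tokens(io.StringIO(out).readline))
        self.assertEqual([(t.type, t.string) for t in toks],
                         [(t.type, t.string) for t in toks2])

if __name__ == "__main__":
    unittest.main(exit=False)
```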
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue9974>
_______________________________________