[issue9974] tokenizer.untokenize not invariant with line continuations

2014-02-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0f0e9b7d4f1d by Terry Jan Reedy in branch '2.7':
Issue #9974: When untokenizing, use row info to insert backslash+newline.
http://hg.python.org/cpython/rev/0f0e9b7d4f1d

New changeset 24b4cd5695d9 by Terry Jan Reedy in branch '3.3':
Issue #9974: When untokenizing, use row info to insert backslash+newline.
http://hg.python.org/cpython/rev/24b4cd5695d9

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9974
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue9974] tokenizer.untokenize not invariant with line continuations

2014-02-23 Thread Terry J. Reedy

Terry J. Reedy added the comment:

I added 5-tuple mode to roundtrip() in #20750. I solved the ENDMARKER problem 
by breaking out of the token loop if and when it appears. Reconstructing 
trailing whitespace other than \n is hopeless. The roundtrip test currently 
only tests equality of token sequences. But my own tests show that code with 
backslash-newline is reconstructed correctly as long as there is no space 
before it and something other than ENDMARKER after it.

I discovered that tokenize will tokenize '\\' but not '\\\n'. So the latter 
will never appear as tokenizer output. Even if we did use ENDMARKER to create 
the latter, it would fail the current roundtrip test.
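
The round-trip behaviour described above can be checked with a short script. This is a minimal sketch for Python 3, not the roundtrip() helper from the CPython test suite; the names token_pairs and roundtrip are made up for illustration. It compares (type, string) pairs, which is what the comment says the roundtrip test checks, and uses a continuation with no space before the backslash and real code after it.

```python
import io
import tokenize


def token_pairs(source):
    """Return the (type, string) pair of every token in source."""
    readline = io.StringIO(source).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]


def roundtrip(source):
    """Tokenize source into full 5-tuples and untokenize the result."""
    readline = io.StringIO(source).readline
    return tokenize.untokenize(tokenize.generate_tokens(readline))


# Backslash-newline with no space before it and real code after it.
source = "foo = 3\nif foo == 5 or\\\n   foo == 1:\n    pass\n"
rebuilt = roundtrip(source)

print(repr(rebuilt))
print("continuation kept:", "\\\n" in rebuilt)
print("same tokens:", token_pairs(rebuilt) == token_pairs(source))
```

Comparing token pairs rather than raw text keeps the check meaningful even where the exact spacing around the backslash differs between Python versions.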

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue9974] tokenizer.untokenize not invariant with line continuations

2014-02-18 Thread Terry J. Reedy

Terry J. Reedy added the comment:

The \ continuation bug is one of many covered by #12691 and its patch, but this 
came first and it focused on only this bug. With respect to this issue, the 
code patches are basically the same; I will use tests to choose between them.

On #12691, Gareth notes that the 5-tuple mode that uses add_whitespace is 
under-tested, so care is needed not to break working uses.

Adding a new parameter to a function is a new feature. I will check on pydev 
that no one objects to calling Untokenizer a private implementation detail.

--
priority: low -> normal




[issue9974] tokenizer.untokenize not invariant with line continuations

2014-02-17 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
assignee:  -> terry.reedy




[issue9974] tokenizer.untokenize not invariant with line continuations

2014-02-01 Thread Terry J. Reedy

Terry J. Reedy added the comment:

One could argue that "The guarantee applies only to the token type and token 
string as the spacing between tokens (column positions) may change." covers 
merging of lines, but total elimination of needed whitespace is definitely a 
bug.
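
The documented guarantee quoted above can be seen by untokenizing bare (type, string) pairs. The following is a hedged Python 3 sketch; the helper name pairs_of is invented for illustration. With 2-tuples no position information is available, so the continued line is merged into one line and the spacing changes, but tokenizing the result again yields the same pairs.

```python
import io
import tokenize

source = "foo = 3\nif foo == 5 or \\\n   foo == 1:\n    pass\n"


def pairs_of(text):
    """Return the (type, string) pair of every token in text."""
    readline = io.StringIO(text).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]


# Passing 2-tuples discards all row and column information.
pairs = pairs_of(source)
rebuilt = tokenize.untokenize(pairs)

print(repr(rebuilt))               # spacing differs from the original
print(pairs_of(rebuilt) == pairs)  # the token stream itself is preserved
```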

--
nosy: +terry.reedy
stage:  -> patch review
versions: +Python 3.3, Python 3.4 -Python 2.6, Python 3.1




[issue9974] tokenizer.untokenize not invariant with line continuations

2013-09-08 Thread Dwayne Litzenberger

Dwayne Litzenberger added the comment:

@amk: I'd appreciate it if you did. :)

I ran into this bug while writing some code that converts b"..." into "..." in 
PyCrypto's setup.py script (for backward compatibility with Python 2.5 and 
below).

--
nosy: +DLitz




[issue9974] tokenizer.untokenize not invariant with line continuations

2012-12-10 Thread Meador Inge

Changes by Meador Inge mead...@gmail.com:


--
nosy: +meador.inge




[issue9974] tokenizer.untokenize not invariant with line continuations

2012-12-03 Thread Simon Law

Changes by Simon Law sfl...@sfllaw.ca:


--
nosy: +sfllaw




[issue9974] tokenizer.untokenize not invariant with line continuations

2012-11-09 Thread Eric Snow

Changes by Eric Snow ericsnowcurren...@gmail.com:


--
nosy: +eric.snow




[issue9974] tokenizer.untokenize not invariant with line continuations

2012-11-03 Thread A.M. Kuchling

A.M. Kuchling added the comment:

I looked at this a bit and made a revised version of the patch that doesn't add 
any line continuations when the token is ENDMARKER.  It works on the example 
program and a few variations I tried, though I'm not convinced that it'll work 
for all possible permutations of line continuations, whitespace, and ENDMARKER. 
 (I couldn't find one that failed, though.)

Is this worth pursuing?  I could put together the necessary test cases.

--
nosy: +akuchling
Added file: http://bugs.python.org/file27873/issue9974-2.txt




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-31 Thread Raymond Hettinger

Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

> My patch handles the described situation, albeit a bit poorly ...

Let us know when you've got a cleaned-up patch and have run the round-trip 
tests on a broad selection of files.

For your test case, don't feel compelled to use doctest.  It's okay to write a 
regular unittest and add that to the test suite.

--
assignee: rhettinger -> 
priority: normal -> low




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-19 Thread Brian Bossé

Brian Bossé pen...@gmail.com added the comment:

Yup, that's related to ENDMARKER being tokenized to its own line, even if EOF 
happens at the end of the last line of actual code.  I don't know if anything 
relies on that behavior so I can't really suggest changing it.

My patch handles the described situation, albeit a bit poorly as I mentioned in 
comment 2.
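
The ENDMARKER placement described above can be inspected directly. This is a small Python 3 sketch (the thread itself concerns 2.x and early 3.x, so token details may differ there): it tokenizes a source string that ends without a newline and prints where each token starts and ends.

```python
import io
import tokenize

source = "a = 1"  # no trailing newline
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

# Row of the last token that is not ENDMARKER, versus ENDMARKER's own row.
last_code_row = max(tok.end[0] for tok in tokens if tok.type != tokenize.ENDMARKER)
endmarker = tokens[-1]
print("last code row:", last_code_row)
print("ENDMARKER row:", endmarker.start[0])
```

Because ENDMARKER starts on a later row than any real code, an untokenizer that emits a continuation for every unexplained row change would add one at the very end unless it treats ENDMARKER specially.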

--




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-18 Thread nick caruso

nick caruso ngv...@gmail.com added the comment:

--
import StringIO
import tokenize

tokens = []
def fnord(*a):
    tokens.append(a)

tokenize.tokenize(StringIO.StringIO("a = 1").readline, fnord)

tokenize.untokenize(tokens)
--

Generates the same assertion failure, for what it's worth.  No line 
continuation needed.

This does not happen in 2.5 on my machine.

--
nosy: +nick.caruso




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-18 Thread nick caruso

nick caruso ngv...@gmail.com added the comment:

Additionally, substituting "a=1\n" for "a=1" results in no assertion and 
successful untokenizing to "a = 1\n"

--




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-01 Thread Brian Bossé

Brian Bossé pen...@gmail.com added the comment:

No idea if I'm getting the patch format right here, but tally ho!

This is keyed from release27-maint

Index: Lib/tokenize.py
===
--- Lib/tokenize.py (revision 85136)
+++ Lib/tokenize.py (working copy)
@@ -184,8 +184,13 @@
 
     def add_whitespace(self, start):
         row, col = start
-        assert row <= self.prev_row
         col_offset = col - self.prev_col
+        # Nearly all newlines are handled by the NL and NEWLINE tokens,
+        # but explicit line continuations are not, so they're handled here.
+        if row > self.prev_row:
+            row_offset = row - self.prev_row
+            self.tokens.append("\\\n" * row_offset)
+            col_offset = col  # Recalculate the column offset from the start of our new line
         if col_offset:
             self.tokens.append(" " * col_offset)

Two issues remain with this fix, both of which replace the assert with 
something functional but not exactly what the original text is:
1)  Whitespace leading up to a line continuation is not recreated.  The 
information required to do this is not present in the tokenized data.
2)  If EOF happens at the end of a line, the untokenized version will have a 
line continuation on the end, as the ENDMARKER token is represented on a line 
which does not exist in the original.
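
The idea behind the patch, combined with the suggestion made elsewhere in this thread to stop at ENDMARKER, can be sketched as a standalone function. This is a hypothetical, simplified stand-in written for Python 3, not the code in Lib/tokenize.py: the name rebuild is invented, it ignores encodings, and it recovers indentation from start columns, so tabs would be lost.

```python
import io
import tokenize


def rebuild(tokens):
    """Hypothetical, simplified untokenizer (not the stdlib implementation)."""
    parts = []
    prev_row, prev_col = 1, 0
    for tok in tokens:
        if tok.type == tokenize.ENDMARKER:
            break  # ENDMARKER sits on a row that is not in the source
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            continue  # indentation is recovered from start columns below
        row, col = tok.start
        if row > prev_row:
            # A row change without NEWLINE/NL must be an explicit continuation.
            parts.append("\\\n" * (row - prev_row))
            prev_col = 0
        parts.append(" " * (col - prev_col))
        parts.append(tok.string)
        prev_row, prev_col = tok.end
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            prev_row += 1
            prev_col = 0
    return "".join(parts)


source = "foo = 3\nif foo == 5 or\\\n   foo == 1:\n    pass\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
print(rebuild(tokens) == source)
```

If the source has a space before the backslash, as in "foo = \" followed by "   3", this sketch drops that space, which matches the first remaining issue listed above: the token positions do not record it.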

I spent some time trying to get a unit test written that demonstrates the 
original bug, but it would seem that doctest (which test_tokenize uses) cannot 
represent a '\' character properly.  The existing unit tests involving line 
continuations pass due to the '\' characters being interpreted as ERRORTOKEN, 
which is not as they're done when read from file or interactive prompt.
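
One possible contributor, offered here as an assumption rather than something the thread confirms: doctest examples live in docstrings, and in a non-raw string literal a backslash followed by a newline is consumed by Python's string-literal rules before doctest ever sees it. The sketch below shows that effect with plain triple-quoted strings; whether this is what affected test_tokenize is not established here.

```python
# In a regular (non-raw) literal, backslash-newline is a continuation inside
# the literal itself and disappears from the resulting string.
plain = """foo = \
3"""

# In a raw literal, both the backslash and the newline are kept.
raw = r"""foo = \
3"""

print(repr(plain))
print(repr(raw))
```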

--




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-01 Thread Raymond Hettinger

Changes by Raymond Hettinger rhettin...@users.sourceforge.net:


--
assignee:  -> rhettinger
nosy: +rhettinger




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-10-01 Thread Kristján Valur Jónsson

Kristján Valur Jónsson krist...@ccpgames.com added the comment:

Interesting, is that a separate defect of doctest?

--




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-09-28 Thread Brian Bossé

New submission from Brian Bossé pen...@gmail.com:

Executing the following code against a py file which contains line 
continuations generates an assert:
import tokenize
foofile = open(filename, "r")
tokenize.untokenize(list(tokenize.generate_tokens(foofile.readline)))

(note, the list() is important due to issue #8478)

The assert triggered is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\tokenize.py", line 262, in untokenize
    return ut.untokenize(iterable)
  File "C:\Python27\lib\tokenize.py", line 198, in untokenize
    self.add_whitespace(start)
  File "C:\Python27\lib\tokenize.py", line 187, in add_whitespace
    assert row <= self.prev_row
AssertionError

I have tested this in 2.6.5, 2.7 and 3.1.2.  The line numbers may differ but 
the stack is otherwise identical between these versions.

Example input code:
foo = \
   3

If the assert is removed, the code generated is still incorrect.  For example, 
the input:
foo = 3
if foo == 5 or \
   foo == 1
    pass

becomes:
foo = 3
if foo == 5 orfoo == 1
    pass

which besides not having the line continuation, is functionally incorrect.

I'm wrapping my head around the functionality of this module and am willing to 
do the legwork to get a fix in.  Ideas on how to go about it are more than 
welcome.

Ironic aside:  this bug is present when tokenize.py itself is used as input.

--
components: Library (Lib)
messages: 117538
nosy: Brian.Bossé
priority: normal
severity: normal
status: open
title: tokenizer.untokenize not invariant with line continuations
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1




[issue9974] tokenizer.untokenize not invariant with line continuations

2010-09-28 Thread Kristján Valur Jónsson

Changes by Kristján Valur Jónsson krist...@ccpgames.com:


--
nosy: +krisvale
