[issue12063] tokenize module appears to treat unterminated single and double-quoted strings inconsistently

Devin Jeanpierre Thu, 12 May 2011 07:23:17 -0700

New submission from Devin Jeanpierre <[email protected]>:

Tokenizing `' 1 2 3` versus `''' 1 2 3` yields different results.


Tokenizing `' 1 2 3` gives:

1,0-1,1:        ERRORTOKEN      "'"
1,2-1,3:        NUMBER  '1'
1,4-1,5:        NUMBER  '2'
1,6-1,7:        NUMBER  '3'
2,0-2,0:        ENDMARKER       ''

while tokenizing `''' 1 2 3` yields:

Traceback (most recent call last):
  File "prog.py", line 4, in <module>
    tokenize.tokenize(iter(["''' 1 2 3"]).next)
  File "/usr/lib/python2.6/tokenize.py", line 169, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/usr/lib/python2.6/tokenize.py", line 175, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/usr/lib/python2.6/tokenize.py", line 296, in generate_tokens
    raise TokenError, ("EOF in multi-line string", strstart)
tokenize.TokenError: ('EOF in multi-line string', (1, 0))


Apparently tokenize resumes tokenizing right after the offending quote when a
single-quoted string is unterminated, but raises TokenError when a
triple-quoted string is. I guess this is because re-tokenizing the rest of the
file after an unclosed triple quote would be expensive; however, I've also been
told that it's very strange, and possibly wrong, for tokenize to be
inconsistent in this way.
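
For anyone who wants to compare the two cases side by side, here is a minimal sketch for Python 3 (the traceback above is from Python 2.6). The helper name try_tokenize is hypothetical and not part of the tokenize module. It collects whatever tokens are produced before a failure, along with the error, if any. The exact outcome depends on the interpreter version, so no expected output is shown.

```python
import io
import tokenize


def try_tokenize(source):
    """Tokenize `source`; return (tokens, error).

    `tokens` holds (type name, string) pairs produced before any failure.
    `error` is the exception raised by the tokenizer, or None.
    """
    tokens = []
    error = None
    readline = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            tokens.append((tokenize.tok_name[tok.type], tok.string))
    except (tokenize.TokenError, SyntaxError) as exc:
        # Which exception is raised, and for which inputs, varies by version.
        error = exc
    return tokens, error


if __name__ == "__main__":
    for source in ("' 1 2 3", "''' 1 2 3"):
        tokens, error = try_tokenize(source)
        print(repr(source))
        for name, string in tokens:
            print("    %-12s %r" % (name, string))
        print("    error: %r" % (error,))
```

Running it prints, for each input, the tokens seen so far and the error, which makes any difference between the single-quote and triple-quote cases easy to spot on a given interpreter.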

If this is the right behavior, I guess I'd like it if it were documented. This 
sort of thing is confusing / potentially misleading for users of the tokenize 
module. Or at least, when I saw how single quotes were handled, I assumed 
incorrectly that all quotes were handled that way.

----------
messages: 135836
nosy: Devin Jeanpierre
priority: normal
severity: normal
status: open
title: tokenize module appears to treat unterminated single and double-quoted 
strings inconsistently

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12063>
_______________________________________