I've got a small tweak to tokenize.py that I'd like to run by folks here. I'm working on a refactoring tool for Python 2.x-to-3.x conversion, and my approach is to build a full parse tree with annotations that show where the whitespace and comments go. I use the tokenize module to scan the input. This is nearly perfect (I can render code from the parse tree and it will be an exact match of the input) except for continuation lines -- while tokenize gives me pseudo-tokens for comments and "ignored" newlines, it doesn't give me the backslashes at all (though it does give me the newline following the backslash).
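To make the gap concrete, here is a minimal sketch. It assumes Python 3's tokenize module rather than the 2.x version discussed here, and the helper name continuation_gaps is hypothetical, introduced only for illustration. It shows that no token carries the backslash, and that a continuation can only be inferred from a jump in row numbers between two tokens that are not separated by a NEWLINE or NL token.

```python
import io
import tokenize


def continuation_gaps(source):
    """Return (prev_token, next_token) pairs that straddle a backslash continuation.

    Hypothetical helper: a continuation is inferred when the next token starts
    on a later row than the previous token ended, and the previous token is
    not a NEWLINE or NL token (those account for ordinary line breaks).
    """
    gaps = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (prev is not None
                and prev.type not in (tokenize.NEWLINE, tokenize.NL)
                and tok.start[0] > prev.end[0]):
            gaps.append((prev, tok))
        prev = tok
    return gaps


# A statement continued with a backslash.
source = "x = 1 + \\\n    2\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_strings = [t.string for t in tokens]

# The backslash never shows up as (part of) a token string.
backslash_seen = any("\\" in s for s in token_strings)

# The continuation is only visible as a row jump between '+' and '2'.
gaps = continuation_gaps(source)
```

A tool that wants to reproduce the input exactly therefore has to reconstruct the backslash from the position gap (or from the raw line text) instead of reading it off the token stream.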
It would be trivial to add another yield to tokenize.py when the backslash is detected:

--- tokenize.py (revision 52865)
+++ tokenize.py (working copy)
@@ -370,6 +370,8 @@
                 elif initial in namechars:                 # ordinary name
                     yield (NAME, token, spos, epos, line)
                 elif initial == '\\':                      # continued stmt
+                    # This yield is new; needed for better idempotency:
+                    yield (NL, initial, spos, (spos[0], spos[1]+1), line)
                     continued = 1
                 else:
                     if initial in '([{': parenlev = parenlev + 1

(Though I think that it should probably yield a single NL pseudo-token whose value is a backslash followed by a newline; or perhaps it should yield the backslash as a comment token, or as a new token. Thoughts?)

This wouldn't be 100% backwards compatible, so I'm not dreaming of adding this to 2.5.1, but what about 2.6?

(There's another issue with tokenize.py too -- when you use it to parse Python-like source code containing non-Python operators, e.g. '?', it does something bogus.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)