[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees <g...@garethrees.org> added the comment:

Terry: agreed. Does anyone actually use this module? Does anyone know what the design goals are for tokenize? If someone can tell me, I'll do my best to make it meet them.

Meanwhile, here's another bug. Each character of trailing whitespace is tokenized as an ERRORTOKEN.

Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import tokenize, untokenize
>>> from io import BytesIO
>>> list(tokenize(BytesIO('1 '.encode('utf8')).readline))
[TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1 '),
 TokenInfo(type=54 (ERRORTOKEN), string=' ', start=(1, 1), end=(1, 2), line='1 '),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees <g...@garethrees.org> added the comment:

Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it asserted).
* In full mode, untokenize() successfully processes tokens that were separated by a backslashed newline in the original source (previously it ran these tokens together).

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it just tested compatibility mode).

and updated the documentation:

* Update the docstring for untokenize() to better describe its actual behaviour, and remove the false claim "Untokenized source will match input source exactly". (We can restore this claim if we ever fix tokenize/untokenize so that it's true.)
* Update the documentation for untokenize() in tokenize.rst to match the docstring.

I welcome review: this is my first proper patch to Python.

--
keywords: +patch
Added file: http://bugs.python.org/file22842/Issue12691.patch
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12691>
___
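[A round-trip test of the kind described above might be sketched as follows. This is my own illustration, not code from the patch; the helper name check_roundtrip is hypothetical.]

```python
import io
import tokenize

def check_roundtrip(source):
    """Check both modes of untokenize() against a source string (sketch)."""
    # tokenize() wants a readline over bytes.
    readline = io.BytesIO(source.encode('utf-8')).readline
    tokens = list(tokenize.tokenize(readline))

    # Full mode: feed complete 5-tuples back; untokenize() should return
    # a bytes object reproducing the source.
    assert tokenize.untokenize(tokens) == source.encode('utf-8')

    # Compatibility mode: feed (type, string) 2-tuples; spacing is
    # normalized, but the token strings must survive re-tokenization.
    text = tokenize.untokenize((t.type, t.string) for t in tokens)
    readline2 = io.BytesIO(text).readline
    assert ([t.string for t in tokenize.tokenize(readline2)]
            == [t.string for t in tokens])

check_roundtrip('x = 1 + 2\n')
```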
[issue12700] test_faulthandler fails on Mac OS X Lion
New submission from Gareth Rees <g...@garethrees.org>:

On Mac OS 10.7, test_faulthandler fails. See test output below. It looks as though the tests may be at fault in expecting to see (?:Segmentation fault|Bus error) instead of (?:Segmentation fault|Bus error|Illegal instruction).

test_disable (__main__.FaultHandlerTests) ... ok
test_dump_traceback (__main__.FaultHandlerTests) ... ok
test_dump_traceback_file (__main__.FaultHandlerTests) ... ok
test_dump_traceback_threads (__main__.FaultHandlerTests) ... ok
test_dump_traceback_threads_file (__main__.FaultHandlerTests) ... ok
test_dump_tracebacks_later (__main__.FaultHandlerTests) ... ok
test_dump_tracebacks_later_cancel (__main__.FaultHandlerTests) ... ok
test_dump_tracebacks_later_file (__main__.FaultHandlerTests) ... ok
test_dump_tracebacks_later_repeat (__main__.FaultHandlerTests) ... ok
test_dump_tracebacks_later_twice (__main__.FaultHandlerTests) ... ok
test_enable_file (__main__.FaultHandlerTests) ... FAIL
test_enable_single_thread (__main__.FaultHandlerTests) ... FAIL
test_fatal_error (__main__.FaultHandlerTests) ... ok
test_gil_released (__main__.FaultHandlerTests) ... FAIL
test_is_enabled (__main__.FaultHandlerTests) ... ok
test_read_null (__main__.FaultHandlerTests) ... FAIL
test_register (__main__.FaultHandlerTests) ... ok
test_register_chain (__main__.FaultHandlerTests) ... ok
test_register_file (__main__.FaultHandlerTests) ... ok
test_register_threads (__main__.FaultHandlerTests) ... ok
test_sigabrt (__main__.FaultHandlerTests) ... ok
test_sigbus (__main__.FaultHandlerTests) ... ok
test_sigfpe (__main__.FaultHandlerTests) ... ok
test_sigill (__main__.FaultHandlerTests) ... ok
test_sigsegv (__main__.FaultHandlerTests) ... ok
test_stack_overflow (__main__.FaultHandlerTests) ... ok
test_unregister (__main__.FaultHandlerTests) ... ok

======================================================================
FAIL: test_enable_file (__main__.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_faulthandler.py", line 207, in test_enable_file
    filename=filename)
  File "test_faulthandler.py", line 105, in check_fatal_error
    self.assertRegex(output, regex)
AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 4 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 4 in <module>'

======================================================================
FAIL: test_enable_single_thread (__main__.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_faulthandler.py", line 217, in test_enable_single_thread
    all_threads=False)
  File "test_faulthandler.py", line 105, in check_fatal_error
    self.assertRegex(output, regex)
AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nTraceback\\ \\(most\\ recent\\ call\\ first\\):\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nTraceback (most recent call first):\n  File "<string>", line 3 in <module>'

======================================================================
FAIL: test_gil_released (__main__.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_faulthandler.py", line 195, in test_gil_released
    '(?:Segmentation fault|Bus error)')
  File "test_faulthandler.py", line 105, in check_fatal_error
    self.assertRegex(output, regex)
AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 3 in <module>'

======================================================================
FAIL: test_read_null (__main__.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_faulthandler.py", line 115, in test_read_null
    '(?:Segmentation fault|Bus error)')
  File "test_faulthandler.py", line 105, in check_fatal_error
    self.assertRegex(output, regex)
AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$' not found in 'Fatal Python error: Illegal instruction\n\nCurrent thread XXX:\n  File "<string>", line 3 in <module>'

----------------------------------------------------------------------
Ran 27 tests in 21.711s

FAILED (failures=4)
[issue12691] tokenize.untokenize is broken
Gareth Rees <g...@garethrees.org> added the comment:

Thanks Ezio for the review. I've made all the changes you requested (except for the re-ordering of paragraphs in the documentation, which I don't want to do because that would lead to the round-trip property being mentioned before it's defined). Revised patch attached.

--
Added file: http://bugs.python.org/file22844/Issue12691.patch
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12691>
___
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees <g...@garethrees.org> added the comment:

I'm having a look to see if I can make tokenize.py better match the real tokenizer, but I need some feedback on a couple of design decisions.

First, how to handle tokenization errors? There are three possibilities:

1. Generate an ERRORTOKEN, resynchronize, and continue to tokenize from after the error. This is what tokenize.py currently does in the two cases where it detects an error.
2. Generate an ERRORTOKEN and stop tokenizing. This is what tokenizer.c does.
3. Raise an exception (IndentationError, SyntaxError, or TabError). This is what the user sees when the parser is invoked from pythonrun.c.

Since the documentation for tokenize.py says "It is designed to match the working of the Python tokenizer exactly", I think that implementing option (2) is best here. (This will mean changing the behaviour of tokenize.py in the two cases where it currently detects an error, so that it stops tokenizing.)

Second, how to record the cause of the error? The real tokenizer records the cause of the error in the 'done' field of the 'tok_state' structure, but tokenize.py loses this information. I propose to add fields to the TokenInfo structure (which is a namedtuple) to record this information. The real tokenizer uses numeric constants from errcode.h (E_TOODEEP, E_TABSPACE, E_DEDENT etc.), and pythonrun.c converts these to English-language error messages (E_TOODEEP: "too many levels of indentation"). Both of these pieces of information will be useful, so I propose to add two fields: 'error' (containing a string like 'TOODEEP') and 'errormessage' (containing the English-language error message).

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
___
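[The proposal above might be sketched like this. This is purely hypothetical illustration, not code from any patch; the field names 'error' and 'errormessage' follow the proposal, and the token values are made up.]

```python
from collections import namedtuple

class TokenInfo(namedtuple('TokenInfo', 'type string start end line')):
    """Hypothetical TokenInfo that also records why tokenization failed.

    The extra values are instance attributes rather than tuple fields, so
    consumers that unpack a 5-tuple are unaffected.
    """
    def __new__(cls, type, string, start, end, line,
                error=None, errormessage=None):
        self = super().__new__(cls, type, string, start, end, line)
        self.error = error                # e.g. 'TABSPACE' (from E_TABSPACE)
        self.errormessage = errormessage  # e.g. the pythonrun.c message text
        return self

tok = TokenInfo(54, '\t', (1, 0), (1, 1), '\tpass\n',
                error='TABSPACE',
                errormessage='inconsistent use of tabs and spaces in indentation')
toktype, string, start, end, line = tok   # still unpacks as a 5-tuple
```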
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees <g...@garethrees.org> added the comment:

Having looked at some of the consumers of the tokenize module, I don't think my proposed solutions will work. It seems to be the case that the resynchronization behaviour of tokenize.py is important for consumers that are using it to transform arbitrary Python source code (like 2to3.py). These consumers are relying on the round-trip property that X == untokenize(tokenize(X)). So solution (1) is necessary for the handling of tokenization errors.

Also, the fact that TokenInfo is a 5-tuple is relied on in some places (e.g. lib2to3/patcomp.py line 38), so it can't be extended. And there are consumers (though none in the standard library) that are relying on type=ERRORTOKEN being the way to detect errors in a tokenization stream. So I can't overload that field of the structure.

Any good ideas for how to record the cause of error without breaking backwards compatibility?

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
___
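[The kind of consumer in question makes a tokenize/transform/untokenize pass over source code. A minimal sketch of my own, using (type, string) 2-tuples as 2to3-style tools effectively do; compatibility-mode untokenize normalizes spacing, but the token stream is preserved.]

```python
import io
import token
import tokenize

def rename(source, old, new):
    """Rename identifier `old` to `new` by round-tripping through tokenize."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NAME and tok.string == old:
            result.append((token.NAME, new))
        else:
            result.append((tok.type, tok.string))
    # Compatibility mode: 2-tuples in, a string (with normalized spacing) out.
    return tokenize.untokenize(result)

print(rename('spam = spam + 1\n', 'spam', 'eggs'))
```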
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees <g...@garethrees.org> added the comment:

Ah ... TokenInfo is a *subclass* of namedtuple, so I can add extra properties to it without breaking consumers that expect it to be a 5-tuple.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
___
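[This is in fact the route the module later took: today's TokenInfo is a namedtuple subclass carrying a computed exact_type property, and tuple-unpacking consumers never notice. A quick demonstration, assuming Python 3.3+ where exact_type exists:]

```python
import io
import token
import tokenize

toks = list(tokenize.tokenize(io.BytesIO(b'x + 1\n').readline))
plus = next(t for t in toks if t.string == '+')

# Still a plain 5-tuple to anyone who unpacks it...
toktype, string, start, end, line = plus
assert len(plus) == 5

# ...but the subclass carries an extra computed property on top.
assert plus.type == token.OP
assert plus.exact_type == token.PLUS
```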
[issue12691] tokenize.untokenize is broken
New submission from Gareth Rees <g...@garethrees.org>:

tokenize.untokenize is completely broken.

Python 3.2.1 (default, Jul 19 2011, 00:09:43)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize, io
>>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
>>> tokenize.untokenize(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 250, in untokenize
    out = ut.untokenize(iterable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 179, in untokenize
    self.add_whitespace(start)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/tokenize.py", line 165, in add_whitespace
    assert row <= self.prev_row
AssertionError

The assertion is simply bogus: the <= should be >=. The reason why no-one has spotted this is that the unit tests for the tokenize module only ever call untokenize() in compatibility mode, passing in a 2-tuple instead of a 5-tuple.

I propose to fix this, and add unit tests, at the same time as fixing other problems with tokenize.py (issue12675).

--
components: Library (Lib)
messages: 141634
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize.untokenize is broken
type: behavior
versions: Python 3.2, Python 3.3
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12691>
___
[issue12691] tokenize.untokenize is broken
Gareth Rees <g...@garethrees.org> added the comment:

See my last paragraph: I propose to deliver a single patch that fixes both this bug and issue12675. I hope this is OK. (If you prefer, I'll try to split the patch in two.)

I just noticed another bug in untokenize(): in compatibility mode, if untokenize() is passed an iterator rather than a list, then the first token gets discarded:

Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import untokenize
>>> from token import *
>>> untokenize([(NAME, 'hello')])
'hello '
>>> untokenize(iter([(NAME, 'hello')]))
''

No-one's noticed this because the unit tests only ever pass lists to untokenize().

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12691>
___
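[The iterator bug comes from peeking at the first token and then never putting it back. One way to fix that pattern (a sketch of the idea, not the patch itself) is to reattach the peeked token with itertools.chain:]

```python
import itertools

def compat_untokenize_sketch(tokens):
    """Toy compatibility-mode untokenizer: join token strings with spaces.

    The point here is the peek-and-reattach pattern, not the exact
    spacing rules of the real untokenize().
    """
    it = iter(tokens)
    try:
        first = next(it)        # peek, e.g. to check for an ENCODING token
    except StopIteration:
        return ''
    it = itertools.chain([first], it)   # reattach so nothing is lost
    return ' '.join(tok[1] for tok in it)

assert compat_untokenize_sketch(iter([(1, 'hello')])) == 'hello'
```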
[issue12675] tokenize module happily tokenizes code with syntax errors
New submission from Gareth Rees <g...@garethrees.org>:

The tokenize module is happy to tokenize Python source code that the real tokenizer would reject. Pretty much any instance where tokenizer.c returns ERRORTOKEN will illustrate this feature. Here are some examples:

Python 3.3.0a0 (default:2d69900c0820, Aug 1 2011, 13:46:51)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tokenize import generate_tokens
>>> from io import StringIO
>>> def tokens(s):
...     """Return a string showing the tokens in the string s."""
...     return '|'.join(t[1] for t in generate_tokens(StringIO(s).readline))
...
>>> # Bad exponent
>>> print(tokens('1if 2else 3'))
1|if|2|else|3|
>>> 1if 2else 3
  File "<stdin>", line 1
    1if 2else 3
    ^
SyntaxError: invalid token
>>> # Bad hexadecimal constant.
>>> print(tokens('0xfg'))
0xf|g|
>>> 0xfg
  File "<stdin>", line 1
    0xfg
    ^
SyntaxError: invalid syntax
>>> # Missing newline after continuation character.
>>> print(tokens('\\pass'))
\|pass|
>>> \pass
  File "<stdin>", line 1
    \pass
    ^
SyntaxError: unexpected character after line continuation character

It is surprising that the tokenize module does not yield the same tokens as Python itself, but as this limitation only affects incorrect Python code, perhaps it just needs a mention in the tokenize documentation. Something along the lines of: "The tokenize module generates the same tokens as Python's own tokenizer if it is given correct Python code. However, it may incorrectly tokenize Python code containing syntax errors that the real tokenizer would reject."

--
components: Library (Lib)
messages: 141503
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: tokenize module happily tokenizes code with syntax errors
type: behavior
versions: Python 3.3
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
___
[issue12675] tokenize module happily tokenizes code with syntax errors
Gareth Rees <g...@garethrees.org> added the comment:

These errors are generated directly by the tokenizer. In tokenizer.c, the tokenizer generates ERRORTOKEN when it encounters something it can't tokenize. This causes parsetok() in parsetok.c to stop tokenizing and return an error.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12675>
___
[issue12514] timeit disables garbage collection if timed code raises an exception
New submission from Gareth Rees <g...@garethrees.org>:

If you call timeit.timeit and the timed code raises an exception, then garbage collection is disabled. I have verified this in Python 2.7 and 3.2. Here's an interaction with Python 3.2:

Python 3.2 (r32:88445, Jul 7 2011, 15:52:49)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit, gc
>>> gc.isenabled()
True
>>> timeit.timeit('raise Exception')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/timeit.py", line 228, in timeit
    return Timer(stmt, setup, timer).timeit(number)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/timeit.py", line 194, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
Exception
>>> gc.isenabled()
False

The problem is with the following code in Lib/timeit.py (lines 192-196):

    gcold = gc.isenabled()
    gc.disable()
    timing = self.inner(it, self.timer)
    if gcold:
        gc.enable()

This should be changed to something like this:

    gcold = gc.isenabled()
    gc.disable()
    try:
        timing = self.inner(it, self.timer)
    finally:
        if gcold:
            gc.enable()

--
components: Library (Lib)
messages: 139978
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: timeit disables garbage collection if timed code raises an exception
type: behavior
versions: Python 2.7, Python 3.2
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12514>
___
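[The effect of the proposed try/finally can be checked in isolation with a stand-in for the timed code. A sketch: timed_call is my own name, not part of the timeit API.]

```python
import gc

def timed_call(func):
    """Mimic Timer.timeit()'s GC handling, with the proposed fix applied:
    restore the collector even if the timed code raises."""
    gcold = gc.isenabled()
    gc.disable()
    try:
        return func()
    finally:
        if gcold:
            gc.enable()

def boom():
    raise Exception('boom')

gc.enable()
try:
    timed_call(boom)
except Exception:
    pass
assert gc.isenabled()   # with the fix, GC survives the exception
```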
[issue12514] timeit disables garbage collection if timed code raises an exception
Gareth Rees <g...@garethrees.org> added the comment:

Patch attached.

--
keywords: +patch
Added file: http://bugs.python.org/file22605/issue12514.patch
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12514>
___