Bengt Richter:
> But it does look ahead to recognize += (i.e., it doesn't generate two
> successive also-legal tokens of '+' and '=') so it seems it should be
> a simple fix.
But that works precisely because of the greedy nature of tokenization.
Given "a+=2", the longest token it finds first is "a", because "a+" is
not a valid token. The next token is "+="; it isn't just "+" because
"+=" is valid. And the last token is "2".

Compare to "a+ =2". In this case the tokens are "a", "+", "=", "2",
and the result is a syntax error.

> >>> for t in tokenize.generate_tokens(StringIO.StringIO('a=b+c; a+=2;
> x..y').readline): print t
> ...

This reinforces what I'm saying, no? Otherwise I don't understand your
reason for showing it.

> (51, '+=', (1, 8), (1, 10), 'a=b+c; a+=2; x..y')

As I said, the "+=" is found as a single token, and not as two tokens
merged into __iadd__ by the parser.

After some thought I realized that a short explanation may be helpful.
There are two stages in parsing a data file, at least in the standard
CS way of viewing things. First, tokenize the input: this turns
characters into words. Second, parse the words into a structure; the
result is a parse tree.

Both steps can do a sort of look-ahead. Tokenizers usually only look
ahead one character. These are almost invariably based on regular
expressions. There are many different parsing algorithms, with
different tradeoffs. Python's is an LL(1) parser. The (1) means it can
look ahead one token to resolve ambiguities in the language. (The LL
is part of a classification scheme which summarizes how the algorithm
works.)

Consider if 1..3 were to be legal syntax. Then the tokenizer would need
to note the ambiguity that the first token could be a "1." or a "1".
If "1.", then the next token could be a "." or a ".3". In fact, here is
the full list of possible choices:

  <1.> <.> <3>     same as getattr(1., 3)
  <1> <.> <.> <3>  not legal syntax
  <1.> <.3>        not legal syntax
  <1> <..> <3>     legal with the proposed syntax

Some parsers can handle this ambiguity, but Python's deliberately does
not. Why? Because people also find it tricky to resolve ambiguity
(hence problems with precedence rules).
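The greedy behavior is easy to see directly with the stdlib tokenize
module. A small sketch (modernized to Python 3, so io.StringIO rather
than the Python 2 StringIO used in the quoted session; the helper name
token_strings is mine):

```python
import io
import tokenize

def token_strings(source):
    """Return the token strings for one line of source,
    skipping synthetic whitespace tokens (NEWLINE, ENDMARKER)."""
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.string.strip()]

# Greedy match: "+=" is consumed as one token, not as "+" then "=".
print(token_strings("a+=2"))   # ['a', '+=', '2']

# With a space in between, the tokenizer cannot merge them; it happily
# emits "+" and "=" as separate tokens, and only the later parse stage
# rejects the sequence as a syntax error.
print(token_strings("a+ =2"))  # ['a', '+', '=', '2']
```

Note that the tokenizer itself accepts "a+ =2" without complaint; the
syntax error comes from the parser, which illustrates the two-stage
split described above.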
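For what it's worth, the same greedy rule is how the tokenizer resolves
"1..3" today: at each step it commits to the longest match, so it takes
"1." as a float and then ".3" as another float, the <1.> <.3> choice
from the list above, which the parser then rejects as two adjacent
numbers. A sketch of what current Python 3 tokenize reports (the
helper name token_strings is mine):

```python
import io
import tokenize

def token_strings(source):
    """Token strings for one line of source, skipping whitespace tokens."""
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.string.strip()]

# Greedy: "1." is taken as a float, then ".2" as another float --
# two adjacent NUMBER tokens, which the parser rejects.
print(token_strings("1..2"))   # ['1.', '.2']

# "1...2" comes out as <1.> <.> <.2>, i.e. the "1. . .2" reading.
print(token_strings("1...2"))  # ['1.', '.', '.2']
```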
After all, should 1..2 be interpreted as 1. . 2 or as 1 .. 2? What
about 1...2? (Is it 1. .. 2, 1 .. .2, or 1. . .2?)

Andrew
[EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list