Andrew Dalke wrote:

Bengt Richter:

But it does look ahead to recognize += (i.e., it doesn't generate two
successive also-legal tokens of '+' and '=')
so it seems it should be a simple fix.


But that works precisely because of the greedy nature of tokenization.
Given "a+=2" the longest token it finds first is "a" because "a+"
is not a valid token.  The next token is "+=".  It isn't just "+"
because "+=" is valid.  And the last token is "2".

[...]

You're absolutely right, of course, Andrew, and personally I don't think this is worth trying to fix. But the original post I responded to suggested that an LL(1) grammar couldn't disambiguate "1." from "1..3", an assertion that relied on a slight fuzzing of the line between lexical and syntactic analysis that I didn't want to leave unsharpened.
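
To make the greediness concrete, here is a quick sketch of my own (not something from Andrew's post) that pushes his example through the standard-library tokenize module, which applies the same longest-match rule as the C tokenizer; details of the output vary a little across versions, but the longest-match behaviour doesn't, and the token names in the comment are what I would expect to see:

import io
import tokenize

def show(source):
    # Print each (token type, token string) pair the tokenizer produces.
    print(repr(source))
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        print("  %-10s %r" % (tokenize.tok_name[tok.type], tok.string))

show("a+=2\n")   # NAME 'a', OP '+=', NUMBER '2': '+=' wins over '+' then '='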

The fact that Python's existing tokenizer doesn't (roughly speaking) allow a multi-character token to begin with a dot that immediately follows a digit, because that dot is swallowed into the preceding number, is what makes the whole syntax proposal so hard to accommodate.
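
The same sort of sketch makes the problem visible: feed "1..3" to the tokenizer and the first dot is absorbed into the number, so a ".." token never gets a chance to appear. Again, this is my own illustration, not part of the original proposal:

import io
import tokenize

source = "1..3\n"
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# I would expect something like (ignoring the NEWLINE/ENDMARKER bookkeeping):
#   NUMBER '1.'
#   NUMBER '.3'
# i.e. two float literals, and no '..' operator for a range syntax to hang on.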

regards
 Steve
--
Steve Holden               http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/
Holden Web LLC      +1 703 861 4237  +1 800 494 3119