Jonathan S. Shapiro wrote:
> Something like that. It's not quite enough if blocks are expressions.
> In effect, you have several types of lexically nested braces, and you
> need to keep track of the innermost active brace type in the current
> lexical context.
>
> So there is a devil in the details, but yes, something like this ought
> to work. And then the trick would be to have the lexer synthetically
> emit OPEN/CLOSE tokens to the parser at the right points.
>
Sorry I'm late to the indentation party. I think an indentation syntax
is a great idea! 8^)
There are a *bunch* of different Python lexer implementations.
The one in /usr/local/lib/python../lib/tokenize.py is pretty clear;
it's written using generators.
Here's the relevant snippet:
    elif parenlev == 0 and not continued:  # new statement
        if not line: break
        column = 0
        while pos < max:                   # measure leading whitespace
            if line[pos] == ' ': column = column + 1
            elif line[pos] == '\t': column = (column/tabsize + 1)*tabsize
            elif line[pos] == '\f': column = 0
            else: break
            pos = pos + 1
        if pos == max: break

        [...]

        if column > indents[-1]:           # count indents or dedents
            indents.append(column)
            yield (INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
        while column < indents[-1]:
            if column not in indents:
                raise IndentationError(
                    "unindent does not match any outer indentation level",
                    ("<tokenize>", lnum, pos, line))
            indents = indents[:-1]
            yield (DEDENT, '', (lnum, pos), (lnum, pos), line)
INDENT and DEDENT are 'synthesized' tokens.
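To show the same stack-based technique in isolation, here's a minimal
sketch of a generator that synthesizes INDENT/DEDENT tokens from a
sequence of lines. It's not the stdlib code above -- the name
indent_tokens and the token tuples are my own, and tabs/form feeds are
ignored for brevity -- but the indent-stack logic is the same:

```python
def indent_tokens(lines):
    """Yield ('INDENT'|'DEDENT'|'LINE', text) pairs for each line.

    A minimal, space-only sketch of tokenize.py's indent handling:
    keep a stack of active indentation columns; a deeper column
    pushes and yields one INDENT, a shallower column pops and
    yields one DEDENT per abandoned level.
    """
    indents = [0]                       # stack of active indent columns
    for line in lines:
        stripped = line.lstrip(' ')
        if not stripped or stripped.startswith('#'):
            continue                    # blank/comment lines don't count
        column = len(line) - len(stripped)
        if column > indents[-1]:        # deeper: one INDENT
            indents.append(column)
            yield ('INDENT', line[:column])
        while column < indents[-1]:     # shallower: one DEDENT per level
            indents.pop()
            yield ('DEDENT', '')
        if column != indents[-1]:       # must land on an outer level
            raise IndentationError(
                "unindent does not match any outer indentation level")
        yield ('LINE', stripped)
```

The parser then treats INDENT/DEDENT exactly like the OPEN/CLOSE brace
tokens mentioned upthread; it never needs to look at whitespace itself.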
-Sam
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev