On Dec 14, 6:32 am, [EMAIL PROTECTED] wrote:
> Thanks to a lot of help, I've got the outer framework for my tokenizer
> down to this:
>
>     for line_number, line in enumerate(text):
>         output = ''
>
>         for char_number, char in enumerate(line):
>             output += char
>
>         print 'At ' + str(line_number) + ', ' + str(char_number) + ': ' + output,
>

The inner loop appears to be utterly redundant; AFAIK it can be
replaced by:

    output = line
    char = line[-1] if line else ''
    char_number = len(line) - 1

with the observation that if "line" is empty, your original code will
crash with a NameError if it's the first line (nothing has bound "char"
or "char_number" yet), and otherwise leave "char" and "char_number"
holding misleading values (those belonging to the most recent non-empty
line).
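For the doubtful, here's a quick sanity check of that equivalence
(Python 3 syntax; "spam" is just a stand-in line):

```python
line = "spam"  # any non-empty stand-in line

# The original inner loop:
output = ''
for char_number, char in enumerate(line):
    output += char

# Its final state matches the direct expressions:
assert output == line
assert char == line[-1]              # last character
assert char_number == len(line) - 1  # enumerate stops at len - 1
```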

You mentioned design: I wouldn't call that the outer framework for a
tokeniser. I'd call it an example of one way of collecting the source
to be stuffed into a not-yet-visible tokeniser. That is, the tokeniser
should live in a separate module with an API that either (a) lets you
push chunks of text into the tokeniser, or (b) requires you to supply
an iterable object so that the tokeniser can pull text; the tokeniser
itself should not live inside nested loops in your application code.
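As a concrete illustration of option (b), here's a minimal pull-style
sketch (the name "tokenise" and the whitespace-splitting "grammar" are
invented for illustration; a real tokeniser would implement your actual
token rules):

```python
def tokenise(lines):
    """Pull lines from any iterable and yield (line_number, column, token).

    Toy grammar: a token is any whitespace-separated run of characters.
    """
    for line_number, line in enumerate(lines):
        column = 0
        for token in line.split():
            column = line.index(token, column)  # find token's start column
            yield line_number, column, token
            column += len(token)                # resume search past it

# The application code just pulls tokens; no nested loops of its own:
for line_number, column, token in tokenise(["spam eggs", "ham"]):
    print("At %d, %d: %s" % (line_number, column, token))
```

Because the tokeniser only requires an iterable of lines, the same
function works unchanged on a file object, a list of test strings, or
anything else that yields lines.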
-- 
http://mail.python.org/mailman/listinfo/python-list