"John Machin" <sj,,,[EMAIL PROTECTED]> wrote:

> 
> I don't know what you mean by "requires more than one
> character of lookahead" -- any non-Mickey-Mouse implementation of a
> csv reader will use a finite state machine with about half-a-dozen
> states, and data structures no more complicated than (1) completed
> rows received so far (2) completed fields in current row (3) bytes in
> current field. When a new input byte arrives, what to do can be
> determined based on only that byte and the current state; no look-
> ahead into the input stream is required, nor is any look-back into
> those data structures.
> 

True.

You can even do it more simply: write a GetField() that scans for
the delimiter, end of line or end of file, and returns the "field"
found along with the delimiter that caused it to exit. Then write a
GetRecord() that repeatedly calls GetField() and assembles the row
record until the delimiter returned is either end of line or end of
file. Remember that the returned field may be empty, and when it is,
handle the cases based on the delimiter returned.

This also makes all its decisions based on the current character
read, with no lookahead as far as I can see.
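
A rough sketch of the idea in Python, under some assumptions of mine:
fields are unquoted, the delimiter is a single character, and lines
end in "\n". The names get_field and get_record are just stand-ins
for the GetField() and GetRecord() described above, and returning
None at end of input is my own choice. Quoted fields with embedded
delimiters are not handled here.

```python
import io


def get_field(stream, delimiter=","):
    # Scan one character at a time until a terminator shows up.
    # Returns (field, terminator); terminator is the delimiter,
    # a newline, or the empty string at end of file.
    chars = []
    while True:
        ch = stream.read(1)  # read(1) returns "" at end of file
        if ch == "" or ch == delimiter or ch == "\n":
            return "".join(chars), ch
        chars.append(ch)


def get_record(stream, delimiter=","):
    # Assemble one row by calling get_field() repeatedly.
    # Returns None once the input is exhausted.
    fields = []
    while True:
        field, terminator = get_field(stream, delimiter)
        if terminator == "" and not fields and field == "":
            return None  # end of file with nothing pending
        fields.append(field)  # the field may legitimately be empty
        if terminator != delimiter:
            return fields  # end of line or end of file closes the row


# Small demonstration on an in-memory file.
demo = io.StringIO("a,b,,d\n\nx,y")
rows = []
while True:
    row = get_record(demo)
    if row is None:
        break
    rows.append(row)
print(rows)
```

Note that a blank line comes back as a row holding one empty field,
which is one of the "empty field" cases you have to decide on.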

Also no state variables, no switch statements...

Is this the method that you would call "Mickey Mouse"?

Actually I lie about the no state variables - you have to keep track
of where you are in the file - but calling read(1) will do it for you,
so no worries, mate...

*wondering if someone will call him on the current row number
as state variable*

- Hendrik


-- 
http://mail.python.org/mailman/listinfo/python-list
