"John Machin" <sj,,,[EMAIL PROTECTED]> wrote: > > I don't know what you mean by "requires more than one > character of lookahead" -- any non-Mickey-Mouse implementation of a > csv reader will use a finite state machine with about half-a-dozen > states, and data structures no more complicated than (1) completed > rows received so far (2) completed fields in current row (3) bytes in > current field. When a new input byte arrives, what to do can be > determined based on only that byte and the current state; no look- > ahead into the input stream is required, nor is any look-back into > those data structures. >
True. You can even do it more simply, by writing a GetField() that
scans for the delimiter, end of line or end of file, and returns the
"field" found along with the delimiter that caused it to exit. Then
write a GetRecord() that repeatedly calls GetField() and assembles the
row record until the delimiter returned is either the end of line or
the end of file. Remember that the returned field may be empty, and
handle those cases based on the delimiter that was returned.

This also makes all its decisions based on the current character read,
with no lookahead as far as I can see. Also no state variables, no
switch statements...

Is this the method that you would call "Mickey Mouse"?

Actually I lie about the no state variables - you have to keep track
of where you are in the file - but calling read(1) will do it for you,
so no worries, mate...

*wondering if someone will call him on the current row number as state
variable*

- Hendrik
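
For what it's worth, the GetField()/GetRecord() scheme described above might look roughly like this in Python. This is only a sketch under assumptions of my own: the names get_field, get_record and EOF are made up for illustration, the file is opened in text mode, the delimiter is a single character, lines end in "\n", and quoted fields are not handled at all.

```python
import io

EOF = ""  # in text mode, read(1) returns an empty string at end of file


def get_field(f, delimiter=","):
    """Read one field, one character at a time.

    Returns (field, terminator), where terminator is the delimiter,
    "\n", or EOF -- whichever ended the field.
    """
    chars = []
    while True:
        ch = f.read(1)  # the file object tracks our position for us
        if ch == delimiter or ch == "\n" or ch == EOF:
            return "".join(chars), ch
        chars.append(ch)


def get_record(f, delimiter=","):
    """Assemble the next row as a list of fields; return None at EOF."""
    fields = []
    while True:
        field, terminator = get_field(f, delimiter)
        # An empty field ended by EOF with nothing collected yet means
        # there is no further record (e.g. after a trailing newline).
        if terminator == EOF and field == "" and not fields:
            return None
        fields.append(field)
        if terminator != delimiter:  # end of line or end of file
            return fields


if __name__ == "__main__":
    # Hypothetical sample data, including an empty middle field.
    sample = io.StringIO("a,b,c\n1,,3\nx,y")
    while True:
        row = get_record(sample)
        if row is None:
            break
        print(row)
```

Each decision depends only on the character just read and on the terminator handed back by get_field(). The empty-field cases are sorted out by looking at that terminator, as described above. A reader that has to cope with quoted fields would need more than this.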