Re: Using the CSV module
On May 9, 6:40 pm, "Nathan Harmston" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've been playing with the CSV module for parsing a few files. A row
> in a file looks like this:
>
> some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n
>
> so the lineterminator is \t\n and the delimiter is \t|\t, however when
> I subclass Dialect and try to set delimiter to "\t|\t" it says
> delimiter can only be a character.
>
> I know it's an easy fix to just do .strip("\t") on the output I get,
> but I was wondering
> a) if there's a better way of doing this when the file is actually
> being parsed by the csv module

No; usually one would want at least to do .strip() on each field anyway to remove *all* leading and trailing whitespace. Replacing multiple whitespace characters with one space is often a good idea. One may want to get fancier and ensure that NO-BREAK SPACE, aka &nbsp; (\xA0 in many encodings), is treated as whitespace.

So if you use "|" alone as the delimiter, the tabs around it become leading and trailing whitespace in each field, and your gloriously redundant tabs vanish, for free.

> b) Why are delimiters only allowed to be one character in length.

Speed. The reader is a hand-crafted finite-state machine designed to operate on a byte at a time. Allowing for variable-length delimiters would increase the complexity and lower the speed -- for what gain? How often does one see 2-byte or 3-byte delimiters?

--
http://mail.python.org/mailman/listinfo/python-list
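The advice in this reply can be sketched roughly as follows. This assumes Python 3 and the sample row from the original post; the names clean_field and read_rows are hypothetical helpers made up for illustration, not part of the csv module.

```python
import csv
import io
import re

# Any run of whitespace. In Python 3, \s already matches NO-BREAK SPACE
# (\xa0) for str patterns; it is listed explicitly only to show the intent.
_WS_RUN = re.compile(r"[\s\xa0]+")


def clean_field(field):
    """Hypothetical helper: collapse whitespace runs, then trim both ends."""
    return _WS_RUN.sub(" ", field).strip()


def read_rows(lines):
    """Parse pipe-delimited lines and yield rows of cleaned fields."""
    # "|" is a single character, so the csv module accepts it as delimiter.
    # The tabs around it end up inside the fields and are stripped below.
    for row in csv.reader(lines, delimiter="|"):
        yield [clean_field(field) for field in row]


sample = io.StringIO("some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n")
rows = list(read_rows(sample))
print(rows)  # [['some_id', 'some_data', 'some_more_data', 'last_data']]
```

Collapsing inner whitespace changes the data, so it is worth skipping that step if the fields may contain meaningful tabs or repeated spaces.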
Re: Using the CSV module
I've just finished writing one; I wanted to stay with the batteries-included approach as much as possible, though. Is there any way I can request a change to the csv module?

Thanks
Nathan

On 09/05/07, Stefan Sonnenberg-Carstens <[EMAIL PROTECTED]> wrote:
> Most of the time I found the CSV module not as useful as it might be -
> due to the restrictions you describe.
>
> Why not write a simple parser class?
Re: Using the CSV module
Most of the time I found the CSV module not as useful as it might be, due to the restrictions you describe.

Why not write a simple parser class?

On Wed, 9.05.2007, 10:40, Nathan Harmston wrote:
> Hi,
>
> I've been playing with the CSV module for parsing a few files. A row
> in a file looks like this:
>
> some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n
>
> so the lineterminator is \t\n and the delimiter is \t|\t, however when
> I subclass Dialect and try to set delimiter to "\t|\t" it says
> delimiter can only be a character.
>
> I know it's an easy fix to just do .strip("\t") on the output I get,
> but I was wondering
> a) if there's a better way of doing this when the file is actually
> being parsed by the csv module
> b) Why are delimiters only allowed to be one character in length.
>
> Many Thanks in advance
> Nathan
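One possible shape for such a "simple parser class" is sketched below, assuming Python 3. The class name DelimitedReader and its parameters are hypothetical, and this is not the parser anyone in the thread actually wrote. It trades the csv module's quoting and escaping support for a delimiter of any length.

```python
import io


class DelimitedReader:
    """Hypothetical minimal reader for a multi-character delimiter.

    Unlike the csv module, this sketch does no quoting or escaping, so a
    field can never contain the delimiter or a newline.
    """

    def __init__(self, lines, delimiter="\t|\t", lineterminator="\t\n"):
        # `lines` is any iterable of text lines, e.g. an open file.
        # This assumes the terminator ends in "\n", so ordinary line
        # iteration already splits the input into records.
        self._lines = iter(lines)
        self.delimiter = delimiter
        self.lineterminator = lineterminator

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._lines)  # StopIteration here ends the iteration
        if self.lineterminator and line.endswith(self.lineterminator):
            line = line[: -len(self.lineterminator)]
        else:
            line = line.rstrip("\r\n")
        return line.split(self.delimiter)


sample = io.StringIO(
    "some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n"
    "1\t|\tfoo\t|\tbar\t|\tbaz\t\n"
)
rows = list(DelimitedReader(sample))
print(rows)
```

Because the full "\t|\t" sequence is the separator here, no per-field .strip("\t") is needed afterwards.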
Using the CSV module
Hi,

I've been playing with the CSV module for parsing a few files. A row in a file looks like this:

some_id\t|\tsome_data\t|\tsome_more_data\t|\tlast_data\t\n

so the lineterminator is \t\n and the delimiter is \t|\t. However, when I subclass Dialect and try to set delimiter to "\t|\t", it says delimiter can only be a character.

I know it's an easy fix to just do .strip("\t") on the output I get, but I was wondering
a) if there's a better way of doing this when the file is actually being parsed by the csv module
b) why delimiters are only allowed to be one character in length.

Many thanks in advance
Nathan
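The restriction described in this post can be reproduced with a few lines, assuming Python 3. The helper try_delimiter is hypothetical and exists only for this illustration. The exact error text varies between Python versions, so it is printed rather than assumed.

```python
import csv
import io


def try_delimiter(delimiter):
    """Return None if csv accepts the delimiter, else the error message."""
    try:
        csv.reader(io.StringIO(""), delimiter=delimiter)
    except (TypeError, csv.Error) as exc:
        # Depending on how the dialect is supplied, the rejection surfaces
        # as TypeError or csv.Error, so both are caught here.
        return str(exc)
    return None


print(try_delimiter("|"))       # accepted: single character
print(try_delimiter("\t|\t"))   # rejected: more than one character
```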