On Wed, 19 Mar 2008 12:44:05 -0300, <[EMAIL PROTECTED]> wrote:

> The csv module contains a Sniffer class which is supposed to deduce the
> delimiter and quote character as well as the presence or absence of a
> header in a sample taken from the start of a purported CSV file. I no
> longer remember who wrote it, and I've never been a big fan of it. It
> determines the delimiter based almost solely on character frequencies.
> It doesn't consider what the actual structure of a CSV file is or that
> delimiters and quote characters are almost always taken from the set of
> punctuation or whitespace characters. Consequently, it can cause some
> occasional head-scratching:
>
> >>> sample = """\
> ... abc8def
> ... def8ghi
> ... ghi8jkl
> ... """
> >>> import csv
> >>> d = csv.Sniffer().sniff(sample)
> >>> d.delimiter
> '8'
> >>> sample = """\
> ... a8bcdef
> ... ab8cdef
> ... abc8def
> ... abcd8ef
> ... """
> >>> d = csv.Sniffer().sniff(sample)
> >>> d.delimiter
> 'f'
>
> It's not clear to me that people use letters or digits very often as
> delimiters. Both samples above probably represent data from
> single-column files, not double-column files with '8' or 'f' as the
> delimiter.
I've seen an 'X' used as a field separator, but in that case all the values were numbers.

> I would be happy to get rid of it in 3.0, but I'm also aware that some
> people use it. I'd like feedback from the Python community about this.
> If I removed it is there someone out there who wants it badly enough to
> maintain it in PyPI?

Sniffer.sniff() already takes a "delimiters" parameter; passing string.punctuation seems reasonable if one wants to restrict the set of possible delimiters. I think Sniffer is a useful class, but it can't do magic; perhaps a few lines in the docs stating its limitations would be enough.

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
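
P.S. A minimal sketch of the "delimiters" suggestion above. The helper name guess_delimiter and the sample strings are made up for illustration; only csv.Sniffer().sniff() and string.punctuation come from the standard library. It assumes that sniff() raises csv.Error when none of the allowed characters works as a delimiter.

```python
import csv
import string


def guess_delimiter(sample, candidates=string.punctuation):
    """Return the sniffed delimiter, or None if no candidate fits.

    Hypothetical helper: it only restricts the characters the
    Sniffer is allowed to pick, it does not change how it guesses.
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # Raised when the Sniffer cannot settle on a delimiter.
        return None
    return dialect.delimiter


# Single-column data that contains no punctuation at all.
single_column = "abc8def\ndef8ghi\nghi8jkl\n"
# Three semicolon-separated columns.
three_columns = "a;b;c\n1;2;3\n4;5;6\n"

print(guess_delimiter(single_column))
print(guess_delimiter(three_columns))
```

With the candidate set limited to punctuation, the first sample should come back as None rather than '8', and the second as ';'. Whitespace delimiters such as tabs would need to be added to the candidate string explicitly.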