csv.Sniffer - delete in Python 3.0?

skip Wed, 19 Mar 2008 08:46:31 -0700

The csv module contains a Sniffer class which is supposed to deduce the
delimiter and quote character as well as the presence or absence of a header
in a sample taken from the start of a purported CSV file.  I no longer
remember who wrote it, and I've never been a big fan of it.  It determines
the delimiter based almost solely on character frequencies.  It doesn't
consider what the actual structure of a CSV file is or that delimiters and
quote characters are almost always taken from the set of punctuation or
whitespace characters.  Consequently, it can cause occasional
head-scratching:


    >>> sample = """\
    ... abc8def
    ... def8ghi
    ... ghi8jkl
    ... """
    >>> import csv
    >>> d = csv.Sniffer().sniff(sample)
    >>> d.delimiter
    '8'
    >>> sample = """\
    ... a8bcdef
    ... ab8cdef
    ... abc8def
    ... abcd8ef
    ... """
    >>> d = csv.Sniffer().sniff(sample)
    >>> d.delimiter
    'f'

It's not clear to me that people use letters or digits very often as
delimiters.  Both samples above probably represent data from single-column
files, not double-column files with '8' or 'f' as the delimiter.
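
One way to sidestep this without touching the module is the optional
delimiters argument that sniff() already accepts.  The sketch below is only
an illustration: the helper name sniff_delimiter and the candidate set
(punctuation plus space and tab) are assumptions made for this example, not
part of the csv module.

```python
import csv
import string

# Assumption for this sketch: only punctuation, space, and tab are
# plausible delimiters.  Adjust the set to suit your data.
CANDIDATES = string.punctuation + " \t"


def sniff_delimiter(sample, candidates=CANDIDATES):
    """Return the sniffed delimiter, or None if no candidate fits.

    Hypothetical helper: it wraps csv.Sniffer().sniff() and restricts
    the guess to the characters in `candidates`.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot determine a delimiter.
        return None


if __name__ == "__main__":
    print(repr(sniff_delimiter("abc8def\ndef8ghi\nghi8jkl\n")))
    print(repr(sniff_delimiter("a,b,c\nd,e,f\ng,h,i\n")))
```

With the candidates restricted this way, the first sample above should no
longer come back with '8' as its delimiter; the helper returns None, which
the caller can treat as "probably a single-column file".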

I would be happy to get rid of it in 3.0, but I'm also aware that some
people use it.  I'd like feedback from the Python community about this.  If
I removed it, is there someone out there who wants it badly enough to
maintain it on PyPI?

Thanks,

-- 
Skip Montanaro - [EMAIL PROTECTED] - http://www.webfast.com/~skip/