Hello,

I'm using the standard csv module under Python 2.3 / 2.4 to read a CSV
file. The file is opened in binary mode, so the original end-of-line
terminators are preserved.

It appears that csv.Sniffer forces the line terminator of the dialect
it returns to be '\r\n'. That is fine under Windows but wrong under
Linux or Macintosh.
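
A minimal sketch of the behaviour described above. The sample data and
the variable names are made up for illustration; only csv.Sniffer and
its sniff() method come from the standard library.

```python
import csv

# Hypothetical sample that uses Unix '\n' line endings only.
sample = "name;age\nalice;30\nbob;25\n"

dialect = csv.Sniffer().sniff(sample, delimiters=";,")
detected_delimiter = dialect.delimiter
reported_terminator = dialect.lineterminator

# The delimiter is detected from the data, but the line terminator
# is reported as '\r\n' even though the sample never contains '\r'.
print(repr(detected_delimiter))
print(repr(reported_terminator))
```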

More about this line terminator: there is a potential bug in the
_guess_delimiter() method.
Its first line of code splits the data incorrectly:
data = filter(None, data.split('\n'))
It always splits on '\n', whatever the real line terminator is!
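
To illustrate why splitting on '\n' alone is a problem, here is a small
sketch with made-up sample strings. It does not call the Sniffer
internals; it only reproduces the split. str.splitlines() is shown as
one possible alternative, not as what the csv module does.

```python
# Hypothetical samples: old Macintosh ('\r') and Windows ('\r\n') endings.
mac_sample = "a,b\rc,d\re,f\r"
win_sample = "a,b\r\nc,d\r\n"

# Same logic as the split quoted above: split on '\n', drop empty strings.
mac_naive = [line for line in mac_sample.split("\n") if line]
win_naive = [line for line in win_sample.split("\n") if line]

# With '\r' endings the whole sample stays in a single "line";
# with '\r\n' endings every line keeps a trailing '\r'.
print(len(mac_naive))
print(win_naive)

# splitlines() recognises '\n', '\r' and '\r\n'.
mac_lines = mac_sample.splitlines()
win_lines = win_sample.splitlines()
print(mac_lines)
print(win_lines)
```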

Here is a patch (not a perfect one):
# ------- begin of patch -------
import csv


class PatchedSniffer(csv.Sniffer):
  """csv.Sniffer that also guesses the line terminator of the sample."""

  def sniff(self, p_data, p_delimiters = None):
    t_dialect = csv.Sniffer.sniff(self, p_data, p_delimiters)
    # Replace the hard-coded '\r\n' with the terminator found in the data.
    t_dialect.lineterminator = self._guessLineTerminator(p_data)
    return t_dialect


  def _guessLineTerminator(self, p_data):
    # '\r\n' must be tested first: it contains both '\r' and '\n'.
    for t_lineTerminator in ['\r\n', '\n', '\r']:
      if t_lineTerminator in p_data:
        return t_lineTerminator
    return '\r\n' # nothing found: Windows default (Excel)


  def _formatDataForGuess(self, p_data):
    # Normalize every line ending to '\n', which is what the base
    # class _guess_delimiter() splits on.
    t_lineTerminator = self._guessLineTerminator(p_data)
    return '\n'.join(p_data.split(t_lineTerminator))


  def _guess_delimiter(self, p_data, p_delimiters):
    t_data = self._formatDataForGuess(p_data)

    (t_delimiter, t_skipInitialSpace) = \
      csv.Sniffer._guess_delimiter(self, t_data, p_delimiters)

    # Fall back to a tab when no delimiter was guessed.
    if t_delimiter == '' and '\t' in p_data:
      t_delimiter = '\t'

    return (t_delimiter, t_skipInitialSpace)
# ------- end of patch -------

Bye.
------- Laurent.
