Sharan Basappa wrote:

> are there any requirements about the format of the CSV file when using
> read_csv from pandas? For example, is it necessary that the csv file has
> to have same number of columns in every line etc.

> ParserError: Error tokenizing data. C error: Expected 1 fields in line 8,
> saw 3

The error message is quite clear: look for extra fields in line 8 of your 
data ;)
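
To locate such lines without opening the file in an editor, a small helper 
built on the standard csv module may be enough. This is only a sketch: 
find_ragged_rows is a made-up name, the first row is assumed to define the 
expected width, and blank lines (which pandas skips by default) would be 
reported here as rows with 0 fields.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs
    from the width of the first row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    ragged = []
    for row in reader:
        if expected is None:
            # The first row sets the expected number of fields.
            expected = len(row)
        elif len(row) != expected:
            # reader.line_num is the number of physical lines read so far.
            ragged.append((reader.line_num, len(row)))
    return ragged


sample = "foo,bar\n1,2\n3,4,5\n6\n"
print(find_ragged_rows(sample))  # [(3, 3), (4, 1)]
```

For a real file, pass the file contents (or adapt the helper to take an open 
file object) and check the delimiter first: a first row with only 1 field 
often means the file uses a separator other than the comma.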

Now let's run a few experiments:

>>> import pandas, io
>>> def dump(s):
...     return pandas.read_csv(io.StringIO(s))
>>> dump("""foo,bar
... 1,2
... """
... )
   foo  bar
0    1    2

[1 rows x 2 columns]
>>> dump("""foo,bar
... 1,2,3
... 4,5
... """)
   foo  bar
1    2    3
4    5  NaN

[2 rows x 2 columns]
>>> dump("""foo,bar
... 1,2
... 3,4,5
... """)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in dump
  File "/usr/lib/python3/dist-packages/pandas/io/", line 420, in 
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/", line 225, in 
  File "/usr/lib/python3/dist-packages/pandas/io/", line 626, in 
    ret =
  File "/usr/lib/python3/dist-packages/pandas/io/", line 1070, in 
    data =
  File "parser.pyx", line 727, in 
  File "parser.pyx", line 749, in pandas.parser.TextReader._read_low_memory 
  File "parser.pyx", line 802, in pandas.parser.TextReader._read_rows 
  File "parser.pyx", line 789, in pandas.parser.TextReader._tokenize_rows 
  File "parser.pyx", line 1697, in pandas.parser.raise_parser_error 
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 
fields in line 3, saw 3

From this I infer that no row in the CSV file may contain more columns than 
the first data row. Missing columns are filled with NaN automatically.
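
If wider rows are expected and should be kept, one possible workaround (not 
part of the experiments above) is to pass explicit column names that cover 
the widest row. The sketch below assumes the widest row has three fields; 
"extra" is a made-up name for the third column.

```python
import io
import pandas

data = "foo,bar\n1,2\n3,4,5\n"

df = pandas.read_csv(
    io.StringIO(data),
    header=None,   # do not take column names from the file
    skiprows=1,    # skip the original two-column header line
    names=["foo", "bar", "extra"],  # "extra" is a hypothetical name
)
print(df)
```

Rows shorter than the list of names get NaN in the missing positions, so the 
"extra" column ends up as a float column.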

There is also an option to suppress rows containing too many columns:

>>> pandas.read_csv(io.StringIO("foo,bar\n1,2\n3,4,5\n6,7"), 
b'Skipping line 3: expected 2 fields, saw 3\n'
   foo  bar
0    1    2
1    6    7

[2 rows x 2 columns]
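
The call above is cut off in this copy, so the option itself is not visible. 
As a sketch, and assuming the keyword meant is the one for skipping bad 
lines: pandas 1.3 and later accept on_bad_lines="skip", while older releases 
used error_bad_lines=False. The fallback below relies on read_csv raising 
TypeError for a keyword it does not know.

```python
import io
import pandas

data = "foo,bar\n1,2\n3,4,5\n6,7"

try:
    # pandas 1.3 and later
    df = pandas.read_csv(io.StringIO(data), on_bad_lines="skip")
except TypeError:
    # older releases, before on_bad_lines was introduced
    df = pandas.read_csv(io.StringIO(data), error_bad_lines=False)

print(df)
```

Skipping silently drops data, so it is worth checking how many rows were 
lost before relying on the result.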

