Sharan Basappa wrote:

> are there any requirements about the format of the CSV file when using
> read_csv from pandas? For example, is it necessary that the csv file has
> to have same number of columns in every line etc.
> ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

The error message is quite clear, look for extra fields in line 8 of your
data ;)

Now let's make a few experiments:

>>> import pandas, io
>>> def dump(s):
...     return pandas.read_csv(io.StringIO(s))
...
>>> dump("""foo,bar
... 1,2
... """
... )
   foo  bar
0    1    2

[1 rows x 2 columns]
>>> dump("""foo,bar
... 1,2,3
... 4,5
... """)
   foo  bar
1    2    3
4    5  NaN

[2 rows x 2 columns]
>>> dump("""foo,bar
... 1,2
... 3,4,5
... """)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<stdin>", line 2, in dump
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 225, in _read
    return parser.read()
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 626, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1070, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 727, in pandas.parser.TextReader.read (pandas/parser.c:6937)
  File "parser.pyx", line 749, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7156)
  File "parser.pyx", line 802, in pandas.parser.TextReader._read_rows (pandas/parser.c:7757)
  File "parser.pyx", line 789, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:7640)
  File "parser.pyx", line 1697, in pandas.parser.raise_parser_error (pandas/parser.c:19092)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3

From this I infer that no row in the csv file may contain more columns than
the first data row. Missing columns are added automatically.
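
To locate such lines before pandas ever sees the file, something like the following might help. It is only a minimal sketch using the standard library's csv module; the helper name find_long_rows and the sample data are made up for illustration. Note that it compares every row against the header, which is a simplification: as the second experiment shows, pandas tolerated an extra field in the first data row.

```python
import csv
import io


def find_long_rows(text):
    """Return (line_number, field_count) for rows with more fields than the header.

    Hypothetical helper for illustration; not part of pandas.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to check.
        return []
    expected = len(header)
    bad = []
    for row in reader:
        if len(row) > expected:
            # reader.line_num counts the physical lines read so far (1-based).
            bad.append((reader.line_num, len(row)))
    return bad


sample = "foo,bar\n1,2\n3,4,5\n6,7\n"
print(find_long_rows(sample))  # prints [(3, 3)]
```

For a file on disk you would pass the opened file object to csv.reader instead of wrapping a string in io.StringIO.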

There is also an option to suppress rows containing too many columns:

>>> pandas.read_csv(io.StringIO("foo,bar\n1,2\n3,4,5\n6,7"), error_bad_lines=False)
b'Skipping line 3: expected 2 fields, saw 3\n'
   foo  bar
0    1    2
1    6    7

[2 rows x 2 columns]
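
In newer pandas releases error_bad_lines was deprecated (1.3) and later removed (2.0) in favour of on_bad_lines. A rough equivalent of the error_bad_lines call, assuming pandas 1.3 or later and the same made-up sample data:

```python
import io

import pandas as pd

data = "foo,bar\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" drops rows that have too many fields without reporting them;
# on_bad_lines="warn" drops them too but emits a warning for each skipped line.
# Assumes pandas >= 1.3, where this parameter was introduced.
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
print(df)
```

The row "3,4,5" is dropped, leaving the two well-formed rows.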