Re: Reading csv files with quoted fields containing embedded commas

2016-11-06 Thread Femi Anthony
The quote options seem to be related to escaping quotes, and the dataset isn't escaping quotes. As I said, quoted strings with embedded commas are something that pandas handles easily, and even Excel does that as well. Femi On Sun, Nov 6, 2016 at 6:59 AM, Hyukjin Kwon wrote:

Re: Reading csv files with quoted fields containing embedded commas

2016-11-06 Thread Hyukjin Kwon
Hi Femi, Have you maybe tried the quote-related options specified in the documentation? http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv Thanks. 2016-11-06 6:58 GMT+09:00 Femi Anthony : > Hi, I am trying to process a very

Reading csv files with quoted fields containing embedded commas

2016-11-05 Thread Femi Anthony
Hi, I am trying to process a very large comma-delimited csv file and I am running into problems. The main problem is that some fields contain quoted strings with embedded commas. It seems as if PySpark is unable to properly parse lines containing such fields the way, say, Pandas does. Here is the code
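
As an aside to the thread (this is not the code from the original post, which the archive listing cuts off): a minimal sketch of the behaviour being discussed, using Python's standard `csv` module rather than PySpark or pandas. The sample row and its column names are made up for illustration. It contrasts naive splitting on commas with a quote-aware parser that keeps a quoted field with embedded commas in one piece.

```python
import csv
import io

# Hypothetical sample data (not from the original post): the second field
# of the data row is quoted and contains embedded commas.
sample = 'id,name,city\n1,"Smith, John, Jr.",Boston\n'

# Naive approach: splitting on every comma breaks the quoted field apart.
naive = sample.splitlines()[1].split(",")

# Quote-aware approach: csv.reader treats text between quote characters
# as a single field, so the embedded commas are preserved.
rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))

print(naive)    # five fragments instead of three fields
print(rows[1])  # three fields, with the name kept whole
```

A parser that honours the quote character should return three fields for the data row, which is the result the thread expects from PySpark as well.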