No, you've set the escape character to double-quote, when it looks like you mean for it to be the quote character (which it already is). Remove this setting, as it's incorrect.
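
For reference, here is a minimal sketch of the same read with the escape option dropped, so that Spark keeps its defaults (double-quote as the quote character, backslash as the escape character). The helper name read_quoted_csv is hypothetical, and it assumes `spark` is an existing SparkSession; the other options are copied from your snippet.

```python
def read_quoted_csv(spark, path):
    """Read a quoted CSV file, leaving Spark's default quote and escape settings alone.

    `read_quoted_csv` is a hypothetical helper name used for illustration;
    `spark` is assumed to be an existing SparkSession.
    """
    return (
        spark.read.format("csv")
        .option("multiLine", True)
        .option("enforceSchema", False)
        .option("header", True)
        # Deliberately no .option("escape", '"'): the double quote is
        # already the quote character, so it should not also be the escape.
        .load(path)
    )


# Usage, assuming the sample file from the original message:
# df = read_quoted_csv(spark, "/tmp/test.csv")
# df.select("c").show()
# print(df.count())
```

With the sample data, the count should then come back as 2, though I have not run this against your exact Spark version.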

On Tue, Jan 3, 2023 at 11:00 AM Saurabh Gulati <saurabh.gul...@fedex.com.invalid> wrote:

> Hello,
> We are seeing a case with csv data when it parses csv data incorrectly.
> The issue can be replicated using the below csv data
>
> "a","b","c"
> "1","",","
> "2","","abc"
>
> and using the spark csv read command.
>
> df = spark.read.format("csv")\
> .option("multiLine", True)\
> .option("escape", '"')\
> .option("enforceSchema", False) \
> .option("header", True)\
> .load(f"/tmp/test.csv")
>
> df.show(100, False) # prints both rows
> |a |b |c |
> +---+--------+---+
> |1 |null |, |
> |2 |null |abc|
>
> df.select("c").show() # merges last column of first row and first column of second row
> +------+
> | c|
> +------+
> |"\n"2"|
>
> print(df.count()) # prints 1, should be 2
>
> It feels like a bug and I thought of asking the community before creating
> a bug on jira.
>
> Mvg/Regards
> Saurabh