If you have found a parser that works, simply read the data as text files, apply the parser manually, and convert the result to a DataFrame (if that is needed at all).

________________________________
From: Saurabh Gulati <saurabh.gul...@fedex.com.INVALID>
Sent: Wednesday, January 4, 2023 3:45 PM
To: Sean Owen <sro...@gmail.com>
Cc: Mich Talebzadeh <mich.talebza...@gmail.com>; User <user@spark.apache.org>
Subject: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data

Hi @Sean Owen,

Probably the data is incorrect, and the source needs to fix it.
But using Python's csv parser returns the correct results:

    import csv

    with open("/tmp/test.csv") as c_file:
        csv_reader = csv.reader(c_file, delimiter=",")
        for row in csv_reader:
            print(row)

    ['a', 'b', 'c']
    ['1', '', ',see what "I did",\ni am still writing']
    ['2', '', 'abc']

Also, I don't understand why the output of df.show() differs from the output of df.select("c").show().

Mvg/Regards
Saurabh Gulati
Data Platform

________________________________
From: Sean Owen <sro...@gmail.com>
Sent: 04 January 2023 14:25
To: Saurabh Gulati <saurabh.gul...@fedex.com>
Cc: Mich Talebzadeh <mich.talebza...@gmail.com>; User <user@spark.apache.org>
Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

That input is just invalid as CSV for any parser. You end a quoted col without following with a col separator. What would the intended parsing be and how would it work?

On Wed, Jan 4, 2023 at 4:30 AM Saurabh Gulati <saurabh.gul...@fedex.com> wrote:

@Sean Owen
Also see the example below with quotes feedback:

    "a","b","c"
    "1","",",see what ""I did"","
    "2","","abc"
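
For reference, the suggestion at the top of this thread (read the data as text, apply the parser manually, convert to a DataFrame if needed) could be sketched roughly as below. This is only an illustration, not code from any of the participants: `parse_csv_text` and `to_spark_dataframe` are hypothetical helper names, and the Spark part assumes the files are small enough to collect on the driver and that every record has as many fields as the header.

```python
import csv
import io


def parse_csv_text(text, delimiter=","):
    """Parse a whole CSV document held in memory with Python's csv module."""
    # newline="" leaves line endings untouched, so the csv module itself
    # decides how to treat newlines embedded inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop blank lines
    return rows[0], rows[1:]  # (header, records)


def to_spark_dataframe(spark, path):
    """Hypothetical helper: read each file whole, parse it, build a DataFrame.

    Assumption: `spark` is an existing SparkSession and the files under
    `path` are small, because everything is collected on the driver.
    """
    header, records = None, []
    # wholeTextFiles yields (file_path, file_content) pairs, so a record
    # that spans several physical lines stays inside one string.
    for _, content in spark.sparkContext.wholeTextFiles(path).collect():
        header, rows = parse_csv_text(content)
        records.extend(tuple(row) for row in rows)
    return spark.createDataFrame(records, schema=header)


# The quoted sample from this thread, parsed without Spark.
sample = '"a","b","c"\n"1","",",see what ""I did"","\n"2","","abc"\n'
header, records = parse_csv_text(sample)
print(header)
for record in records:
    print(record)
```

With the sample from the thread, the third field of the first record comes back as `,see what "I did",`, since the csv module treats a doubled quote inside a quoted field as a literal quote. Parsing on the driver gives up Spark's parallel read, so this only makes sense for modest file sizes.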