Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-05 Thread Saurabh Gulati
and 2 single quotes together'' are looking like a single double quote ". Mvg/Regards Saurabh Gulati From: Saurabh Gulati Sent: 05 January 2023 12:24 To: Sean Owen Cc: User Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-05 Thread Saurabh Gulati
It's the same input, except that the headers are also being read with the csv reader. Mvg/Regards Saurabh Gulati From: Sean Owen Sent: 04 January 2023 15:12 To: Saurabh Gulati Cc: User Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

Re: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data

2023-01-05 Thread Saurabh Gulati
Cc: Mich Talebzadeh; User Subject: Re: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data If you have found a parser that works, simply read the data as text files, apply the parser manually, and convert to DataFrame (if needed at all).
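
The suggestion above (read the raw text, apply a parser that handles the data, then build a DataFrame) could look roughly like the sketch below. It assumes Python's standard `csv` module is the parser that works and uses the three-line sample from the start of the thread. `parse_csv_text` is a hypothetical helper name, and the DataFrame step is only indicated in a comment because it needs a live `SparkSession`.

```python
import csv
import io

# Sample data from the original post in this thread.
SAMPLE = '"a","b","c"\n"1","",","\n"2","","abc"\n'

def parse_csv_text(text):
    """Parse CSV text with Python's csv module and return a list of rows."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row for row in reader]

rows = parse_csv_text(SAMPLE)
header, records = rows[0], rows[1:]
print(header)   # ['a', 'b', 'c']
print(records)  # [['1', '', ','], ['2', '', 'abc']]

# Assumption: with a SparkSession named `spark`, the rows could then be
# converted with spark.createDataFrame(records, schema=header).
```

With the default `excel` dialect, the quoted comma in the first record stays inside the third column instead of splitting the row.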

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Sean Owen
04 January 2023 14:25 > *To:* Saurabh Gulati > *Cc:* Mich Talebzadeh; User <user@spark.apache.org> > *Subject:* Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data > > That input is just invalid as CSV for any parser. You end a quote

Re: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Shay Elbaz
Hi @Sean Owen Probably the data is incorrect, and the source needs to fix it. But using Python's csv parser returns the correct r

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Saurabh Gulati
From: Sean Owen Sent: 04 January 2023 14:25 To: Saurabh Gulati Cc: Mich Talebzadeh; User Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data That input is just invalid as CSV for any parser. You end a quoted col witho

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Sean Owen
That input is just invalid as CSV for any parser. You end a quoted col without following with a col separator. What would the intended parsing be and how would it work?

On Wed, Jan 4, 2023 at 4:30 AM Saurabh Gulati wrote:
>
> @Sean Owen Also see the example below with quotes
> feedback:
>
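
To illustrate the point about a closing quote that is not followed by a column separator, here is a small sketch using Python's standard `csv` module. The input line is a made-up example, not the data from this thread, and `parse_line` is a hypothetical helper. In its default lenient mode the module accepts such input and appends the trailing characters to the field; with `strict=True` it rejects the line.

```python
import csv

# Hypothetical malformed line: the quoted field "abc" is closed,
# but more characters follow before the next comma.
BAD_LINE = '"abc"def,1'

def parse_line(line, strict=False):
    """Parse one CSV line; strict=True makes the csv module raise on bad quoting."""
    return next(csv.reader([line], strict=strict))

# Lenient (default) mode guesses an interpretation.
lenient = parse_line(BAD_LINE)
print(lenient)  # ['abcdef', '1']

# Strict mode reports the same line as invalid.
try:
    parse_line(BAD_LINE, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)
print(strict_error)
```

This may be why parsers can disagree on such input: a lenient parser picks one reading, while a strict one treats the line as an error.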

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Saurabh Gulati
"|null|null| |2 |null|abc | +---+++ df.select("c").show(10, False) +--------+ |c | ++ |",see what ""I did""| |null

Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Mich Talebzadeh
What is the point of having *,* as a column value? From a business point of view it does not signify anything IMO

Re: Incorrect csv parsing when delimiter used within the data

2023-01-03 Thread Sean Owen
Why does the data even need cleaning? That's all perfectly correct. The error was setting quote to be an escape char.

On Tue, Jan 3, 2023, 2:32 PM Mich Talebzadeh wrote:
> if you take your source CSV as below
>
> "a","b","c"
> "1","",","
> "2","","abc"
>
> and define your code as below
>

Re: Incorrect csv parsing when delimiter used within the data

2023-01-03 Thread Mich Talebzadeh
If you take your source CSV as below

"a","b","c"
"1","",","
"2","","abc"

and define your code as below

csv_file="hdfs://rhes75:9000/data/stg/test/testcsv.csv"
# read hive table in spark
listing_df = spark.read.format("com.databricks.spark.csv").option("inferSchema",

Re: Incorrect csv parsing when delimiter used within the data

2023-01-03 Thread Sean Owen
No, you've set the escape character to double-quote, when it looks like you mean for it to be the quote character (which it already is). Remove this setting, as it's incorrect.

On Tue, Jan 3, 2023 at 11:00 AM Saurabh Gulati wrote:
> Hello,
> We are seeing a case with csv data when it parses csv

Incorrect csv parsing when delimiter used within the data

2023-01-03 Thread Saurabh Gulati
Hello,
We are seeing a case where csv data is parsed incorrectly. The issue can be replicated using the csv data below

"a","b","c"
"1","",","
"2","","abc"

and the spark csv read command.

df = spark.read.format("csv")\
    .option("multiLine", True)\
    .option("escape", '"')\