Re: Incorrect csv parsing when delimiter used within the data

Sean Owen Tue, 03 Jan 2023 09:03:03 -0800

No, you've set the escape character to double-quote, when it looks like you
mean for it to be the quote character (which it already is). Remove this
setting, as it's incorrect.
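Below is a minimal sketch of the read with that setting removed. It keeps the other options from the original report. The `CSV_OPTIONS` dict and the `read_sample` helper are hypothetical names used only for illustration. `DataFrameReader.options(**kwargs)` is the standard PySpark way to pass several reader options at once.

```python
# Reader options from the original report, with the "escape" override removed.
# '"' is already Spark's default quote character, so neither quote nor escape is set.
CSV_OPTIONS = {
    "multiLine": True,
    "enforceSchema": False,
    "header": True,
}


def read_sample(spark, path="/tmp/test.csv"):
    """Hypothetical helper: read the sample CSV without overriding `escape`."""
    return spark.read.format("csv").options(**CSV_OPTIONS).load(path)


# Usage (requires a live SparkSession named `spark`):
# df = read_sample(spark)
# df.show(100, False)
# print(df.count())  # should report both data rows once the override is gone
```

The options live in a plain dict so the one change (no `escape` entry) stands out next to the original chained `.option(...)` calls.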


On Tue, Jan 3, 2023 at 11:00 AM Saurabh Gulati
<saurabh.gul...@fedex.com.invalid> wrote:

> Hello,
> We are seeing a case where Spark parses CSV data incorrectly.
> The issue can be reproduced with the CSV data below
>
> "a","b","c"
> "1","",","
> "2","","abc"
>
> and the following Spark CSV read command:
>
> df = spark.read.format("csv")\
> .option("multiLine", True)\
> .option("escape", '"')\
> .option("enforceSchema", False)\
> .option("header", True)\
> .load("/tmp/test.csv")
>
> df.show(100, False) # prints both rows
> +---+--------+---+
> |a  |b       |c  |
> +---+--------+---+
> |1  |null    |,  |
> |2  |null    |abc|
> +---+--------+---+
>
> # merges the last column of the first row with the first column of the
> # second row
> df.select("c").show()
> +------+
> |     c|
> +------+
> |"\n"2"|
> +------+
>
> print(df.count()) # prints 1, should be 2
>
>
> It looks like a bug, but I thought I'd ask the community before filing a
> JIRA issue.
>
> Regards
> Saurabh
>
>
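As a cross-check outside Spark, the sample data is ordinary RFC 4180-style CSV: a comma inside a quoted field is data, and `""` is an empty field. This check assumes Python's standard-library `csv` module, which is a different parser from Spark's. With `"` used only as the quote character and no separate escape character, that module yields a header plus two data rows:

```python
import csv
import io

# Sample data from the report. Row 1's third field is a quoted comma;
# the second field of each data row is an empty quoted string.
SAMPLE = '"a","b","c"\n"1","",","\n"2","","abc"\n'


def parse_rows(text):
    """Parse CSV with the default dialect: '"' quotes fields, and a doubled
    quote inside a quoted field stands for a literal quote character."""
    return list(csv.reader(io.StringIO(text)))


rows = parse_rows(SAMPLE)
header, data = rows[0], rows[1:]
print(header)     # ['a', 'b', 'c']
print(data)       # [['1', '', ','], ['2', '', 'abc']]
print(len(data))  # 2
```

Spark's CSV reader has its own option set (`quote`, `escape`, and so on), so this shows only that the input itself is well-formed. It does not show how Spark tokenizes it.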
