čt 11. 5. 2023 v 16:04 odesílatel Joel Jacobson <j...@compiler.org> napsal:

> Hi hackers,
>
> I've come across an unexpected behavior in our CSV parser that I'd like to
> bring up for discussion.
>
> % cat example.csv
> id,rating,review
> 1,5,"Great product, will buy again."
> 2,3,"I bought this for my 6" laptop but it didn't fit my 8" tablet"
>
> % psql
> CREATE TABLE reviews (id int, rating int, review text);
> \COPY reviews FROM example.csv WITH CSV HEADER;
> SELECT * FROM reviews;
>
> This gives:
>
> id | rating |                           review
> ----+--------+-------------------------------------------------------------
>   1 |      5 | Great product, will buy again.
>   2 |      3 | I bought this for my 6 laptop but it didn't fit my 8 tablet
> (2 rows)
>
> The parser currently accepts quoting within an unquoted field. This can
> lead to
> data misinterpretation when the quote is part of the field data (e.g.,
> for inches, like in the example).
>
> Our CSV output rules quote an entire field or not at all. But the import of
> fields with mid-field quotes might lead to surprising and undetected
> outcomes.
>
> I think we should throw a parsing error for unescaped mid-field quotes,
> and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field
> quotes are necessary. The error message could suggest this option when it
> encounters an unescaped mid-field quote.
>
> I think the convenience of not having to use an extra option doesn't
> outweigh
> the risk of undetected data integrity issues.
>
> Thoughts?
>

+1

Pavel


> /Joel
>
>

Reply via email to