Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19814
To be honest, I would like to suggest disallowing it. I just ran a few tests, and it looks like we are still not able to read the data back:

Empty `quote` (`\u0000`):
```scala
Seq(Tuple2("a\n a", "b \nb"), Tuple2("c\n c", "d \nd")).toDF
  .write.mode("overwrite").option("quote", "").csv("tmp.csv")
spark.read.option("multiLine", true)
  .option("quote", "").csv("tmp.csv").collect()
```
```
Array[org.apache.spark.sql.Row] = Array([ a?,??b ], [b?,null], [ c?,??d ], [d?,null])
```
If `\u0000` really disables quoting on read, I think it should give the same results as when `quote` is set to another character, for example:
```scala
Seq(Tuple2("a\n a", "b \nb"), Tuple2("c\n c", "d \nd")).toDF
  .write.mode("overwrite").option("quote", "").csv("tmp.csv")
spark.read.option("multiLine", true)
  .option("quote", "^").csv("tmp.csv").collect()
```
```
Array[org.apache.spark.sql.Row] = Array([ a?,?b ], [b?,null], [ c?,?d ], [d?,null])
```
but the output is different, as shown above: the code points are Array(0, 98, 32) vs Array(0, 0, 98, 32) for `"?b "` vs `"??b "`.
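The two results differ only in the leading `\u0000` code points, which the console renders as `?`. A minimal pure-Scala sketch of how to compare them, using the field values copied from the outputs above:

```scala
// Code points of the second field as parsed with each reader quote setting.
// The \u0000 characters render as '?' in the console outputs above.
val readWithNullQuote  = "\u0000\u0000b "  // quote = "" (\u0000) on read
val readWithCaretQuote = "\u0000b "        // quote = "^" on read

println(readWithNullQuote.map(_.toInt).mkString("Array(", ", ", ")"))   // Array(0, 0, 98, 32)
println(readWithCaretQuote.map(_.toInt).mkString("Array(", ", ", ")"))  // Array(0, 98, 32)
```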
Default `quote`:
```scala
Seq(Tuple2("a\n a", "b \nb"), Tuple2("c\n c", "d \nd")).toDF
  .write.mode("overwrite").csv("tmp.csv")
spark.read.option("multiLine", true).csv("tmp.csv").collect()
```
```
Array[org.apache.spark.sql.Row] =
Array([a
a,b
b], [c
c,d
d])
```
Another `quote`:
```scala
Seq(Tuple2("a\n a", "b \nb"), Tuple2("c\n c", "d \nd")).toDF
  .write.mode("overwrite").option("quote", "!").csv("tmp.csv")
spark.read.option("multiLine", true)
  .option("quote", "!").csv("tmp.csv").collect()
```
```
Array[org.apache.spark.sql.Row] =
Array([a
a,b
b], [c
c,d
d])
```