Hyukjin Kwon created SPARK-21356:
------------------------------------

             Summary: CSV datasource failed to parse a value having newline in 
its value
                 Key: SPARK-21356
                 URL: https://issues.apache.org/jira/browse/SPARK-21356
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Hyukjin Kwon
            Priority: Trivial


This is related to SPARK-21355. I guess this is also a rather rare corner case. I 
found this while testing SPARK-21289.

It looks like a bug in Univocity.

The code below fails to parse the newline in the value; the part after the 
newline is dropped:

{code}
scala> spark.read.csv(Seq("a\nb", "abc").toDS).show()
+---+
|_c0|
+---+
|  a|
|abc|
+---+
{code}

However, this can easily be worked around by quoting the value, as below:

{code}
scala> spark.read.csv(Seq("\"a\nb\"", "abc").toDS).show()
+---+
|_c0|
+---+
|a
b|
|abc|
+---+
{code}
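
As a rough sketch of applying that workaround programmatically, the helper below wraps a value in double quotes only when it contains a line break. The names `CsvQuoting` and `quoteIfMultiline` are hypothetical and not part of Spark, and the sketch assumes the value itself contains no double quotes, since escaping is not handled here.

```scala
object CsvQuoting {
  // Hypothetical helper, not part of Spark: wrap a value in double quotes
  // when it contains a line break, so that the CSV parser treats the whole
  // value as a single field.
  // Assumption: the value contains no double quotes (no escaping is done).
  def quoteIfMultiline(value: String): String =
    if (value.exists(c => c == '\n' || c == '\r')) "\"" + value + "\""
    else value
}
```

In spark-shell, `Seq("a\nb", "abc").map(CsvQuoting.quoteIfMultiline).toDS` yields the same input as the quoted example above, so passing it to `spark.read.csv` should behave the same way.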

This means that reading the file below with the multiLine option works:

{code}
"a
b",abc
{code}


{code}
scala> spark.read.option("multiLine", true).csv("tmp.csv").show()
+---+---+
|_c0|_c1|
+---+---+
|a
b|abc|
+---+---+
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
