Hyukjin Kwon created SPARK-21356:
------------------------------------

Summary: CSV datasource failed to parse a value having newline in its value
Key: SPARK-21356
URL: https://issues.apache.org/jira/browse/SPARK-21356
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.0
Reporter: Hyukjin Kwon
Priority: Trivial
This is related to SPARK-21355. I guess this is also a rather corner case. I found it while testing SPARK-21289. It looks like a bug in Univocity: the code below fails to parse a newline inside an unquoted value, and the text after the newline ("b") does not appear in the result.

{code}
scala> spark.read.csv(Seq("a\nb", "abc").toDS).show()
+---+
|_c0|
+---+
|  a|
|abc|
+---+
{code}

However, this can easily be worked around by quoting the value:

{code}
scala> spark.read.csv(Seq("\"a\nb\"", "abc").toDS).show()
+---+
|_c0|
+---+
|a
b|
|abc|
+---+
{code}

This matches the behaviour when reading a file with the {{multiLine}} option. With the file below (tmp.csv):

{code}
"a
b",abc
{code}

{code}
scala> spark.read.option("multiLine", true).csv("tmp.csv").show()
+---+---+
|_c0|_c1|
+---+---+
|a
b|abc|
+---+---+
{code}
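The quoting workaround amounts to wrapping any value that contains a newline in double quotes before it reaches the parser. Below is a minimal sketch in plain Scala, assuming RFC 4180-style quoting (wrap the field in double quotes and double any embedded quotes). `CsvQuoting`, `quoteCsvField` and `toCsvRecord` are hypothetical helpers for illustration, not Spark or Univocity APIs.

{code}
object CsvQuoting {
  // Hypothetical helper: quotes a field when it contains the delimiter,
  // a double quote, CR or LF; embedded quotes are doubled (RFC 4180 style).
  def quoteCsvField(field: String, delimiter: Char = ','): String = {
    val needsQuoting =
      field.exists(c => c == delimiter || c == '"' || c == '\n' || c == '\r')
    if (needsQuoting) "\"" + field.replace("\"", "\"\"") + "\"" else field
  }

  // Hypothetical helper: joins already-split fields into one CSV record,
  // quoting each field only where needed.
  def toCsvRecord(fields: Seq[String], delimiter: Char = ','): String =
    fields.map(quoteCsvField(_, delimiter)).mkString(delimiter.toString)

  def main(args: Array[String]): Unit = {
    // Builds the same record as the tmp.csv example: a quoted "a\nb" plus abc.
    println(toCsvRecord(Seq("a\nb", "abc")))
  }
}
{code}

In a spark-shell session (with {{import spark.implicits._}} in scope), mapping the single-column input as {{Seq("a\nb", "abc").map(v => CsvQuoting.quoteCsvField(v)).toDS}} should produce the quoted form used in the workaround above. Quoting only where needed keeps ordinary values such as {{abc}} unchanged.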