[ https://issues.apache.org/jira/browse/SPARK-21356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21356.
----------------------------------
    Resolution: Invalid

I am resolving this because the workaround is easy, and I am not sure it makes sense to allow a newline in a value without quotes for now.

> CSV datasource failed to parse a value having newline in its value
> ------------------------------------------------------------------
>
>                 Key: SPARK-21356
>                 URL: https://issues.apache.org/jira/browse/SPARK-21356
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> This is related to SPARK-21355. I guess this is also rather a corner case.
> I found this while testing SPARK-21289.
> It looks like a bug in Univocity.
> The code below fails to parse the newline in the value: the unquoted "a\nb" comes back as "a" only.
> {code}
> scala> spark.read.csv(Seq("a\nb", "abc").toDS).show()
> +---+
> |_c0|
> +---+
> |  a|
> |abc|
> +---+
> {code}
> But it can easily be worked around with quotes, as below:
> {code}
> scala> spark.read.csv(Seq("\"a\nb\"", "abc").toDS).show()
> +---+
> |_c0|
> +---+
> |a
> b|
> |abc|
> +---+
> {code}
> Meaning this works with the file below:
> {code}
> "a
> b",abc
> {code}
> {code}
> scala> spark.read.option("multiLine", true).csv("tmp.csv").show()
> +---+---+
> |_c0|_c1|
> +---+---+
> |a
> b|abc|
> +---+---+
> {code}
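
For anyone applying the quoting workaround programmatically, here is a minimal sketch in plain Scala. The object `CsvQuoting` and the function `quoteIfNeeded` are hypothetical names for illustration, not part of Spark or Univocity. The sketch assumes the values contain no double quotes of their own, since how those should be escaped depends on the reader's escape settings.

```scala
// Hypothetical helper for illustration only; not a Spark or Univocity API.
object CsvQuoting {
  // Wrap a field in double quotes when it contains a line break or the
  // delimiter, so a CSV parser can treat it as a single value.
  // Assumption: the field itself contains no double-quote characters.
  def quoteIfNeeded(field: String, delimiter: Char = ','): String = {
    val needsQuoting =
      field.exists(c => c == '\n' || c == '\r' || c == delimiter)
    if (needsQuoting) "\"" + field + "\"" else field
  }

  def main(args: Array[String]): Unit = {
    // Same sample values as in the report above.
    val rows = Seq("a\nb", "abc").map(quoteIfNeeded(_))
    rows.foreach(println)
    // In a spark-shell session, the quoted rows could then be read
    // the same way as in the report:
    //   spark.read.csv(rows.toDS).show()
  }
}
```

Only the value containing the newline is wrapped in quotes; "abc" is passed through unchanged, which matches the input used in the quoted workaround.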