[ https://issues.apache.org/jira/browse/SPARK-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21355.
----------------------------------
    Resolution: Invalid

I am resolving this per https://stackoverflow.com/a/42073.

{quote}
This is of course correct, but I'd like to add the reason for having to do this: the JSON spec at ietf.org/rfc/rfc4627.txt contains this sentence in section 2.5:

"All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."

Since a newline is a control character, it must be escaped.
{quote}

> JSON datasource failed to parse a value having newline in its value
> -------------------------------------------------------------------
>
>                 Key: SPARK-21355
>                 URL: https://issues.apache.org/jira/browse/SPARK-21355
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> I guess this is a rather corner case. I found this while testing SPARK-21289.
> It looks like a bug in Jackson.
> The code below fails to parse a newline in the value.
> {code}
> scala> spark.read.json(Seq("{ \"f\": \"a\nb\"}", "{ \"f\": \"abc\"}").toDS).show()
> +---------------+----+
> |_corrupt_record|   f|
> +---------------+----+
> |     { "f": "a
> b"}|null|
> |           null| abc|
> +---------------+----+
> {code}
> Meaning this also does not work with JSON files such as:
> {code}
> {"f": "
> d", "f0": 3}
> {code}
> {code}
> scala> spark.read.option("multiLine", true).json("tmp.json").show()
> +--------------------+
> |     _corrupt_record|
> +--------------------+
> |{"f": "
> d", "f0"...|
> +--------------------+
> {code}
> Of course, the code below works:
> {code}
> scala> spark.read.json(Seq("{ \"f\": \"ab\"}", "{ \"f\": \"abc\"}").toDS).show()
> +---+
> |  f|
> +---+
> | ab|
> |abc|
> +---+
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
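As a footnote to the resolution: the RFC 4627 rule quoted above can be demonstrated outside Spark and Jackson. The sketch below (not part of the original report) uses Python's standard `json` module to show that a raw newline inside a JSON string is rejected by a strict parser, while the escaped form `\n` parses fine, and that encoders escape control characters automatically.

```python
import json

# A raw (unescaped) newline inside a JSON string is a control character,
# which RFC 4627 section 2.5 forbids, so a strict parser rejects it.
raw = '{"f": "a\nb"}'          # contains a literal newline between a and b
try:
    json.loads(raw)
    rejected = False
except json.JSONDecodeError:
    rejected = True
print("raw newline rejected:", rejected)

# The escaped form \n (the two characters backslash and n) is valid JSON
# and decodes back to a real newline character.
escaped = '{"f": "a\\nb"}'
print("escaped form decodes to:", repr(json.loads(escaped)["f"]))

# Encoders apply the required escaping automatically, so round-tripping
# a value that contains a newline produces valid JSON.
print(json.dumps({"f": "a\nb"}))
```

The same applies to the Spark examples above: the line-delimited JSON reader sees the unescaped newline as a record separator, which is why the record lands in `_corrupt_record`, while `{ \"f\": \"ab\"}` parses cleanly.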