[ https://issues.apache.org/jira/browse/SPARK-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074841#comment-15074841 ]
Reynold Xin commented on SPARK-12537:
-------------------------------------

My understanding from reading the spec is that a single backslash followed by o is actually invalid.

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

It is also how Python and Jackson implement it.


> Add option to accept quoting of all character backslash quoting mechanism
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12537
>                 URL: https://issues.apache.org/jira/browse/SPARK-12537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Cazen Lee
>            Assignee: Apache Spark
>
> We can provide an option that controls whether the JSON parser accepts
> backslash quoting of any character.
> For example, if a JSON file includes an escape that is not listed in the JSON
> backslash quoting specification, that record is returned as a corrupt_record.
> {code:title=JSON File|borderStyle=solid}
> {"name": "Cazen Lee", "price": "$10"}
> {"name": "John Doe", "price": "\$20"}
> {"name": "Tracy", "price": "$10"}
> {code}
> corrupt_record (returns null)
> {code}
> scala> df.show
> +--------------------+---------+-----+
> |     _corrupt_record|     name|price|
> +--------------------+---------+-----+
> |                null|Cazen Lee|  $10|
> |{"name": "John Do...|     null| null|
> |                null|    Tracy|  $10|
> +--------------------+---------+-----+
> {code}
> After applying this patch, we can enable the allowBackslashEscapingAnyCharacter
> option as shown below.
> {code}
> scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter",
> "true").json("/user/Cazen/test/test2.txt")
> df: org.apache.spark.sql.DataFrame = [name: string, price: string]
> scala> df.show
> +---------+-----+
> |     name|price|
> +---------+-----+
> |Cazen Lee|  $10|
> | John Doe|  $20|
> |    Tracy|  $10|
> +---------+-----+
> {code}
> This issue is similar to HIVE-11825 and HIVE-12717.
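
The point about Python's parser can be checked with a minimal sketch that uses only the standard-library json module. The three input lines are copied from the issue description; the helper name parse_or_none is hypothetical and only for illustration. This shows strict-parser behavior in general, not how Spark itself reads the file.

```python
import json

# The three records from the issue description, as raw JSON lines.
# Raw strings keep the backslash in the second record intact.
lines = [
    r'{"name": "Cazen Lee", "price": "$10"}',
    r'{"name": "John Doe", "price": "\$20"}',
    r'{"name": "Tracy", "price": "$10"}',
]


def parse_or_none(line):
    """Return the parsed record, or None when the line is not valid JSON.

    Hypothetical helper, named here for illustration only.
    """
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


results = [parse_or_none(line) for line in lines]

for line, record in zip(lines, results):
    status = "parsed " if record is not None else "invalid"
    print(status, line)
```

Under a strict reading of the spec, the second record should be the only one reported as invalid, which mirrors the row that ends up in _corrupt_record in the quoted output.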
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org