[jira] [Commented] (SPARK-12537) Add option to accept quoting of all character backslash quoting mechanism

Sean Owen (JIRA) Wed, 30 Dec 2015 11:04:19 -0800

    [ https://issues.apache.org/jira/browse/SPARK-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075318#comment-15075318 ]


Sean Owen commented on SPARK-12537:
-----------------------------------

Yeah, by definition, but you've already observed that Python, Jackson and Spark 
all do the same thing. Since Spark uses Jackson, following Jackson's default is 
the least surprising choice. Which tools accept this non-standard JSON by 
default, and what do they do with it?

The flip side of your argument is that making this the default can silently 
corrupt input. It really needs to be opt-in.

Being able to pass through the flag seems fine to me, but I'm strongly against 
changing the default behavior in Spark to not match other known libraries, 
especially given the downside.
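
As a rough illustration of the strict default described above, the sketch below 
feeds the three sample records from the issue description to Python's standard 
json module. The names LINES and parse_strict are made up for this example and 
are not part of Spark or Jackson. It only shows that a stock parser rejects the 
"\$" escape; it says nothing about how Spark itself is implemented.

```python
import json

# The three records from the issue description, one JSON document per line.
# Raw strings keep the backslash in "\$20" exactly as it appears in the file.
LINES = [
    r'{"name": "Cazen Lee", "price": "$10"}',
    r'{"name": "John Doe", "price": "\$20"}',
    r'{"name": "Tracy", "price": "$10"}',
]


def parse_strict(line):
    """Return the parsed record, or None if the line is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # "\$" is not one of the escapes allowed by the JSON grammar,
        # so the standard decoder refuses the whole record.
        return None


results = [parse_strict(line) for line in LINES]
for line, record in zip(LINES, results):
    status = "ok     " if record is not None else "corrupt"
    print(status, line)
```

Only the second record fails to parse, which mirrors the _corrupt_record row in 
the quoted issue below.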

> Add option to accept quoting of all character backslash quoting mechanism
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12537
>                 URL: https://issues.apache.org/jira/browse/SPARK-12537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Cazen Lee
>            Assignee: Apache Spark
>
> We can provide an option that controls whether the JSON parser accepts a 
> backslash in front of any character.
> For example, if a JSON file contains an escape sequence that is not listed in 
> the JSON backslash-escaping specification, the record is returned as 
> _corrupt_record:
> {code:title=JSON File|borderStyle=solid}
> {"name": "Cazen Lee", "price": "$10"}
> {"name": "John Doe", "price": "\$20"}
> {"name": "Tracy", "price": "$10"}
> {code}
> The record with the escaped character lands in _corrupt_record, and its other 
> columns are null:
> {code}
> scala> df.show
> +--------------------+---------+-----+
> |     _corrupt_record|     name|price|
> +--------------------+---------+-----+
> |                null|Cazen Lee|  $10|
> |{"name": "John Do...|     null| null|
> |                null|    Tracy|  $10|
> +--------------------+---------+-----+
> {code}
> After applying this patch, we can enable the 
> allowBackslashEscapingAnyCharacter option as shown below:
> {code}
> scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
> df: org.apache.spark.sql.DataFrame = [name: string, price: string]
> scala> df.show
> +---------+-----+
> |     name|price|
> +---------+-----+
> |Cazen Lee|  $10|
> | John Doe|  $20|
> |    Tracy|  $10|
> +---------+-----+
> {code}
> This issue is similar to HIVE-11825 and HIVE-12717.


