[jira] [Commented] (SPARK-17695) Deserialization error when using DataFrameReader.json on JSON line that contains an empty JSON object
[ https://issues.apache.org/jira/browse/SPARK-17695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622415#comment-15622415 ]

Miguel Cabrera commented on SPARK-17695:
----------------------------------------

Hi, is there a way to prevent this, besides not using the {{json}} method? I am currently mapping the underlying RDD and transforming it into a {{StringRDD}} with the already-serialized JSON. I am using PySpark, though, with the default JSON serializer.

> Deserialization error when using DataFrameReader.json on JSON line that
> contains an empty JSON object
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17695
>                 URL: https://issues.apache.org/jira/browse/SPARK-17695
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Scala 2.11.7
>            Reporter: Jonathan Simozar
>
> When using the {{DataFrameReader}} method {{json}} on the JSON line
> {noformat}{"field1":{},"field2":"a"}{noformat}
> {{field1}} is removed at deserialization.
> This can be reproduced with the example below.
> {code:java}
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SparkSession
>
> // create spark context
> val sc: SparkContext = new SparkContext("local[*]", "My App")
>
> // create spark session
> val sparkSession: SparkSession =
>   SparkSession.builder().config(sc.getConf).getOrCreate()
>
> // create rdd
> val strings = sc.parallelize(Seq(
>   """{"field1":{},"field2":"a"}"""
> ))
>
> // create json Dataset[Row], convert back to JSON strings, and print to stdout
> sparkSession.read.json(strings)
>   .toJSON.collect().foreach(println)
> {code}
> *stdout*
> {noformat}
> {"field2":"a"}
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
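[Editor's note, not from the reporter: plain Python's {{json}} module round-trips the reported line with the empty object intact, which suggests the field is dropped during Spark's schema inference rather than during JSON parsing itself. A minimal sketch, outside Spark:]

```python
import json

# the exact line from the bug report
line = '{"field1":{},"field2":"a"}'

# parse: the empty JSON object is preserved as an empty dict
parsed = json.loads(line)
print(parsed["field1"])  # {}

# serialize back with compact separators: field1 survives the round trip
roundtripped = json.dumps(parsed, separators=(",", ":"))
print(roundtripped)  # {"field1":{},"field2":"a"}
```

[This is consistent with Hyukjin Kwon's diagnosis below: Spark infers no fields for {{field1}} (an empty struct type), so the column is dropped even though the raw JSON carries it.]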
[jira] [Commented] (SPARK-17695) Deserialization error when using DataFrameReader.json on JSON line that contains an empty JSON object
[ https://issues.apache.org/jira/browse/SPARK-17695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527900#comment-15527900 ]

Hyukjin Kwon commented on SPARK-17695:
--------------------------------------

Related to SPARK-8093. It seems an empty struct type is not allowed. I guess it would work if {{field1}} were not empty.