[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523358#comment-17523358 ]

Sean R. Owen commented on SPARK-38826:
--------------------------------------

Yeah, I guess we can at least update the docs; I'm unclear on whether the behavior is wrong or right.

> dropFieldIfAllNull option does not work for empty JSON struct
> -------------------------------------------------------------
>
>                 Key: SPARK-38826
>                 URL: https://issues.apache.org/jira/browse/SPARK-38826
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: morvenhuang
>            Priority: Trivial
>
> As stated in the doc,
> {quote}dropFieldIfAllNull
> Whether to ignore column of all null values or empty array/struct during
> schema inference.
> {quote}
> But when I try this,
>
> {code:java}
> String json = "{\"field1\":\"value1\", \"field2\":{}}";
> JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
> JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
> Dataset<Row> df = spark.read().option("dropFieldIfAllNull", "false").json(jrdd);
> df.printSchema();
> {code}
>
> I get this,
> {code:java}
> root
>  |-- field1: string (nullable = true){code}
> Notice that field2 is still missing even when dropFieldIfAllNull is set to false,
> so apparently this option does not work for an empty struct.
> This is due to SPARK-8093: the empty struct will be dropped anyway.
> I think we should update the doc, otherwise it would be confusing.
> I can make a patch for this.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
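The behavior reported in this issue can be modelled without Spark. What follows is only a toy sketch in plain Java, not Spark's schema-inference code. The class name `EmptyStructInferenceSketch` and the method `inferFieldNames` are hypothetical, and the two rules it applies are assumptions taken from the report and the quoted doc text: an empty struct is always dropped, while an all-null field is dropped only when dropFieldIfAllNull is true.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the behavior described in SPARK-38826; NOT Spark's implementation.
class EmptyStructInferenceSketch {

    // Returns the names of the top-level fields that survive the modelled inference.
    // Assumed rules (simplified from the report):
    //   - a field whose value is an empty struct (an empty Map here) is always dropped;
    //   - a field whose value is null is dropped only when dropFieldIfAllNull is true.
    static List<String> inferFieldNames(Map<String, Object> record, boolean dropFieldIfAllNull) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            Object value = entry.getValue();
            boolean emptyStruct = value instanceof Map && ((Map<?, ?>) value).isEmpty();
            if (emptyStruct) {
                continue; // dropped regardless of the option
            }
            if (value == null && dropFieldIfAllNull) {
                continue; // dropped only when the option is enabled
            }
            kept.add(entry.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same shape as the JSON in the report: {"field1":"value1", "field2":{}}
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("field1", "value1");
        record.put("field2", new LinkedHashMap<String, Object>());

        System.out.println(inferFieldNames(record, false)); // prints [field1]
        System.out.println(inferFieldNames(record, true));  // prints [field1]
    }
}
```

In this model the option changes the result only for null-valued fields, which matches the observation that field2 is missing whether the option is "true" or "false".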
[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523270#comment-17523270 ]

morvenhuang commented on SPARK-38826:
-------------------------------------

[~srowen] Sean, thanks for the comment :).

This is a bit tricky for me. It does not seem like a good idea to leave the empty struct in place, since it will cause problems when writing it to a file (QueryCompilationErrors#writeEmptySchemasUnsupportedByDataSourceError). But if we just drop the field as we do now, it causes a schema change. Any suggestion on which way to go: 1) update the documentation, 2) update the code to allow empty structs, or 3) treat it as not a problem?
[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523261#comment-17523261 ]

Sean R. Owen commented on SPARK-38826:
--------------------------------------

Ah right, it was there inline in the code. I think there is something wrong, yes, because field2 doesn't appear at all. I assume this is not considered effectively 'empty' by JSON just because it maps to an empty dict. I wouldn't imagine it is 'null' to begin with. But I don't know how to diagnose this in the code.
[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523251#comment-17523251 ]

morvenhuang commented on SPARK-38826:
-------------------------------------

[~srowen] , Hi, Sean, here is the JSON I used in my code snippet above,
{code:java}
{"field1":"value1", "field2":{}}{code}
[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523187#comment-17523187 ]

Sean R. Owen commented on SPARK-38826:
--------------------------------------

Wait, can you show the JSON? It's not clear if this is the right or wrong behavior.
[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct
[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519285#comment-17519285 ]

Apache Spark commented on SPARK-38826:
--------------------------------------

User 'morvenhuang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36111