[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-17 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523358#comment-17523358
 ] 

Sean R. Owen commented on SPARK-38826:
--

Yeah, I guess we can at least update the docs; I'm unclear on whether the 
behavior is right or wrong.

> dropFieldIfAllNull option does not work for empty JSON struct
> -
>
> Key: SPARK-38826
> URL: https://issues.apache.org/jira/browse/SPARK-38826
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.2.1
> Reporter: morvenhuang
> Priority: Trivial
>
> As stated in the doc, 
> {quote}dropFieldIfAllNull
> Whether to ignore column of all null values or empty array/struct during 
> schema inference.
>  
> {quote}
> But when I try this, 
>  
> {code:java}
> // Assumes an existing SparkSession named `spark`.
> String json = "{\"field1\":\"value1\", \"field2\":{}}";
> JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
> JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
> // Explicitly disable dropFieldIfAllNull, then let Spark infer the schema.
> Dataset<Row> df = spark.read().option("dropFieldIfAllNull", "false").json(jrdd);
> df.printSchema();
> {code}
>  
> I get this:
> {code:java}
> root
>  |-- field1: string (nullable = true)
> {code}
> Notice that field2 is still missing even when dropFieldIfAllNull is set to 
> false, so apparently this option does not apply to empty structs.
> This is due to SPARK-8093: the empty struct is dropped regardless of the option.
> I think we should update the doc; otherwise it would be confusing.
> I can make a patch for this.
>  
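
The description above attributes the missing field2 to SPARK-8093. As a rough illustration of how such a rule could make the option irrelevant for empty structs, here is a toy model that uses only the Java standard library. It is not Spark's code: the names EmptyStructSketch, canonicalize, and field3 are hypothetical, and the rule that an all-null field becomes a string when the option is false is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of schema canonicalization. Hypothetical; not Spark's implementation. */
public class EmptyStructSketch {
    static final String NULL = "null";     // stands in for an all-null column
    static final String STRING = "string"; // stands in for a string column

    /** Returns the type to keep, or Optional.empty() if the field is dropped. */
    @SuppressWarnings("unchecked")
    public static Optional<Object> canonicalize(Object type, boolean dropFieldIfAllNull) {
        if (type instanceof Map) {
            // A struct is modelled as an ordered map of field name to type.
            Map<String, Object> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) type).entrySet()) {
                canonicalize(e.getValue(), dropFieldIfAllNull)
                        .ifPresent(t -> kept.put(e.getKey(), t));
            }
            // Assumed rule, modelled on SPARK-8093: a struct with no fields
            // is dropped without consulting the option at all.
            return kept.isEmpty() ? Optional.empty() : Optional.of(kept);
        }
        if (NULL.equals(type)) {
            // In this model, the option only matters for all-null fields.
            return dropFieldIfAllNull ? Optional.empty() : Optional.of(STRING);
        }
        return Optional.of(type);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("field1", STRING);
        record.put("field2", new LinkedHashMap<String, Object>()); // the empty struct {}
        record.put("field3", NULL); // hypothetical all-null field, for contrast

        // prints Optional[{field1=string, field3=string}]
        System.out.println(canonicalize(record, false));
        // prints Optional[{field1=string}]
        System.out.println(canonicalize(record, true));
    }
}
```

In this sketch field2 disappears for both settings, while the hypothetical field3 is the only field whose fate depends on the flag, which matches the behavior reported in the description.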



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-16 Thread morvenhuang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523270#comment-17523270
 ] 

morvenhuang commented on SPARK-38826:
-

[~srowen] Sean, thanks for the comment. :) This is a bit tricky for me. It does 
not seem like a good idea to keep the empty struct, since it will cause a 
problem when writing it to a file 
(QueryCompilationErrors#writeEmptySchemasUnsupportedByDataSourceError), but if 
we just drop the field as we do now, it causes a schema change. Any suggestion 
on which way to go: 1) update the documentation, 2) update the code to allow 
empty structs, or 3) treat this as not a problem?







[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-16 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523261#comment-17523261
 ] 

Sean R. Owen commented on SPARK-38826:
--

Ah right, it was there inline in the code. I think there is something wrong, 
yes, because field2 doesn't appear at all. I assume this is not considered 
effectively 'empty' by JSON just because it maps to an empty dict. I wouldn't 
imagine it is 'null' to begin with. But I don't know how to diagnose this in 
the code.







[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-16 Thread morvenhuang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523251#comment-17523251
 ] 

morvenhuang commented on SPARK-38826:
-

[~srowen] Hi Sean, here is the JSON I used in my code snippet above:
{code:java}
{"field1":"value1", "field2":{}}
{code}
 







[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-16 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523187#comment-17523187
 ] 

Sean R. Owen commented on SPARK-38826:
--

Wait, can you show the JSON? It's not clear whether this is the right or wrong 
behavior.







[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519285#comment-17519285
 ] 

Apache Spark commented on SPARK-38826:
--

User 'morvenhuang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36111



