[ https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523358#comment-17523358 ]
Sean R. Owen commented on SPARK-38826:
--------------------------------------

Yeah I guess we can at least update the docs; I'm unclear on whether the behavior is wrong or right

> dropFieldIfAllNull option does not work for empty JSON struct
> -------------------------------------------------------------
>
>                 Key: SPARK-38826
>                 URL: https://issues.apache.org/jira/browse/SPARK-38826
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: morvenhuang
>            Priority: Trivial
>
> As stated in the doc,
> {quote}dropFieldIfAllNull
> Whether to ignore column of all null values or empty array/struct during schema inference.
> {quote}
> But when I try this,
>
> {code:java}
> String json = "{\"field1\":\"value1\", \"field2\":{}}";
> JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
> JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
> Dataset<Row> df = spark.read().option("dropFieldIfAllNull", "false").json(jrdd);
> df.printSchema();
> {code}
>
> I get this,
> {code:java}
> root
>  |-- field1: string (nullable = true){code}
> Notice field2 is still missing even when dropFieldIfAllNull is set to false, so apparently, this option does not work for empty struct.
> This is due to SPARK-8093, the empty struct will be dropped anyway.
> I think we should update the doc, otherwise it would be confusing.
> I can make a patch for this.
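
For readers without a Spark session at hand, the reported behavior can be modeled in plain Java. This is only a toy sketch, not Spark's schema inference code: the class `EmptyStructInferenceSketch` and the method `inferredFields` are hypothetical names, and the rule that a null field is kept when the option is false is an assumption read from the option's description quoted in the report. The one rule taken directly from the report is that an empty struct is dropped whatever the option says.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the behavior described in SPARK-38826. Not Spark code.
class EmptyStructInferenceSketch {

    // Hypothetical helper: returns the top-level field names that survive
    // "inference" for a single record under this simplified model.
    static List<String> inferredFields(Map<String, Object> record,
                                       boolean dropFieldIfAllNull) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map && ((Map<?, ?>) value).isEmpty()) {
                // Empty struct: dropped regardless of the option, which the
                // report attributes to SPARK-8093.
                continue;
            }
            if (value == null && dropFieldIfAllNull) {
                // Assumption: the option only decides the fate of null fields.
                continue;
            }
            kept.add(entry.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same shape as the JSON in the report: {"field1":"value1", "field2":{}}
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("field1", "value1");
        record.put("field2", new LinkedHashMap<String, Object>());

        System.out.println(inferredFields(record, false)); // prints [field1]
        System.out.println(inferredFields(record, true));  // prints [field1]
    }
}
```

In this model `field2` is missing for both option values, which matches the schema printed in the report and is the case the documentation does not currently describe.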