[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
Github user makagonov closed the pull request at: https://github.com/apache/spark/pull/20884 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20884#discussion_r176897510 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1229,7 +1229,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { val df2 = df1.toDF val result = df2.toJSON.collect() // scalastyle:off -assert(result(0) === "{\"f1\":1,\"f2\":\"A1\",\"f3\":true,\"f4\":[\"1\",\" A1\",\" true\",\" null\"]}") +assert(result(0) === "{\"f1\":1,\"f2\":\"A1\",\"f3\":true,\"f4\":[\"1\",\" A1\",\" true\",\" null\"],\"f5\":null}") --- End diff -- If we go the current way, it'd write out every `null` with every field: ```json {"a":null,"b":null,"c":null} {"a":null,"b":null,"c":1} {"a":1,"b":null,"c":1} {"a":1,"b":2,"c":3} ``` which I think's quit inefficient. Does that fix actually use case to be clear? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20884#discussion_r176897268 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite { val gen = new JacksonGenerator(dataType, writer, option) gen.write(input) gen.flush() -assert(writer.toString === """[{}]""") +assert(writer.toString === """[{"a":null}]""") --- End diff -- I think you should compare this: ```scala scala> sql(""" select array(cast(null as struct)) as my_array""").toJSON.collect().foreach(println) {"my_array":[null]} scala> sql(""" select array(struct(cast(null as string))) as my_array""").toJSON.collect().foreach(println) {"my_array":[{}]} ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
Github user makagonov commented on a diff in the pull request: https://github.com/apache/spark/pull/20884#discussion_r176842732 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite { val gen = new JacksonGenerator(dataType, writer, option) gen.write(input) gen.flush() -assert(writer.toString === """[{}]""") +assert(writer.toString === """[{"a":null}]""") --- End diff -- @HyukjinKwon actually, it looks like the result should be `[null]` rather than `[{}]`. Look at the following repro from spark-shell (downloaded binaries): ```scala scala> val df = sqlContext.sql(""" select array(cast(null as struct)) as my_array""") df: org.apache.spark.sql.DataFrame = [my_array: array>] scala> df.printSchema root |-- my_array: array (nullable = false) ||-- element: struct (containsNull = true) |||-- k: string (nullable = true) scala> df.toJSON.collect().foreach(println) {"my_array":[null]} scala> df.select(to_json($"my_array")).collect().foreach(x => println(x(0))) [null] ``` In older version of `JacksonGenerator`, we had a filter by element value, and if it was `null`, `gen.writeNull()` was called no matter what the type was ([old implementation](https://github.com/apache/spark/blob/3258f27a881dfeb5ab8bae90c338603fa4b6f9d8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41)). But currently, we're calling `gen.writeStartObject()...gen.writeEndObject()` no matter if the value is null. I couldn't repro this with a query, but when `StructsToJson` is called from this unit test, it goes through `JacksonGenerator.arrElementWriter` which has lines ```scala case st: StructType => (arr: SpecializedGetters, i: Int) => { writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters)) } ``` that makes it print json object even there is `null`. I'll look into this later and will try to find the easy workaround. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/20884#discussion_r176796998 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite { val gen = new JacksonGenerator(dataType, writer, option) gen.write(input) gen.flush() -assert(writer.toString === """[{}]""") +assert(writer.toString === """[{"a":null}]""") --- End diff -- +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20884#discussion_r176715906 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite { val gen = new JacksonGenerator(dataType, writer, option) gen.write(input) gen.flush() -assert(writer.toString === """[{}]""") +assert(writer.toString === """[{"a":null}]""") --- End diff -- I think previous result was a valid test case .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...
GitHub user makagonov opened a pull request: https://github.com/apache/spark/pull/20884 [SPARK-23773][SQL] JacksonGenerator does not include keys that have null value for StructTypes ## What changes were proposed in this pull request? As stated in Jira, when `toJSON` is called on a dataset, the result JSON string will not have keys displayed for `StructType`s that have null value. This PR fixes the issue and writes field with "null" value. ## How was this patch tested? Added a unit test to `JsonSuite.scala` You can merge this pull request into a Git repository by running: $ git pull https://github.com/makagonov/spark SPARK-23773 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20884.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20884 commit 9faf8533d044bd667bac6fb1925f2d38c4d281d4 Author: Sergey Makagonov Date: 2018-03-22T19:38:44Z [SPARK-23773][SQL] JacksonGenerator does not include keys that have null value for StructTypes --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org