[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-06-29 Thread makagonov
Github user makagonov closed the pull request at:

https://github.com/apache/spark/pull/20884


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-03-23 Thread makagonov
Github user makagonov commented on a diff in the pull request:

https://github.com/apache/spark/pull/20884#discussion_r176842732
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala
 ---
@@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite {
 val gen = new JacksonGenerator(dataType, writer, option)
 gen.write(input)
 gen.flush()
-assert(writer.toString === """[{}]""")
+assert(writer.toString === """[{"a":null}]""")
--- End diff --

@HyukjinKwon actually, it looks like the result should be `[null]` rather 
than `[{}]`.
Look at the following repro from spark-shell (downloaded binaries):
```scala
scala> val df = sqlContext.sql(""" select array(cast(null as 
struct)) as my_array""")
df: org.apache.spark.sql.DataFrame = [my_array: array>]

scala> df.printSchema
root
 |-- my_array: array (nullable = false)
 ||-- element: struct (containsNull = true)
 |||-- k: string (nullable = true)
scala> df.toJSON.collect().foreach(println)
{"my_array":[null]}
scala> df.select(to_json($"my_array")).collect().foreach(x => println(x(0)))
[null]
```

In older version of `JacksonGenerator`, we had a filter by element value, 
and if it was `null`, `gen.writeNull()` was called no matter what the type was 
([old 
implementation](https://github.com/apache/spark/blob/3258f27a881dfeb5ab8bae90c338603fa4b6f9d8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41)).
 But currently, we're calling `gen.writeStartObject()...gen.writeEndObject()` 
no matter if the value is null.

I couldn't repro this with a query, but when `StructsToJson` is called from 
this unit test, it goes through `JacksonGenerator.arrElementWriter` which has 
lines
```scala
case st: StructType =>
(arr: SpecializedGetters, i: Int) => {
  writeObject(writeFields(arr.getStruct(i, st.length), st, 
rootFieldWriters))
}
```
that makes it print json object even there is `null`.

I'll look into this later and will try to find the easy workaround.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20884: [SPARK-23773][SQL] JacksonGenerator does not incl...

2018-03-22 Thread makagonov
GitHub user makagonov opened a pull request:

https://github.com/apache/spark/pull/20884

[SPARK-23773][SQL] JacksonGenerator does not include keys that have null 
value for StructTypes

## What changes were proposed in this pull request?
As stated in Jira, when `toJSON` is called on a dataset, the result JSON 
string will not have keys displayed for `StructType`s that have null value. 
This PR fixes the issue and writes field with "null" value.

## How was this patch tested?
Added a unit test to `JsonSuite.scala`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/makagonov/spark SPARK-23773

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20884.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20884


commit 9faf8533d044bd667bac6fb1925f2d38c4d281d4
Author: Sergey Makagonov 
Date:   2018-03-22T19:38:44Z

[SPARK-23773][SQL] JacksonGenerator does not include keys that have null 
value for StructTypes




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org