[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-26 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/21439

[SPARK-24391][SQL] Support arrays of any types by from_json

## What changes were proposed in this pull request?

The PR removes the restriction on element types of the root array type in 
`from_json`. Currently, the function can handle only arrays of structs; 
even an array of primitive types is disallowed. The PR allows arrays of any 
type currently supported by the JSON datasource. Here is an example 
with an array of a primitive type:

```
scala> import org.apache.spark.sql.functions._
scala> import org.apache.spark.sql.types._
scala> val df = Seq("[1, 2, 3]").toDF("a")
scala> val schema = new ArrayType(IntegerType, false)
scala> val arr = df.select(from_json($"a", schema))
scala> arr.printSchema
root
 |-- jsontostructs(a): array (nullable = true)
 |    |-- element: integer (containsNull = true)
```
and the result of converting the JSON string to the `ArrayType`:
```
scala> arr.show
+----------------+
|jsontostructs(a)|
+----------------+
|       [1, 2, 3]|
+----------------+
```

## How was this patch tested?

I added a few positive and negative tests (a sketch of one case follows the list):
- array of primitive types
- array of arrays
- array of structs
- array of maps
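
For instance, a minimal sketch of the array-of-maps case in spark-shell (the data and schema here are illustrative, not the exact test code):

```
scala> import org.apache.spark.sql.functions._
scala> import org.apache.spark.sql.types._
scala> val df = Seq("""[{"a": 1}, {"b": 2}]""").toDF("json")
scala> val schema = ArrayType(MapType(StringType, IntegerType))
scala> df.select(from_json($"json", schema)).printSchema
root
 |-- jsontostructs(json): array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: integer (valueContainsNull = true)
```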



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 from_json-array

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21439


commit 3a7559b8809757ba32491a9c882ec40c8986c3b0
Author: Maxim Gekk 
Date:   2018-05-26T16:46:08Z

Support arrays by from_json

commit b601a9365e12744c408a8c198a94a5c6a9e4607e
Author: Maxim Gekk 
Date:   2018-05-26T17:08:49Z

Fix comments




---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r191298844
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema

+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

Make it `private`? (Not related to this PR, but `nullableSchema` can also be 
private?)


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r191298921
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema

+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

Can you make the option `unpackArray` case-insensitive?
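
One way to do that, as a sketch, is to wrap the raw options in Spark's `CaseInsensitiveMap` so that key lookup ignores case (this mirrors what a later revision of the PR does):

```
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

// Sketch: with a case-insensitive map, "unpackArray", "unpackarray"
// and "UNPACKARRAY" all resolve to the same option.
private val caseInsensitiveOptions = CaseInsensitiveMap(options)
private val unpackArray: Boolean =
  caseInsensitiveOptions.get("unpackArray").map(_.toBoolean).getOrElse(false)
```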


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r191299100
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,8 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema

+  val unpackArray: Boolean = options.get("unpackArray").map(_.toBoolean).getOrElse(false)
--- End diff --

If we add this new option here, I feel we'd better document it somewhere 
(e.g., `sql/functions.scala`).


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204226404
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -61,6 +61,7 @@ class JacksonParser(
     dt match {
       case st: StructType => makeStructRootConverter(st)
       case mt: MapType => makeMapRootConverter(mt)
+      case at: ArrayType => makeArrayRootConverter(at)
--- End diff --

Does this change accept JSON forms that master can't parse? 
If so, I think we need tests in `JsonSuite`, too. cc: @HyukjinKwon 


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204231416
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -61,6 +61,7 @@ class JacksonParser(
     dt match {
       case st: StructType => makeStructRootConverter(st)
       case mt: MapType => makeMapRootConverter(mt)
+      case at: ArrayType => makeArrayRootConverter(at)
--- End diff --

> Does this change accept JSON forms that master can't parse?

Right, it accepts arrays of any type, whereas master accepts only arrays of 
structs.

> If so, I think we need tests in JsonSuite ...

I added a few tests to `JsonSuite`, or do you mean some concrete test case?


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204231723
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -61,6 +61,7 @@ class JacksonParser(
     dt match {
       case st: StructType => makeStructRootConverter(st)
       case mt: MapType => makeMapRootConverter(mt)
+      case at: ArrayType => makeArrayRootConverter(at)
--- End diff --

You've already added tests in `JsonSuite`? It seems the tests are in 
`JsonFunctionsSuite`, `JsonExpressionsSuite`, and `SQLQueryTestSuite` now?


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204232671
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -61,6 +61,7 @@ class JacksonParser(
     dt match {
       case st: StructType => makeStructRootConverter(st)
       case mt: MapType => makeMapRootConverter(mt)
+      case at: ArrayType => makeArrayRootConverter(at)
--- End diff --

Sorry, I didn't catch that you meant a specific class. Just in case, what is the 
reason for adding tests to `JsonSuite`? I changed the behavior of a function, so 
`JsonFunctionsSuite` is the perfect place for new tests, I believe. At the moment, 
you cannot specify a schema of JSON objects different from `StructType` in 
`DataFrameReader`. In this way, `ArrayType` can come into `JacksonParser` 
only from the JSON functions (`from_json`). I could add a similar test to `JsonSuite` 
as in `JsonFunctionsSuite`, but it doesn't make any sense from my point of view.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204233568
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
@@ -423,7 +423,9 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with
     val input = """{"a": 1}"""
     val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
     val output = InternalRow(1) :: Nil
-    checkEvaluation(JsonToStructs(schema, Map.empty, Literal(input), gmtId), output)
+    checkEvaluation(
+      JsonToStructs(schema, Map.empty, Literal(input), gmtId),
+      output)
--- End diff --

Looks like an unrelated change.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204233840
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -61,6 +61,7 @@ class JacksonParser(
     dt match {
       case st: StructType => makeStructRootConverter(st)
       case mt: MapType => makeMapRootConverter(mt)
+      case at: ArrayType => makeArrayRootConverter(at)
--- End diff --

Ah, ok. You touched the `JacksonParser.scala` file, so I thought there 
were some behaviour changes in the JSON datasource. But I see that there are 
not, so the current tests are enough. Thanks.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204248469
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

Why do we need this? It means that if the user asks for an array of data 
and the data doesn't contain an array, we return an array with a single element. 
That seems wrong to me; I'd rather return null or throw an exception.
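
To make the behavior in question concrete, here is a sketch (assuming a spark-shell session with `org.apache.spark.sql.functions._` and `org.apache.spark.sql.types._` imported; the output shown is illustrative):

```
// The schema asks for an array of structs, but the input is a lone JSON object.
val schema = ArrayType(new StructType().add("a", IntegerType))
val df = Seq("""{"a": 1}""").toDF("json")
// With the START_OBJECT branch above, the lone struct is wrapped into a
// one-element array instead of producing null or an error:
df.select(from_json($"json", schema)).collect()
// => Array([WrappedArray([1])])   (illustrative)
```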


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204250446
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

This is for backward compatibility with the behavior introduced by this PR: 
https://github.com/apache/spark/pull/16929, most likely by this: 
https://github.com/apache/spark/pull/16929/files#diff-6626026091295ad8c0dfb66ecbcd04b1R506
 . Even a special test was added: 
https://github.com/apache/spark/pull/16929/files#diff-88230f171af0b7a40791a867f9dd3a36R382
 . Please ask the author @HyukjinKwon about it. 


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204263502
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

If the input array is empty and the schema is an array, it should return 
an empty array: 
https://github.com/apache/spark/pull/16929/files#diff-88230f171af0b7a40791a867f9dd3a36R389


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204274790
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

I think it is a super weird case... can we put this special back-compatibility 
handling in `JsonToStructs`?
Either way, I think we should leave code comments here so that others can 
understand it easily.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204300575
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

I think what was done in that PR is pretty different; it is actually the 
opposite of what we are doing here. There, we return an array of 
structs when a struct is specified as the schema and the JSON contains an array. 
Here, we return an array with one struct when the schema is an array of 
structs and a struct is found instead of an array.

Although I don't really like the behavior introduced in the PR you 
mentioned, I can understand it, as it was a way to support arrays of structs (the 
only option at the moment), and I don't think we can change it before 3.0, at least for 
backward compatibility. But since we are introducing a new behavior here, if an 
array is required and a struct is found, I think returning an array with one 
element is wrong/unexpected behavior; returning null would be what I'd 
expect as a user.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-23 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204553995
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

I can only say that this `case START_OBJECT` was added to handle the case 
when a user specifies the schema as an `array` of `struct`s and a `struct` is 
found in the input JSON. See the existing test case which I pointed out above: 
https://github.com/apache/spark/pull/16929/files#diff-88230f171af0b7a40791a867f9dd3a36R382
 . I don't want to change the behavior in this PR and potentially break users' 
apps. What I would propose is to put the functionality under a 
`spark.sql.legacy.*` flag, which could be deleted in Spark 3.0.

@maropu Initially I put the behavior under a flag in `JsonToStructs`, but 
Reynold asked me to support the new behavior without any new sticks/flags.
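
Such a legacy flag might look like the following sketch in `SQLConf`; the name, doc text and default here are hypothetical, not a merged API:

```
// Hypothetical SQLConf entry sketching the proposal above.
val LEGACY_FROM_JSON_WRAP_STRUCT_IN_ARRAY =
  buildConf("spark.sql.legacy.fromJson.wrapStructInArray")
    .doc("When true, from_json wraps a lone JSON object into a one-element " +
      "array if the specified schema is an array of structs.")
    .booleanConf
    .createWithDefault(true)
```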


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204915146
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

Shall we add a comment on top of this `case` to explain it?


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204931175
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -136,12 +136,11 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
   test("from_json invalid schema") {
--- End diff --

Not an invalid schema now.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204932903
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -544,34 +544,27 @@ case class JsonToStructs(
       timeZoneId = None)

   override def checkInputDataTypes(): TypeCheckResult = nullableSchema match {
-    case _: StructType | ArrayType(_: StructType, _) | _: MapType =>
+    case _: StructType | _: ArrayType | _: MapType =>
       super.checkInputDataTypes()
     case _ => TypeCheckResult.TypeCheckFailure(
       s"Input schema ${nullableSchema.catalogString} must be a struct or an array of structs.")
--- End diff --

`or an array of structs.` -> `or an array.`
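
A sketch of the adjusted failure message, following the suggestion above:

```
case _ => TypeCheckResult.TypeCheckFailure(
  s"Input schema ${nullableSchema.catalogString} must be a struct or an array.")
```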


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r204933936
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -544,34 +544,27 @@ case class JsonToStructs(
       timeZoneId = None)
--- End diff --

Please also update the comment of `JsonToStructs`:

`Converts an json input string to a [[StructType]] or [[ArrayType]] of 
[[StructType]]s`.
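
The updated doc might read along these lines (a sketch, not the exact merged wording):

```
/**
 * Converts a json input string to a [[StructType]], [[ArrayType]] or
 * [[MapType]] with the specified schema.
 */
```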


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-07-28 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r205953494
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

I am adding a comment for this.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192465130
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -548,7 +553,9 @@ case class JsonToStructs(
       forceNullableSchema = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA))

   override def checkInputDataTypes(): TypeCheckResult = nullableSchema match {
-    case _: StructType | ArrayType(_: StructType, _) | _: MapType =>
+    case ArrayType(_: StructType, _) if unpackArray =>
--- End diff --

Even if `unpackArray` is `false`, the next branch in line 558 still does 
`super.checkInputDataTypes()` for any `ArrayType`.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192488365
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,13 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
--- End diff --

In line 87:
```
val array = convertArray(parser, elementConverter)
// Here, as we support reading top level JSON arrays and take every 
element
// in such an array as a row, this case is possible.
if (array.numElements() == 0) {
  Nil
} else {
  array.toArray[InternalRow](schema).toSeq
}
```
Should we also follow this?


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192501912
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,11 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema

+  private val caseInsensitiveOptions = CaseInsensitiveMap(options)
+  private val unpackArray: Boolean = {
--- End diff --

Why do we need this? Can you add comments about it?


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192502051
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
@@ -423,7 +423,9 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with
     val input = """{"a": 1}"""
     val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
     val output = InternalRow(1) :: Nil
-    checkEvaluation(JsonToStructs(schema, Map.empty, Literal(input), gmtId, true), output)
+    checkEvaluation(
+      JsonToStructs(schema, Map("unpackArray" -> "true"), Literal(input), gmtId, true),
--- End diff --

Add a case for `unpackArray` as `false`.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r194491865
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,13 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
--- End diff --

The code in line 87 returns `null` for the JSON input `[]` if the schema is 
`StructType(StructField("a", IntegerType) :: Nil)`. I would explain why we 
should return `null` in that case: we *extract* a struct from the array, and if the 
array is _empty_, there is nothing to extract, so we return `null` 
for that nothing.

In the case when the schema is `ArrayType(...)`, I believe we should return 
an _empty_ array for the empty JSON array `[]`.
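
A sketch of the distinction in spark-shell (assuming the usual functions/types imports; the printed output is illustrative):

```
scala> val df = Seq("[]").toDF("a")

scala> // Struct schema: an empty array leaves nothing to extract => null.
scala> df.select(from_json($"a", new StructType().add("a", IntegerType))).collect()
res0: Array[org.apache.spark.sql.Row] = Array([null])

scala> // Array schema: an empty array is itself a valid value => empty array, not null.
scala> df.select(from_json($"a", ArrayType(IntegerType))).collect()
res1: Array[org.apache.spark.sql.Row] = Array([WrappedArray()])
```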


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r209461334
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,21 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
+        // This handles the case when an input JSON object is a structure but
+        // the specified schema is an array of structures. In that case, the input JSON is
--- End diff --

Could you add an example here, like what we did in `makeStructRootConverter`?
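
Such a comment might look like the following sketch, extending the wording already in the diff (the example values are illustrative):

```
case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
  // This handles the case when an input JSON object is a structure but
  // the specified schema is an array of structures. In that case, the input
  // JSON is treated as an array with a single element of struct type, e.g.
  //   schema: ArrayType(StructType(StructField("a", IntegerType) :: Nil))
  //   input:  {"a": 1}
  //   output: one row holding the single-element array [[1]]
```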


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r209461516
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/json-functions.sql ---
@@ -39,3 +39,8 @@ select from_json('{"a":1, "b":"2"}', 'struct<a:int,b:string>');
 -- infer schema of json literal
 select schema_of_json('{"c1":0, "c2":[1]}');
 select from_json('{"c1":[1, 2, 3]}', schema_of_json('{"c1":[0]}'));
+
+-- from_json - array type
+select from_json('[1, 2, 3]', 'array<int>');
--- End diff --

Add more cases?
select from_json('[3, null, 4]', 'array<int>')
select from_json('[3, "str", 4]', 'array<int>')


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r209468355
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/json-functions.sql ---
@@ -39,3 +39,8 @@ select from_json('{"a":1, "b":"2"}', 'struct<a:int,b:string>');
 -- infer schema of json literal
 select schema_of_json('{"c1":0, "c2":[1]}');
 select from_json('{"c1":[1, 2, 3]}', schema_of_json('{"c1":[0]}'));
+
+-- from_json - array type
+select from_json('[1, 2, 3]', 'array<int>');
--- End diff --

added


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21439


---
