[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132875524
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -640,6 +644,7 @@ case class StructsToJson(
   lazy val rowSchema = child.dataType match {
 case st: StructType => st
 case ArrayType(st: StructType, _) => st
+case MapType(kt: DataType, st: StructType, _) => st
--- End diff --

little nit: `kt` -> `_`.
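The nit is the standard Scala convention that an unused pattern binder should be `_`. A minimal standalone sketch (toy types, not Spark's real `DataType` hierarchy) of the pattern in question:

```scala
sealed trait DataType
case object StringType extends DataType
case class StructType(fields: List[String]) extends DataType
case class MapType(keyType: DataType, valueType: DataType,
                   valueContainsNull: Boolean) extends DataType

def rowSchema(dt: DataType): Option[StructType] = dt match {
  case st: StructType                => Some(st)
  // `_` makes it explicit that the key type is deliberately ignored;
  // binding it as `kt` would suggest it is used somewhere in the body.
  case MapType(_, st: StructType, _) => Some(st)
  case _                             => None
}
```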


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132875639
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -661,13 +666,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

Looks like we can use this matched `MapType` directly as `mapType` in 
`gen.write(map.asInstanceOf[MapData], mapType)`.
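The suggestion is the Scala `@` pattern binder: bind the matched value itself rather than re-casting `child.dataType` afterwards. A hedged standalone sketch with toy types (not Spark's real classes):

```scala
sealed trait DataType
case class StructType(fields: List[String]) extends DataType
case class MapType(keyType: DataType, valueType: DataType) extends DataType

def describe(dt: DataType): String = dt match {
  // `mt` binds the full, already-typed MapType value, so the body can use
  // it directly instead of re-casting with asInstanceOf[MapType].
  case mt @ MapType(_, _: StructType) => s"map: $mt"
  case _                              => "unsupported"
}
```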





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132875370
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -202,5 +203,9 @@ private[sql] class JacksonGenerator(
*/
   def write(array: ArrayData): Unit = writeArray(writeArrayData(array, 
arrElementWriter))
 
+  def write(map: MapData, mapType: MapType): Unit = {
--- End diff --

I am not sure whether this `write` should take `mapType`; the equivalent 
`write(row: InternalRow)` does not take the struct type.
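The alternative hinted at here can be sketched as follows: instead of passing `mapType` into every `write` call, the generator could capture the schema once at construction, as `write(row: InternalRow)` already relies on the generator's own row schema. `MiniGenerator` and `MiniMapType` are hypothetical stand-ins, not Spark's API:

```scala
// schema captured once at construction, mirroring how the row-based write
// uses the schema the generator was created with
case class MiniMapType(valueFields: List[String])

class MiniGenerator(schema: MiniMapType) {
  def write(map: Map[String, List[Int]]): String =
    map.map { case (key, values) =>
      val fields = schema.valueFields.zip(values)
        .map { case (f, v) => s""""$f":$v""" }
        .mkString(",")
      s""""$key":{$fields}"""
    }.mkString("{", ",", "}")
}
```

For example, `new MiniGenerator(MiniMapType(List("a"))).write(Map("test" -> List(1)))` produces `{"test":{"a":1}}`, matching the expected output in the tests discussed below.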





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132875025
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -676,7 +687,8 @@ case class StructsToJson(
   TypeCheckResult.TypeCheckFailure(e.getMessage)
   }
 case _ => TypeCheckResult.TypeCheckFailure(
-  s"Input type ${child.dataType.simpleString} must be a struct or 
array of structs.")
+  s"Input type ${child.dataType.simpleString} must be a struct, array 
of structs or " +
+  s"map with a structs value.")
--- End diff --

little nit: the `s` interpolator looks unnecessary here.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132615092
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -186,6 +186,18 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row("""[{"_1":1}]""") :: Nil)
   }
 
+  test("to_json - map") {
+val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a")
+val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
+
+checkAnswer(
+  df1.select(to_json($"a")),
+  Row("""{"a":{"_1":1}}""") :: Nil)
+checkAnswer(
+  df2.select(to_json($"a")),
+  Row("""{"[0,1]":{"_1":1}}""") :: Nil)
--- End diff --

Actually, I'm not sure what the answer should be, but I got `[0,1]` using
```
scala> Seq(Tuple1(Tuple1(Map(Tuple1(1)->Tuple1(1).toDF("a").select(to_json($"a")).show()
+--------------------+
|    structstojson(a)|
+--------------------+
|[{"_1":{"[0,1]":{...|
+--------------------+
```
so I think this answer should be correct.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614716
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
--- End diff --

If we support `MapType` in general, I think `ArrayType(MapType)` would also be 
allowed? We don't need to address it in this PR.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614534
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
 ---
@@ -590,4 +590,24 @@ class JsonExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   """{"t":"2015-12-31T16:00:00"}"""
 )
   }
+
+  test("SPARK-21513: to_json support map[string, struct] to json") {
+val schema = MapType(StringType, StructType(StructField("a", 
IntegerType) :: Nil))
+val input = Literal.create(ArrayBasedMapData(Map("test" -> 
InternalRow(1))), schema)
+checkEvaluation(
+  StructsToJson(Map.empty, input, gmtId),
+  """{"test":{"a":1}}"""
+)
+  }
+
+  test("SPARK-21513: to_json support map[struct, struct] to json") {
+val schema = MapType(StructType(StructField("a", IntegerType) :: Nil),
+  StructType(StructField("b", IntegerType) :: Nil))
+val input = Literal.create(ArrayBasedMapData(Map(InternalRow(1) -> 
InternalRow(2))), schema)
+checkEvaluation(
+  StructsToJson(Map.empty, input, gmtId),
--- End diff --

ditto.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614524
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
 ---
@@ -590,4 +590,24 @@ class JsonExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   """{"t":"2015-12-31T16:00:00"}"""
 )
   }
+
+  test("SPARK-21513: to_json support map[string, struct] to json") {
+val schema = MapType(StringType, StructType(StructField("a", 
IntegerType) :: Nil))
+val input = Literal.create(ArrayBasedMapData(Map("test" -> 
InternalRow(1))), schema)
+checkEvaluation(
+  StructsToJson(Map.empty, input, gmtId),
--- End diff --

I think we don't need `gmtId` here.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614326
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -186,6 +186,18 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row("""[{"_1":1}]""") :: Nil)
   }
 
+  test("to_json - map") {
+val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a")
+val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
+
+checkAnswer(
+  df1.select(to_json($"a")),
+  Row("""{"a":{"_1":1}}""") :: Nil)
+checkAnswer(
+  df2.select(to_json($"a")),
+  Row("""{"[0,1]":{"_1":1}}""") :: Nil)
--- End diff --

`[0,1]` is for the key `Tuple1(1)`? I think it should be `[1]`?





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614036
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3121,8 +3121,9 @@ object functions {
 to_json(e, options.asScala.toMap)
 
   /**
-   * Converts a column containing a `StructType` or `ArrayType` of 
`StructType`s into a JSON string
-   * with the specified schema. Throws an exception, in the case of an 
unsupported type.
+   * Converts a column containing a `StructType`, `ArrayType` of 
`StructType`s
+   * or a `MapType` with a `StructType` Value into a JSON string with the 
specified schema.
--- End diff --

ditto.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614025
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3106,9 +3106,9 @@ object functions {
   }
 
   /**
-   * (Java-specific) Converts a column containing a `StructType` or 
`ArrayType` of `StructType`s
-   * into a JSON string with the specified schema. Throws an exception, in 
the case of an
-   * unsupported type.
+   * (Java-specific) Converts a column containing a `StructType`, 
`ArrayType` of `StructType`s
+   * or a `MapType` with a `StructType` Value into a JSON string with the 
specified schema.
--- End diff --

ditto.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132614002
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3090,9 +3090,9 @@ object functions {
   }
 
   /**
-   * (Scala-specific) Converts a column containing a `StructType` or 
`ArrayType` of `StructType`s
-   * into a JSON string with the specified schema. Throws an exception, in 
the case of an
-   * unsupported type.
+   * (Scala-specific) Converts a column containing a `StructType`, 
`ArrayType` of `StructType`s
+   * or a `MapType` with a `StructType` Value into a JSON string with the 
specified schema.
--- End diff --

nit: Value -> value.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132613913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

I was thinking that the function name for `StructsToJson` might be a similar 
word, but it looks like it is just `to_json`; there is no wording of "struct" in 
it. Yeah, if so, `StructsToJson` is not a public API, so I think it is fine to 
change it.

Because this is his first PR, I'd prefer less complexity for him. Also, 
supporting an arbitrary map type requires changing a few more places and would 
make the review process longer. It would be great if we could let this in and 
@goldmedal could work on another PR.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132613088
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

I am quite sure we keep compatibility with the name, although it might be best 
not to change it if possible. I think we can also add a `prettyName` of 
`to_json` for the column-name concern too.

What do you think about renaming this in another minor PR or a follow-up, just 
for a smaller line diff, less complexity, and an easier review, considering 
this is his very first PR? I am also fine with doing it here if the opposite 
makes more sense to you.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132609949
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

Is there no compatibility issue if we rename the expression?

I've looked around, and I think it should not be difficult to support 
`MapType` like `StructType`, although we need to change a few more places.







[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r132088249
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

I see. It looks like we currently assume that `JacksonGenerator` only takes a 
`StructType`. I think the expression name should be fine, and I guess we are 
okay to rename it if that looks required (actually, I did rename it to 
`StructsToJson` :)).

Would it be difficult to support `MapType` like `StructType`? I took a quick 
look, and it looks like quite a lot of places would need fixing. To my 
understanding, we can load JSON as both `MapType` and `StructType` too:

```
scala> spark.read.schema("a map<string, int>").json(Seq("""{"a": {"a": 2}}""").toDS).show()
+-----------+
|          a|
+-----------+
|Map(a -> 2)|
+-----------+


scala> spark.read.schema("a struct<a: int>").json(Seq("""{"a": {"a": 2}}""").toDS).show()
+---+
|  a|
+---+
|[2]|
+---+
```





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131952135
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
   try {
 JacksonUtils.verifySchema(rowSchema)
 TypeCheckResult.TypeCheckSuccess
--- End diff --

@HyukjinKwon yeah, you're right. I'll add a message about the map type.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131951764
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -202,5 +202,9 @@ private[sql] class JacksonGenerator(
*/
   def write(array: ArrayData): Unit = writeArray(writeArrayData(array, 
arrElementWriter))
 
+  def write(map: MapData, mapType: MapType): Unit = {
+writeObject(writeMapData(map, mapType, makeWriter(mapType.valueType)))
--- End diff --

@viirya  ok, got it.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131946027
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

yeah, just like @viirya said. This expression is named `StructsToJson`.
Another reason is that it uses `JacksonGenerator`, which needs a `rowSchema` 
at initialization to generate the JSON string. Therefore, it must be 
restricted to `StructType` here. Or can we create another expression to 
support an arbitrary map type?





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131936731
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -202,5 +202,9 @@ private[sql] class JacksonGenerator(
*/
   def write(array: ArrayData): Unit = writeArray(writeArrayData(array, 
arrElementWriter))
 
+  def write(map: MapData, mapType: MapType): Unit = {
+writeObject(writeMapData(map, mapType, makeWriter(mapType.valueType)))
--- End diff --

@goldmedal Similar to `arrElementWriter`, let's create a `mapValueWriter` 
at the beginning of `JacksonGenerator`?
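The suggested shape can be sketched roughly like this: build the value writer once, as a field initialized at construction (analogous to `arrElementWriter`), rather than calling `makeWriter` on every write. The classes below are simplified stand-ins for Spark's internals, not its real API:

```scala
type ValueWriter = Any => String

class MiniJacksonGenerator(valueTypeName: String) {
  // hypothetical stand-in for Spark's makeWriter(dataType)
  private def makeWriter(dt: String): ValueWriter = dt match {
    case "int" => v => v.toString
    case _     => v => s""""$v""""
  }

  // like arrElementWriter: built once at construction and reused by every
  // write call, instead of invoking makeWriter inside write each time
  private val mapValueWriter: ValueWriter = makeWriter(valueTypeName)

  def write(map: Map[String, Any]): String =
    map.map { case (k, v) => s""""$k":${mapValueWriter(v)}""" }
      .mkString("{", ",", "}")
}
```

For example, `new MiniJacksonGenerator("int").write(Map("a" -> 1))` yields `{"a":1}`.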





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131936404
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

I think this depends on how `MapType` is proposed to be used here. If we treat 
a map similarly to a struct, we can support arbitrary map types. Then we need 
to clarify it explicitly in the expression description.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131932270
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

Oh, no. Similar to the `ArrayType` case: as this expression is named 
`StructsToJson`, I guess it is restricted to `StructType`?





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131930238
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

Yeah. Looks like `writeMapData` supports writing out all kinds of fields. I 
think we can remove this restriction.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131903872
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -638,6 +638,7 @@ case class StructsToJson(
   lazy val rowSchema = child.dataType match {
--- End diff --

Could we add an example in the `ExpressionDescription` above at 611L, like 
https://github.com/goldmedal/spark/commit/0cdcf9114527a2c359c25e46fd6556b3855bfb28#diff-6626026091295ad8c0dfb66ecbcd04b1R605?
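As a hedged sketch of what such an entry might look like (the exact wording and example output here are assumptions, not the merged text), an `ExpressionDescription` example for a map input could read:

```scala
@ExpressionDescription(
  usage = "_FUNC_(expr[, options]) - Returns a json string with a given struct or map value",
  extended = """
    Examples:
      > SELECT _FUNC_(named_struct('a', 1, 'b', 2));
       {"a":1,"b":2}
      > SELECT _FUNC_(map('a', named_struct('b', 1)));
       {"a":{"b":1}}
  """)
```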





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131923383
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -202,5 +202,9 @@ private[sql] class JacksonGenerator(
*/
   def write(array: ArrayData): Unit = writeArray(writeArrayData(array, 
arrElementWriter))
 
+  def write(map: MapData, mapType: MapType): Unit = {
+writeObject(writeMapData(map, mapType, makeWriter(mapType.valueType)))
--- End diff --

I think we could avoid making the writer via `makeWriter(mapType.valueType)` 
on each map write by creating it once when this `JacksonGenerator` is 
constructed.
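The caching idea can be sketched in plain Scala (a toy `Generator`, not Spark's `JacksonGenerator`; `writersBuilt` is instrumentation invented for the sketch to show the writer is constructed only once):

```scala
// Toy sketch: build the map-value writer once (lazy val) instead of on
// every write call, mirroring the suggestion for JacksonGenerator.
final class Generator(valueType: String) {
  var writersBuilt = 0 // counter for the sketch only

  // Stand-in for makeWriter(mapType.valueType): constructed at most once,
  // on first use, then reused for every subsequent writeMap call.
  private lazy val mapValueWriter: Any => String = {
    writersBuilt += 1
    v => v.toString
  }

  def writeMap(m: Map[String, Any]): String =
    m.map { case (k, v) => s""""$k":${mapValueWriter(v)}""" }
      .mkString("{", ",", "}")
}
```

Repeated `writeMap` calls reuse the same writer, so the potentially expensive construction cost is paid once per generator instance rather than once per row.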





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131904051
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
   try {
 JacksonUtils.verifySchema(rowSchema)
 TypeCheckResult.TypeCheckSuccess
--- End diff --

I guess we need to fix the exception message in 684L.





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r131903650
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -659,13 +660,19 @@ case class StructsToJson(
 (arr: Any) =>
   gen.write(arr.asInstanceOf[ArrayData])
   getAndReset()
+  case MapType(_: DataType, _: StructType, _: Boolean) =>
+(map: Any) =>
+  val mapType = child.dataType.asInstanceOf[MapType]
+  gen.write(map.asInstanceOf[MapData], mapType)
+  getAndReset()
 }
   }
 
   override def dataType: DataType = StringType
 
   override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
-case _: StructType | ArrayType(_: StructType, _) =>
+case _: StructType | ArrayType(_: StructType, _) |
+ MapType(_: DataType, _: StructType, _: Boolean) =>
--- End diff --

Hm.. is there any reason to allow `map<.., struct>` only?





[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-08-07 Thread goldmedal
GitHub user goldmedal opened a pull request:

https://github.com/apache/spark/pull/18875

[SPARK-21513][SQL] Allow UDF to_json support converting MapType to json

# What changes were proposed in this pull request?
Currently, the UDF `to_json` only supports converting a `StructType` or an 
`ArrayType` of `StructType`s to a JSON output string. 
Following the discussion on JIRA SPARK-21513, this patch allows `to_json` 
to support converting a `MapType` to a JSON output string.
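As a rough illustration of the output shape this targets for a `map<string, struct>` value, here is a self-contained Scala sketch (plain Scala, not Spark; `S` and `toJson` are made-up names for the example):

```scala
// Toy serializer showing the intended JSON shape for a
// map<string, struct<id: int>> value: {"key":{"id":...},...}
final case class S(id: Int)

def toJson(m: Map[String, S]): String =
  m.map { case (k, v) => s""""$k":{"id":${v.id}}""" }
    .mkString("{", ",", "}")
```

For example, `toJson(Map("a" -> S(1)))` yields `{"a":{"id":1}}`, the kind of string a map-aware `to_json` would produce for such a column.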

# How was this patch tested?
Added a unit test case.

cc @viirya @HyukjinKwon 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/goldmedal/spark SPARK-21513

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18875.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18875








