[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22812


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r228391626
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2384,14 +2384,23 @@ class Analyzer(
       case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
         inputData.dataType match {
           case ArrayType(et, cn) =>
-            val expr = MapObjects(func, inputData, et, cn, cls) transformUp {
+            MapObjects(func, inputData, et, cn, cls) transformUp {
               case UnresolvedExtractValue(child, fieldName) if child.resolved =>
                 ExtractValue(child, fieldName, resolver)
             }
-            expr
           case other =>
             throw new AnalysisException("need an array field but got " + other.catalogString)
         }
+      case u: UnresolvedCatalystToExternalMap if u.child.resolved =>
+        u.child.dataType match {
+          case _: MapType =>
+            CatalystToExternalMap(u) transformUp {
+              case UnresolvedExtractValue(child, fieldName) if child.resolved =>
--- End diff --

TBH I don't quite remember why I did this for `MapObjects`, so I just followed the same pattern here. Maybe we can remove it in a follow-up PR.
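For readers unfamiliar with the analyzer's rewriting style, here is a minimal, self-contained sketch (toy classes, not Spark's actual `TreeNode`/`Expression` API) of how a bottom-up `transformUp` replaces unresolved extraction nodes nested anywhere inside an expression tree, which is what the rule above does for the lambda bodies of `MapObjects` and `CatalystToExternalMap`:

```scala
// Toy expression tree with a bottom-up transform. Names mimic the Catalyst
// nodes discussed above but are hypothetical stand-ins.
sealed trait Expr {
  def children: Seq[Expr]
  def withChildren(cs: Seq[Expr]): Expr
  // Rewrite children first, then apply the rule to the rewritten node itself.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val afterChildren = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(afterChildren, identity[Expr])
  }
}
case class Attr(name: String) extends Expr {
  def children = Nil
  def withChildren(cs: Seq[Expr]) = this
}
case class UnresolvedExtract(child: Expr, field: String) extends Expr {
  def children = Seq(child)
  def withChildren(cs: Seq[Expr]) = copy(child = cs.head)
}
case class Extract(child: Expr, field: String) extends Expr {
  def children = Seq(child)
  def withChildren(cs: Seq[Expr]) = copy(child = cs.head)
}
case class Lambda(body: Expr) extends Expr {
  def children = Seq(body)
  def withChildren(cs: Seq[Expr]) = copy(body = cs.head)
}

// Nested unresolved extractions are resolved innermost-first.
val resolved = Lambda(UnresolvedExtract(UnresolvedExtract(Attr("m"), "a"), "b"))
  .transformUp { case UnresolvedExtract(c, f) => Extract(c, f) }
assert(resolved == Lambda(Extract(Extract(Attr("m"), "a"), "b")))
```

Because the rule is applied bottom-up, the innermost extraction is resolved before its parent, matching how `ExtractValue` replaces `UnresolvedExtractValue` in the analyzer rule quoted above.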


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r228390150
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2384,14 +2384,23 @@ class Analyzer(
       case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
         inputData.dataType match {
           case ArrayType(et, cn) =>
-            val expr = MapObjects(func, inputData, et, cn, cls) transformUp {
+            MapObjects(func, inputData, et, cn, cls) transformUp {
               case UnresolvedExtractValue(child, fieldName) if child.resolved =>
                 ExtractValue(child, fieldName, resolver)
             }
-            expr
           case other =>
             throw new AnalysisException("need an array field but got " + other.catalogString)
         }
+      case u: UnresolvedCatalystToExternalMap if u.child.resolved =>
+        u.child.dataType match {
+          case _: MapType =>
+            CatalystToExternalMap(u) transformUp {
+              case UnresolvedExtractValue(child, fieldName) if child.resolved =>
--- End diff --

`ResolveReferences` might also handle this, but it is also good to have it here.


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r228388671
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2384,14 +2384,23 @@ class Analyzer(
       case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
         inputData.dataType match {
           case ArrayType(et, cn) =>
-            val expr = MapObjects(func, inputData, et, cn, cls) transformUp {
+            MapObjects(func, inputData, et, cn, cls) transformUp {
               case UnresolvedExtractValue(child, fieldName) if child.resolved =>
                 ExtractValue(child, fieldName, resolver)
             }
-            expr
           case other =>
             throw new AnalysisException("need an array field but got " + other.catalogString)
         }
+      case u: UnresolvedCatalystToExternalMap if u.child.resolved =>
+        u.child.dataType match {
+          case _: MapType =>
+            CatalystToExternalMap(u) transformUp {
+              case UnresolvedExtractValue(child, fieldName) if child.resolved =>
--- End diff --

Yea, I think so. The `UnresolvedExtractValue` might appear in `CatalystToExternalMap.keyLambdaFunction` and `valueLambdaFunction`.


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r228381728
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2384,14 +2384,23 @@ class Analyzer(
       case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
         inputData.dataType match {
           case ArrayType(et, cn) =>
-            val expr = MapObjects(func, inputData, et, cn, cls) transformUp {
+            MapObjects(func, inputData, et, cn, cls) transformUp {
               case UnresolvedExtractValue(child, fieldName) if child.resolved =>
                 ExtractValue(child, fieldName, resolver)
             }
-            expr
           case other =>
             throw new AnalysisException("need an array field but got " + other.catalogString)
         }
+      case u: UnresolvedCatalystToExternalMap if u.child.resolved =>
+        u.child.dataType match {
+          case _: MapType =>
+            CatalystToExternalMap(u) transformUp {
+              case UnresolvedExtractValue(child, fieldName) if child.resolved =>
--- End diff --

When `u.child` is resolved, is there still `UnresolvedExtractValue`?


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r228380982
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
@@ -216,7 +215,6 @@ case class ExpressionEncoder[T](
       }
       nullSafeSerializer match {
         case If(_: IsNull, _, s: CreateNamedStruct) => s
-        case s: CreateNamedStruct => s
--- End diff --

Thanks!


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r228195016
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
@@ -119,10 +119,9 @@ object ExpressionEncoder {
     }

     val childrenDeserializers = encoders.zipWithIndex.map { case (enc, index) =>
-      val getColumnsByOrdinals = enc.objDeserializer.collect { case c: GetColumnByOrdinal => c }
-        .distinct
-      assert(getColumnsByOrdinals.size == 1, "object deserializer should have only one " +
-        s"`GetColumnByOrdinal`, but there are ${getColumnsByOrdinals.size}")
+      val getColExprs = enc.objDeserializer.collect { case c: GetColumnByOrdinal => c }.distinct
--- End diff --

Unrelated, but this fixes minor code style issues from https://github.com/apache/spark/pull/22749


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r227638941
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1837,8 +1837,6 @@ case class GetArrayFromMap private(
     arrayGetter: MapData => ArrayData,
     elementTypeGetter: MapType => DataType) extends UnaryExpression with NonSQLExpression {

-  private lazy val encodedFunctionName: String = TermName(functionName).encodedName.toString
--- End diff --

This addresses https://github.com/apache/spark/pull/22745#discussion_r227407344


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22812#discussion_r227638902
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1090,15 +1096,9 @@ case class CatalystToExternalMap private(
     val tupleLoopValue = ctx.freshName("tupleLoopValue")
     val builderValue = ctx.freshName("builderValue")

-    val getLength = s"${genInputData.value}.numElements()"
--- End diff --

These are unrelated, but are a follow-up of https://github.com/apache/spark/pull/16986 to address the remaining comments.


---




[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-23 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/22812

[SPARK-25817][SQL] Dataset encoder should support combination of map and 
product type

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/22745, the Dataset encoder supports the combination of Java bean and map type. This PR fixes the Scala side.

The reason it didn't work before is that `CatalystToExternalMap` tries to get the data type of the input map expression, which can still be unresolved, so its data type is not yet known. To fix it, we follow the pattern of `UnresolvedMapObjects`: create an `UnresolvedCatalystToExternalMap` placeholder, and only create `CatalystToExternalMap` once the input map expression is resolved and its data type is known.
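The "unresolved placeholder" approach described above can be illustrated with a minimal sketch. All types here are toy stand-ins (not Spark's real classes); it only shows the pattern of deferring construction of the final node until the child's data type is known:

```scala
// Toy data types and expressions; names mirror the discussion but are hypothetical.
sealed trait DataType
case object IntType extends DataType
case class MapType(key: DataType, value: DataType) extends DataType

sealed trait Expr { def resolved: Boolean }
case class UnresolvedAttr(name: String) extends Expr { val resolved = false }
case class Attr(name: String, dataType: DataType) extends Expr { val resolved = true }

// Placeholder: holds the child but computes nothing from its type yet.
case class UnresolvedCatalystToExternalMap(child: Expr) extends Expr {
  val resolved = false
}
// Final node: safe to build only once the child's MapType is known.
case class CatalystToExternalMap(child: Expr, keyType: DataType, valueType: DataType)
    extends Expr { val resolved = true }

// One analyzer step: replace the placeholder only when the child is resolved
// to a map type; otherwise leave it for a later pass.
def resolveOnce(e: Expr): Expr = e match {
  case UnresolvedCatalystToExternalMap(child @ Attr(_, MapType(k, v))) =>
    CatalystToExternalMap(child, k, v)
  case other => other
}

val m = Attr("m", MapType(IntType, IntType))
assert(resolveOnce(UnresolvedCatalystToExternalMap(m)) ==
  CatalystToExternalMap(m, IntType, IntType))
// An unresolved child keeps the placeholder in place.
val u = UnresolvedCatalystToExternalMap(UnresolvedAttr("m"))
assert(resolveOnce(u) == u)
```

The key property is that `resolveOnce` is a no-op on unresolved input, so the analyzer can keep re-running its rules until the child resolves, exactly as with `UnresolvedMapObjects`.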

## How was this patch tested?

Re-enable an old test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark map

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22812


commit da31d2602b8e12eb8949336cf14b903c0df731cf
Author: Wenchen Fan 
Date:   2018-10-24T04:21:44Z

Dataset encoder should support combination of map and product type




---
