[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22812

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r228391626

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -2384,14 +2384,23 @@ class Analyzer(
           case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
             inputData.dataType match {
               case ArrayType(et, cn) =>
-                val expr = MapObjects(func, inputData, et, cn, cls) transformUp {
+                MapObjects(func, inputData, et, cn, cls) transformUp {
                   case UnresolvedExtractValue(child, fieldName) if child.resolved =>
                     ExtractValue(child, fieldName, resolver)
                 }
-                expr
               case other =>
                 throw new AnalysisException("need an array field but got " + other.catalogString)
             }
+          case u: UnresolvedCatalystToExternalMap if u.child.resolved =>
+            u.child.dataType match {
+              case _: MapType =>
+                CatalystToExternalMap(u) transformUp {
+                  case UnresolvedExtractValue(child, fieldName) if child.resolved =>

End diff

TBH I don't quite remember why I did this for `MapObjects`, so I just followed it here. Maybe we can remove it in a followup PR.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r228390150

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (the same hunk quoted earlier in the thread), with the comment anchored at:

+                  case UnresolvedExtractValue(child, fieldName) if child.resolved =>

End diff

`ResolveReferences` might also process that, but it is also good to handle them here.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r228388671

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (the same hunk quoted earlier in the thread), with the comment anchored at:

+                  case UnresolvedExtractValue(child, fieldName) if child.resolved =>

End diff

Yea I think so. The `UnresolvedExtractValue` might appear in `CatalystToExternalMap.keyLambdaFunction` and `valueLambdaFunction`.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r228381728

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (the same hunk quoted earlier in the thread), with the comment anchored at:

+                  case UnresolvedExtractValue(child, fieldName) if child.resolved =>

End diff

When `u.child` is resolved, can there still be an `UnresolvedExtractValue`?
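[Editor's note] The exchange above concerns `UnresolvedExtractValue` nodes that survive inside the key/value lambda functions even after the map child itself resolves, which is why the rule runs `transformUp` over the freshly built expression. A minimal standalone sketch of that bottom-up rewrite, using simplified hypothetical node classes rather than the actual Catalyst `TreeNode` hierarchy:

```scala
// Simplified stand-in for Catalyst's TreeNode.transformUp: rebuild the
// children first, then apply the rule to the current node.
sealed trait Node {
  def children: Seq[Node]
  def withChildren(cs: Seq[Node]): Node
  def transformUp(rule: PartialFunction[Node, Node]): Node = {
    val rebuilt = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rebuilt, identity[Node])
  }
}
case class Ref(name: String) extends Node {
  def children: Seq[Node] = Nil
  def withChildren(cs: Seq[Node]): Node = this
}
// Placeholder for a field access whose target type wasn't known at parse time.
case class UnresolvedExtract(child: Node, field: String) extends Node {
  def children: Seq[Node] = Seq(child)
  def withChildren(cs: Seq[Node]): Node = copy(child = cs.head)
}
case class Extract(child: Node, field: String) extends Node {
  def children: Seq[Node] = Seq(child)
  def withChildren(cs: Seq[Node]): Node = copy(child = cs.head)
}
// Stand-in for the value lambda of a map conversion.
case class Lambda(body: Node) extends Node {
  def children: Seq[Node] = Seq(body)
  def withChildren(cs: Seq[Node]): Node = copy(body = cs.head)
}

// The lambda body may still contain an unresolved field access...
val valueLambda = Lambda(UnresolvedExtract(Ref("mapValue"), "price"))
// ...which the bottom-up rewrite replaces with a concrete extraction.
val resolved = valueLambda.transformUp {
  case UnresolvedExtract(c, f) => Extract(c, f)
}
```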
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r228380982

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala

@@ -216,7 +215,6 @@ case class ExpressionEncoder[T](
     }
     nullSafeSerializer match {
       case If(_: IsNull, _, s: CreateNamedStruct) => s
-      case s: CreateNamedStruct => s

End diff

Thanks!
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r228195016

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala

@@ -119,10 +119,9 @@ object ExpressionEncoder {
   }
   val childrenDeserializers = encoders.zipWithIndex.map { case (enc, index) =>
-    val getColumnsByOrdinals = enc.objDeserializer.collect { case c: GetColumnByOrdinal => c }
-      .distinct
-    assert(getColumnsByOrdinals.size == 1, "object deserializer should have only one " +
-      s"`GetColumnByOrdinal`, but there are ${getColumnsByOrdinals.size}")
+    val getColExprs = enc.objDeserializer.collect { case c: GetColumnByOrdinal => c }.distinct

End diff

This is unrelated, but it fixes minor code style issues from https://github.com/apache/spark/pull/22749.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r227638941

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala

@@ -1837,8 +1837,6 @@ case class GetArrayFromMap private(
     arrayGetter: MapData => ArrayData,
     elementTypeGetter: MapType => DataType) extends UnaryExpression with NonSQLExpression {

-  private lazy val encodedFunctionName: String = TermName(functionName).encodedName.toString

End diff

This addresses https://github.com/apache/spark/pull/22745#discussion_r227407344.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22812#discussion_r227638902

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala

@@ -1090,15 +1096,9 @@ case class CatalystToExternalMap private(
     val tupleLoopValue = ctx.freshName("tupleLoopValue")
     val builderValue = ctx.freshName("builderValue")

-    val getLength = s"${genInputData.value}.numElements()"

End diff

These changes are unrelated, but they are a followup of https://github.com/apache/spark/pull/16986 to address the remaining comments.
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22812

[SPARK-25817][SQL] Dataset encoder should support combination of map and product type

## What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/22745 , the Dataset encoder supports the combination of java bean and map type. This PR fixes the Scala side.

The reason it didn't work before: `CatalystToExternalMap` tries to get the data type of the input map expression, but that expression can still be unresolved, so its data type is unknown. To fix it, we follow `UnresolvedMapObjects` and create an `UnresolvedCatalystToExternalMap`, which is turned into a `CatalystToExternalMap` only when the input map expression is resolved and its data type is known.

## How was this patch tested?

Re-enabled an old test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark map

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22812.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22812

commit da31d2602b8e12eb8949336cf14b903c0df731cf
Author: Wenchen Fan
Date: 2018-10-24T04:21:44Z

    Dataset encoder should support combination of map and product type
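[Editor's note] The pattern the PR description outlines — keeping an unresolved placeholder in the expression tree and constructing the concrete node only once the child's data type is known — can be sketched in miniature. The types below are hypothetical simplified stand-ins, not the actual Catalyst classes:

```scala
sealed trait DataType
case object UnknownType extends DataType
case object IntType extends DataType
case class MapType(key: DataType, value: DataType) extends DataType

sealed trait Expr {
  def dataType: DataType
  def resolved: Boolean = dataType != UnknownType
}
case class Input(dataType: DataType) extends Expr

// Placeholder, analogous to UnresolvedCatalystToExternalMap: it must not
// inspect the child's key/value types, because they may not be known yet.
case class UnresolvedToExternalMap(child: Expr) extends Expr {
  def dataType: DataType = UnknownType
}

// Concrete node, analogous to CatalystToExternalMap: built only after the
// child has resolved, so reading child.dataType is safe here.
case class ToExternalMap(child: Expr, keyType: DataType, valueType: DataType) extends Expr {
  def dataType: DataType = child.dataType
}

// Analyzer-style rule: fires only when the child is resolved to a map type.
def resolvePlaceholder(e: Expr): Expr = e match {
  case UnresolvedToExternalMap(child) if child.resolved =>
    child.dataType match {
      case MapType(k, v) => ToExternalMap(child, k, v)
      case other => throw new IllegalArgumentException(s"need a map field but got $other")
    }
  case other => other // child not resolved yet: leave the placeholder alone
}
```

Running `resolvePlaceholder` on a placeholder whose child has a known `MapType` yields the concrete node; on one whose child is still untyped, the placeholder is returned unchanged, to be revisited on a later pass.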