[GitHub] spark pull request #22030: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22030#discussion_r208457801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql]( * * {{{ * // Compute the sum of earnings for each year by course with each course as a separate column - * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings") + * df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings") + * }}} + * + * For pivoting by multiple columns, use the `struct` function to combine the columns and values: + * + * {{{ + * df + * .groupBy($"year") + * .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts" + * .agg(sum($"earnings")) * }}} * * @param pivotColumn the column to pivot. * @param values List of values that will be translated to columns in the output DataFrame. * @since 2.4.0 */ - def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = { + def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = { --- End diff -- Here's nothing to argue with the analyzer or something. The first interface exposed is `pivot(String, Seq[Any]`. We better keep it similar to the original version if there isn't a big issue. What's the downside of allowing both by `pivot(Column, Seq[Any])`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22030: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22030#discussion_r208460101 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql]( * * {{{ * // Compute the sum of earnings for each year by course with each course as a separate column - * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings") + * df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings") + * }}} + * + * For pivoting by multiple columns, use the `struct` function to combine the columns and values: + * + * {{{ + * df + * .groupBy($"year") + * .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts" + * .agg(sum($"earnings")) * }}} * * @param pivotColumn the column to pivot. * @param values List of values that will be translated to columns in the output DataFrame. * @since 2.4.0 */ - def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = { + def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = { --- End diff -- > The previous interface pivot(Column, Seq[Any]) has existed for more then multiple years. Is this based on actual feedback from users or your speculation?\ This is what @MaxGekk added in https://github.com/apache/spark/pull/21699. > This assumption of yours is not true. See my reply to your comment below. No. Seq[Any] takes literal values (objects); Seq[Column] takes `Column` expressions. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94406/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94406/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22022 Merged to branch 2.2, please close this PR @mgaido91 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21596 **[Test build #94350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94350/testReport)** for PR 21596 at commit [`c1f96d9`](https://github.com/apache/spark/commit/c1f96d9e310bda4cb5698f0350efdc1afdba8809). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21932 **[Test build #94351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94351/testReport)** for PR 21932 at commit [`9d12a9e`](https://github.com/apache/spark/commit/9d12a9ee7b3c0d6037c3c8d99642fdb45638f4f2). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` s\"its class is $` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21258: [SPARK-23933][SQL] Add map_from_arrays function
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/21258#discussion_r208199133 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -235,6 +235,69 @@ case class CreateMap(children: Seq[Expression]) extends Expression { override def prettyName: String = "map" } +/** + * Returns a catalyst Map containing the two arrays in children expressions as keys and values. + */ +@ExpressionDescription( + usage = """ +_FUNC_(keys, values) - Creates a map with a pair of the given key/value arrays. All elements + in keys should not be null""", + examples = """ +Examples: + > SELECT _FUNC_([1.0, 3.0], ['2', '4']); + {1.0:"2",3.0:"4"} + """, since = "2.4.0") +case class CreateMapFromArray(left: Expression, right: Expression) +extends BinaryExpression with ExpectsInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (ArrayType(_, cn), ArrayType(_, _)) => +if (!cn) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure("All of the given keys should be non-null") +} + case _ => +TypeCheckResult.TypeCheckFailure("The given two arguments should be an array") +} + } + + override def dataType: DataType = { +MapType( + keyType = left.dataType.asInstanceOf[ArrayType].elementType, + valueType = right.dataType.asInstanceOf[ArrayType].elementType, + valueContainsNull = left.dataType.asInstanceOf[ArrayType].containsNull) + } + + override def nullable: Boolean = false + + override def nullSafeEval(keyArray: Any, valueArray: Any): Any = { +val keyArrayData = keyArray.asInstanceOf[ArrayData] --- End diff -- I would like to err on the safe side here. `CreateMap` should be fixed IMO. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1909/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22021 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22017 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94362/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21986 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21986 **[Test build #94367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94367/testReport)** for PR 21986 at commit [`af79644`](https://github.com/apache/spark/commit/af79644cb4687b6acb9a10548f05aef980f1882a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22017 **[Test build #94362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94362/testReport)** for PR 22017 at commit [`ec583eb`](https://github.com/apache/spark/commit/ec583eb29ba6fdb79d0b85cbecb3f709e6648b25). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ArrayDataUnion(elementType: DataType) extends ((ArrayData, ArrayData) => ArrayData) ` * `case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22020: [SPARK-25041][build] upgrade genJavaDoc-plugin from 0.10...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22020 **[Test build #94356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94356/testReport)** for PR 22020 at commit [`1b41ce4`](https://github.com/apache/spark/commit/1b41ce44800310cd0ebd321f8436db5bed452935). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22006: [SPARK-25031][SQL] Fix MapType schema print
Github user invkrh commented on a diff in the pull request: https://github.com/apache/spark/pull/22006#discussion_r208215172 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala --- @@ -452,4 +452,31 @@ class DataTypeSuite extends SparkFunSuite { new StructType().add("f1", IntegerType).add("f", new StructType().add("f2", StringType, false)), new StructType().add("f2", IntegerType).add("g", new StructType().add("f1", StringType)), false) + + test("SPARK-25031: MapType should produce current formatted string for complex types") { + --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1913/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22024 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22024 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1894/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21932 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22022 **[Test build #94361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94361/testReport)** for PR 22022 at commit [`657d364`](https://github.com/apache/spark/commit/657d3643e63d79095c47b45ce14429e9fa08f25b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21986 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21845 **[Test build #94349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94349/testReport)** for PR 21845 at commit [`7afc5c5`](https://github.com/apache/spark/commit/7afc5c52fa31595b1eb458100d37fe92f62e31aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21932 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94351/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22021 **[Test build #94357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94357/testReport)** for PR 22021 at commit [`fb68910`](https://github.com/apache/spark/commit/fb68910f82b1a2729364573a1c926b5b7b5c7c12). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22021 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94357/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22021 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21241: [SPARK-24135][K8s] Resilience to init-container errors o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21241 **[Test build #94354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94354/testReport)** for PR 21241 at commit [`9df84e8`](https://github.com/apache/spark/commit/9df84e87a36bd6b43b0f74c26ad3bf70a67bb467). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21241: [SPARK-24135][K8s] Resilience to init-container errors o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21241 Build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21241: [SPARK-24135][K8s] Resilience to init-container errors o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21241 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94354/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22006: [SPARK-25031][SQL] Fix MapType schema print
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22006 **[Test build #94368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94368/testReport)** for PR 22006 at commit [`06f656d`](https://github.com/apache/spark/commit/06f656d7fb37f8f41a213cfc861d3e2515d26d33). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22021 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r208210338 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -365,3 +364,101 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +/** + * Merges two given maps into a single map by applying function to the pair of values with + * the same key. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying + function to the pair of values with the same key. For keys only presented in one map, + NULL will be passed as the value for the missing key. If an input map contains duplicated + keys, only the first entry of the duplicated key is passed into the lambda function. +""", + examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); + {1:"ax",2:"by"} + """, + since = "2.4.0") +case class MapZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + @transient lazy val functionForEval: Expression = functionsForEval.head + + @transient lazy val MapType(keyType, leftValueType, _) = getMapType(left) + + @transient lazy val MapType(_, rightValueType, _) = getMapType(right) + + @transient lazy val arrayDataUnion = new ArrayDataUnion(keyType) + + @transient lazy val ordering = TypeUtils.getInterpretedOrdering(keyType) + + override def inputs: Seq[Expression] = left :: right :: Nil + + override def functions: Seq[Expression] = function :: Nil + + override def nullable: Boolean = left.nullable || right.nullable + + override def dataType: DataType = MapType(keyType, function.dataType, function.nullable) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (MapType(k1, _, _), MapType(k2, _, _)) if k1.sameType(k2) => +TypeUtils.checkForOrderingExpr(k1, s"function $prettyName") + case _ => TypeCheckResult.TypeCheckFailure(s"The input to function $prettyName should have " + +s"been two ${MapType.simpleString}s with the same key type, but it's " + +s"[${left.dataType.catalogString}, ${right.dataType.catalogString}].") +} + } + + private def getMapType(expr: Expression) = expr.dataType match { +case m: MapType => m +case _ => MapType.defaultConcreteType + } + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): MapZipWith = { +val arguments = Seq((keyType, false), (leftValueType, true), (rightValueType, true)) +copy(function = f(function, arguments)) + } + + override def eval(input: InternalRow): Any = { +val value1 = left.eval(input) +if (value1 == null) { + null +} else { + val value2 = right.eval(input) + if (value2 == null) { +null + } else { +nullSafeEval(input, value1, value2) + } +} + } + + @transient lazy val LambdaFunction(_, Seq( +keyVar: NamedLambdaVariable, +value1Var: NamedLambdaVariable, +value2Var: NamedLambdaVariable), +_) = function + + private def nullSafeEval(inputRow: InternalRow, value1: Any, value2: Any): Any = { +val mapData1 = value1.asInstanceOf[MapData] +val mapData2 = value2.asInstanceOf[MapData] +val keys = arrayDataUnion(mapData1.keyArray(), mapData2.keyArray()) +val values = new GenericArrayData(new Array[Any](keys.numElements())) +keys.foreach(keyType, (idx: Int, key: Any) => { + val v1 = GetMapValueUtil.getValueEval(mapData1, key, keyType, leftValueType, ordering) --- End diff -- I think there is no plan to have a different map implementation and anyway there is a lot of code which depends on having the array based version of MapData. Regarding the duplicated code, to be honest, I think that avoiding the refactoring introduced by that would also make this PR cleaner... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22017 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21986 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94367/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r208211250 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -365,3 +364,101 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +/** + * Merges two given maps into a single map by applying function to the pair of values with + * the same key. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying + function to the pair of values with the same key. For keys only presented in one map, + NULL will be passed as the value for the missing key. If an input map contains duplicated + keys, only the first entry of the duplicated key is passed into the lambda function. +""", + examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); + {1:"ax",2:"by"} + """, + since = "2.4.0") +case class MapZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + @transient lazy val functionForEval: Expression = functionsForEval.head + + @transient lazy val MapType(keyType, leftValueType, _) = getMapType(left) + + @transient lazy val MapType(_, rightValueType, _) = getMapType(right) + + @transient lazy val arrayDataUnion = new ArrayDataUnion(keyType) + + @transient lazy val ordering = TypeUtils.getInterpretedOrdering(keyType) + + override def inputs: Seq[Expression] = left :: right :: Nil + + override def functions: Seq[Expression] = function :: Nil + + override def nullable: Boolean = left.nullable || right.nullable + + override def dataType: DataType = MapType(keyType, function.dataType, function.nullable) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (MapType(k1, _, _), MapType(k2, _, _)) if k1.sameType(k2) => +TypeUtils.checkForOrderingExpr(k1, s"function $prettyName") + case _ => TypeCheckResult.TypeCheckFailure(s"The input to function $prettyName should have " + +s"been two ${MapType.simpleString}s with the same key type, but it's " + +s"[${left.dataType.catalogString}, ${right.dataType.catalogString}].") +} + } + + private def getMapType(expr: Expression) = expr.dataType match { +case m: MapType => m +case _ => MapType.defaultConcreteType + } + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): MapZipWith = { +val arguments = Seq((keyType, false), (leftValueType, true), (rightValueType, true)) +copy(function = f(function, arguments)) + } + + override def eval(input: InternalRow): Any = { +val value1 = left.eval(input) +if (value1 == null) { + null +} else { + val value2 = right.eval(input) + if (value2 == null) { +null + } else { +nullSafeEval(input, value1, value2) + } +} + } + + @transient lazy val LambdaFunction(_, Seq( +keyVar: NamedLambdaVariable, +value1Var: NamedLambdaVariable, +value2Var: NamedLambdaVariable), +_) = function + + private def nullSafeEval(inputRow: InternalRow, value1: Any, value2: Any): Any = { +val mapData1 = value1.asInstanceOf[MapData] +val mapData2 = value2.asInstanceOf[MapData] +val keys = arrayDataUnion(mapData1.keyArray(), mapData2.keyArray()) +val values = new GenericArrayData(new Array[Any](keys.numElements())) +keys.foreach(keyType, (idx: Int, key: Any) => { + val v1 = GetMapValueUtil.getValueEval(mapData1, key, keyType, leftValueType, ordering) --- End diff -- Ok, I will change it. Thanks a lot! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21986 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22020: [SPARK-25041][build] upgrade genJavaDoc-plugin from 0.10...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94356/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305 **[Test build #94364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94364/testReport)** for PR 21305 at commit [`e81790d`](https://github.com/apache/spark/commit/e81790d072ed66f1126d5918bd1a39222a9f5cfa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305 **[Test build #94373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94373/testReport)** for PR 21305 at commit [`e81790d`](https://github.com/apache/spark/commit/e81790d072ed66f1126d5918bd1a39222a9f5cfa). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21933 **[Test build #94358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94358/testReport)** for PR 21933 at commit [`e2961eb`](https://github.com/apache/spark/commit/e2961eb86f689de83770de5c3a73838512a62001). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17185 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1914/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94361/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21845 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94349/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21845 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17185 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94353/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22021 **[Test build #94370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94370/testReport)** for PR 22021 at commit [`fb68910`](https://github.com/apache/spark/commit/fb68910f82b1a2729364573a1c926b5b7b5c7c12). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1910/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22022 **[Test build #94369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94369/testReport)** for PR 22022 at commit [`657d364`](https://github.com/apache/spark/commit/657d3643e63d79095c47b45ce14429e9fa08f25b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21986 **[Test build #94371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94371/testReport)** for PR 21986 at commit [`af79644`](https://github.com/apache/spark/commit/af79644cb4687b6acb9a10548f05aef980f1882a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21986 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1911/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21986: [SPARK-23937][SQL] Add map_filter SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21986 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22020: [SPARK-25041][build] upgrade genJavaDoc-plugin from 0.10...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22020 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94364/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22024: [SPARK-25034][CORE] Remove allocations in onBlock...
GitHub user vincent-grosbois opened a pull request: https://github.com/apache/spark/pull/22024 [SPARK-25034][CORE] Remove allocations in onBlockFetchSuccess This method is only transferring a ManagedBuffer to the caller, so there is no reason why it should allocate 2 (!) intermediate data buffers in order to do so. In this commit I'm removing the conversion from any kind of managed buffer besides FileSegment to a NioManagedBuffer. However if you check the only calling method getRemoteBytes(), you will see that here we either: - do a memory-map if we have a FileSegmentManagedBuffer - try again to call the nioByteBuffer() method otherwise So in any case the conversion will occur later. ## What changes were proposed in this pull request? Remove needless temporary allocations ## How was this patch tested? Tested this change with a few jobs You can merge this pull request into a Git repository by running: $ git pull https://github.com/vincent-grosbois/spark no-alloc-onfetchsuccess Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22024.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22024 commit 2c182b3c93c7cc70f042d7dcd82520ac2adece1c Author: Vincent Grosbois Date: 2018-08-07T12:34:32Z [SPARK-25034][CORE] Remove allocations in onBlockFetchSuccess This method is only transferring a ManagedBuffer to the caller, so there is no reason why it should allocate 2 (!) intermediate data buffers in order to do so. In this commit I'm removing the conversion from any kind of managed buffer besides FileSegment to a NioManagedBuffer. However if you check the only calling method getRemoteBytes(), you will see that here we either: - do a memory-map if we have a FileSegmentManagedBuffer - try again to call the nioByteBuffer() method otherwise So in any case the conversion will occur later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1912/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94358/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21933 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94350/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r208204796 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -365,3 +364,101 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +/** + * Merges two given maps into a single map by applying function to the pair of values with + * the same key. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying + function to the pair of values with the same key. For keys only presented in one map, + NULL will be passed as the value for the missing key. If an input map contains duplicated + keys, only the first entry of the duplicated key is passed into the lambda function. +""", + examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); + {1:"ax",2:"by"} + """, + since = "2.4.0") +case class MapZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + @transient lazy val functionForEval: Expression = functionsForEval.head + + @transient lazy val MapType(keyType, leftValueType, _) = getMapType(left) + + @transient lazy val MapType(_, rightValueType, _) = getMapType(right) + + @transient lazy val arrayDataUnion = new ArrayDataUnion(keyType) + + @transient lazy val ordering = TypeUtils.getInterpretedOrdering(keyType) + + override def inputs: Seq[Expression] = left :: right :: Nil + + override def functions: Seq[Expression] = function :: Nil + + override def nullable: Boolean = left.nullable || right.nullable + + override def dataType: DataType = MapType(keyType, function.dataType, function.nullable) + + override def checkInputDataTypes(): TypeCheckResult = { +(left.dataType, right.dataType) match { + case (MapType(k1, _, _), MapType(k2, _, _)) if k1.sameType(k2) => +TypeUtils.checkForOrderingExpr(k1, s"function $prettyName") + case _ => TypeCheckResult.TypeCheckFailure(s"The input to function $prettyName should have " + +s"been two ${MapType.simpleString}s with the same key type, but it's " + +s"[${left.dataType.catalogString}, ${right.dataType.catalogString}].") +} + } + + private def getMapType(expr: Expression) = expr.dataType match { +case m: MapType => m +case _ => MapType.defaultConcreteType + } + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): MapZipWith = { +val arguments = Seq((keyType, false), (leftValueType, true), (rightValueType, true)) +copy(function = f(function, arguments)) + } + + override def eval(input: InternalRow): Any = { +val value1 = left.eval(input) +if (value1 == null) { + null +} else { + val value2 = right.eval(input) + if (value2 == null) { +null + } else { +nullSafeEval(input, value1, value2) + } +} + } + + @transient lazy val LambdaFunction(_, Seq( +keyVar: NamedLambdaVariable, +value1Var: NamedLambdaVariable, +value2Var: NamedLambdaVariable), +_) = function + + private def nullSafeEval(inputRow: InternalRow, value1: Any, value2: Any): Any = { +val mapData1 = value1.asInstanceOf[MapData] +val mapData2 = value2.asInstanceOf[MapData] +val keys = arrayDataUnion(mapData1.keyArray(), mapData2.keyArray()) +val values = new GenericArrayData(new Array[Any](keys.numElements())) +keys.foreach(keyType, (idx: Int, key: Any) => { + val v1 = GetMapValueUtil.getValueEval(mapData1, key, keyType, leftValueType, ordering) --- End diff -- Thanks for mentioning this! I'm not happy with the current complexity either. I've assumed that the implementation of maps will change into something with O(1) element access in future. By then, the complexity would be O(N) for types supporting equals as well and we would safe a portion of duplicated code. If you think that maps will remain like this for a long time, really like your suggestion with indexes. @ueshin What's your view on that? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17185 **[Test build #94353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94353/testReport)** for PR 17185 at commit [`5f7e5d7`](https://github.com/apache/spark/commit/5f7e5d7bddca593d72818b07d71f678bd0a1982d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22006: [SPARK-25031][SQL] Fix MapType schema print
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22006 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22022 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS][BACKPORT-2.3] Delegate check access p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22021 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22006: [SPARK-25031][SQL] Fix MapType schema print
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22006#discussion_r208209419 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala --- @@ -452,4 +452,31 @@ class DataTypeSuite extends SparkFunSuite { new StructType().add("f1", IntegerType).add("f", new StructType().add("f2", StringType, false)), new StructType().add("f2", IntegerType).add("g", new StructType().add("f1", StringType)), false) + + test("SPARK-25031: MapType should produce current formatted string for complex types") { + --- End diff -- nit: unneeded blank line --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22006: [SPARK-25031][SQL] Fix MapType schema print
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22006 **[Test build #94372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94372/testReport)** for PR 22006 at commit [`4328199`](https://github.com/apache/spark/commit/4328199fe3738ceec0a2e87b934a20f56e08dc28). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21305 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21989: [SPARK-25003][PYSPARK][BRANCH-2.3] Use SessionExt...
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/21989 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22024 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21988: [SPARK-25003][PYSPARK][BRANCH-2.2] Use SessionExt...
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/21988 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS][BACKPORT-2.2] Delegate check access p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22022 **[Test build #94374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94374/testReport)** for PR 22022 at commit [`16233d1`](https://github.com/apache/spark/commit/16233d181b0a61d6cd45a7dc42d49a8905c964ea). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18323 **[Test build #94376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94376/testReport)** for PR 18323 at commit [`2e2b2ca`](https://github.com/apache/spark/commit/2e2b2ca39ffb595ec5c26bcec71afa9df8a612c6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21978 **[Test build #94375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94375/testReport)** for PR 21978 at commit [`00295ee`](https://github.com/apache/spark/commit/00295ee6b3713995641c90a9b3b7cd4a6b79ded6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17185 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22006: [SPARK-25031][SQL] Fix MapType schema print
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22006 **[Test build #94355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94355/testReport)** for PR 22006 at commit [`9bc0541`](https://github.com/apache/spark/commit/9bc0541f2caf0dfde0173c1cc1f2dadbd2f94ce3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r208136330 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -365,3 +365,69 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +/** + * Transform Keys in a map using the transform_keys function. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.", + examples = """ +Examples: + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1); + map(array(2, 3, 4), array(1, 2, 3)) + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v); + map(array(2, 4, 6), array(1, 2, 3)) + """, + since = "2.4.0") +case class TransformKeys( +input: Expression, +function: Expression) + extends ArrayBasedHigherOrderFunction with CodegenFallback { + + override def nullable: Boolean = input.nullable + + override def dataType: DataType = { +val valueType = input.dataType.asInstanceOf[MapType].valueType +MapType(function.dataType, valueType, input.nullable) + } + + override def inputTypes: Seq[AbstractDataType] = Seq(MapType, expectingFunctionType) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): + TransformKeys = { +val (keyElementType, valueElementType, containsNull) = input.dataType match { + case MapType(keyType, valueType, containsNullValue) => +(keyType, valueType, containsNullValue) + case _ => +val MapType(keyType, valueType, containsNullValue) = MapType.defaultConcreteType +(keyType, valueType, containsNullValue) +} +copy(function = f(function, (keyElementType, false) :: (valueElementType, containsNull) :: Nil)) + } + + @transient lazy val (keyVar, valueVar) = { +val LambdaFunction( +_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function +(keyVar, valueVar) + } + + override def eval(input: InternalRow): Any = { +val arr = this.input.eval(input).asInstanceOf[MapData] +if (arr == null) { + null +} else { + val f = functionForEval + val resultKeys = new GenericArrayData(new Array[Any](arr.numElements)) + var i = 0 + while (i < arr.numElements) { +keyVar.value.set(arr.keyArray().get(i, keyVar.dataType)) +valueVar.value.set(arr.valueArray().get(i, valueVar.dataType)) +resultKeys.update(i, f.eval(input)) --- End diff -- This assumes that the transformation will return a unique key right? If it doesn't you'll break the map semantics. For example: `map_key(some_map, (k, v) -> 0)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22021 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20611 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1901/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1902/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22021: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22021 Please change the title to add branch 2.3 backport tag. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22022 **[Test build #94360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94360/testReport)** for PR 22022 at commit [`16b7b40`](https://github.com/apache/spark/commit/16b7b400b57f0ac1a783a68e6219b0e520d7802f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22022: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22022 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94360/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21984 LGTM, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208165977 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetsOnlyScanConfigBuilder.scala --- @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import org.apache.spark.sql.sources.v2.reader.{ScanConfig, ScanConfigBuilder} + +/** + * A very simple [[ScanConfigBuilder]] and [[ScanConfig]] implementation that carries offsets for + * streaming data sources. + */ +case class OffsetsOnlyScanConfigBuilder(start: Offset, end: Option[Offset] = None) + extends ScanConfigBuilder with ScanConfig { --- End diff -- otherwise we need to create 2 very similar classes. I'm fine with both. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org