[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19884 @shaneknapp afaik Spark is supporting pyarrow with python 2.7 and we should be testing these also, but I'm not sure about pypy. Maybe @ueshin or @HyukjinKwon can confirm before we start upgrading?
[GitHub] spark pull request #19975: [SPARK-22781][SS] Support creating streaming data...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19975
[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19975 Thanks! Merging to master!
[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19872 Merged build finished. Test FAILed.
[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19872 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85152/ Test FAILed.
[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19872 **[Test build #85152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85152/testReport)** for PR 19872 at commit [`ea5d6f3`](https://github.com/apache/spark/commit/ea5d6f319aa3b1bba20ad86a51e6efb65658e3d2).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r157952725

```
--- Diff: python/pyspark/sql/types.py ---
@@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
                         for field in arrow_schema])


+def _require_minimum_pyarrow_version():
--- End diff --
```

sounds good
[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20019 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85153/ Test PASSed.
[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19884#discussion_r157952778

```
--- Diff: python/pyspark/sql/types.py ---
@@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
                         for field in arrow_schema])


+def _require_minimum_pyarrow_version():
+    """ Raise ImportError if minimum version of pyarrow is not installed
+    """
+    from distutils.version import LooseVersion
+    import pyarrow
+    if pyarrow.__version__ < LooseVersion('0.8.0'):
--- End diff --
```

sure, no prob
[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20019 Merged build finished. Test PASSed.
[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20019 **[Test build #85153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85153/testReport)** for PR 20019 at commit [`87e61ce`](https://github.com/apache/spark/commit/87e61cee4c2d8aab2a95fd9123b414736de5d3f1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DataFrameWindowFramesSuite extends QueryTest with SharedSQLContext `
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20021

Here is [another example](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L414). This one is complicated: `result` is passed to `arguments` of `ctx.splitExpressions` in `genHashForStruct`, and `result` may end up holding `ev.value` through a call path that starts from [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L277).

As this example points out, even when `arguments` in `ctx.splitExpressions` takes a local variable, that local variable may hold `ev.value` in some calling contexts. It does not seem easy to ensure there is no global variable in `arguments`. Thus, IMHO, it would be good to combine localization with an added assert, and declare the contract "no global variable in `arguments`" in a comment. WDYT?

```
// hash.scala, line 277
override def doGenCode(ctx, ev): ExprCode = {
  ...
  computeHash(childGen.value, child.dataType, ev.value, ctx)
  ...
}

// hash.scala, line 448
protected def computeHash(input, dataType, result, ctx): String =
  computeHashWithTailRec(input, dataType, result, ctx)

// hash.scala, line 414
private def computeHashWithTailRec(input, dataType, result, ctx): String = dataType match {
  ...
  case StructType(fields) => genHashForStruct(ctx, input, result, fields)
}

// hash.scala, line 402
protected def genHashForStruct(ctx, input, result, fields) {
  ...
  ctx.splitExpressions(
    ...
    arguments = Seq("InternalRow" -> input, hashResultType -> result),
    ...)
}
```
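For illustration, a hedged sketch of that combination — localize first so the contract holds by construction; `hashExprs`, `input`, and the surrounding generated-code strings are hypothetical stand-ins, not the actual hash.scala implementation:

```scala
// Illustrative sketch: copy a possibly-global value (e.g. ev.value) into a
// fresh local variable before handing it to ctx.splitExpressions, so the
// contract "no global variable in arguments" holds by construction.
val localResult = ctx.freshName("result")  // fresh local variable name
val code =
  s"""
     |long $localResult = ${ev.value};
     |${ctx.splitExpressions(
          expressions = hashExprs,  // hypothetical: per-field hash code snippets
          funcName = "computeHashForStruct",
          arguments = Seq("InternalRow" -> input, "long" -> localResult))}
     |${ev.value} = $localResult;
   """.stripMargin
```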
[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20020 **[Test build #85159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85159/testReport)** for PR 20020 at commit [`e25a9eb`](https://github.com/apache/spark/commit/e25a9eb285d56a771a56b77534413be59b9f111b).
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85149/ Test PASSed.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19977 Merged build finished. Test PASSed.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #85149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85149/testReport)** for PR 19977 at commit [`cb2a75b`](https://github.com/apache/spark/commit/cb2a75b69c9f82465ba5dccca3e2a22526462b52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule `
[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r157948426

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
@@ -48,29 +48,46 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     }.isDefined
   }

+  private def isPandasGroupAggUdf(expr: Expression): Boolean = expr match {
+    case PythonUDF(_, _, _, _, PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF) => true
+    case Alias(child, _) => isPandasGroupAggUdf(child)
+    case _ => false
+  }
+
+  private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
+    val actualAggExpr = agg.aggregateExpressions.drop(agg.groupingExpressions.length)
+    actualAggExpr.exists(isPandasGroupAggUdf)
+  }
+
+
--- End diff --
```

nit: remove an extra line.
[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r157939292

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
@@ -48,29 +48,46 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     }.isDefined
   }

+  private def isPandasGroupAggUdf(expr: Expression): Boolean = expr match {
+    case PythonUDF(_, _, _, _, PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF) => true
+    case Alias(child, _) => isPandasGroupAggUdf(child)
+    case _ => false
+  }
+
+  private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
+    val actualAggExpr = agg.aggregateExpressions.drop(agg.groupingExpressions.length)
--- End diff --
```

Do we need to drop the grouping expressions? If we do, we should drop them only when `conf.dataFrameRetainGroupColumns == true`; otherwise `aggregateExpressions` doesn't contain the `groupingExpressions`.
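A rough sketch of that suggestion (hedged: it assumes this rule can read the session `SQLConf` as `conf`; the helper names come from the quoted diff):

```scala
// Illustrative sketch of ueshin's suggestion: only drop the grouping
// expressions from the head of aggregateExpressions when Spark retains
// grouping columns in the output; otherwise aggregateExpressions already
// excludes them.
private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
  val actualAggExpr = if (conf.dataFrameRetainGroupColumns) {
    agg.aggregateExpressions.drop(agg.groupingExpressions.length)
  } else {
    agg.aggregateExpressions
  }
  actualAggExpr.exists(isPandasGroupAggUdf)
}
```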
[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r157944969

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, AttributeSet, Expression, JoinedRow, NamedExpression, PythonUDF, SortOrder, UnsafeProjection, UnsafeRow}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution, Partitioning}
+import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+case class AggregateInPandasExec(
+    groupingExpressions: Seq[Expression],
+    udfExpressions: Seq[PythonUDF],
+    resultExpressions: Seq[NamedExpression],
+    child: SparkPlan)
+  extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = resultExpressions.map(_.toAttribute)
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def producedAttributes: AttributeSet = AttributeSet(output)
+
+  override def requiredChildDistribution: Seq[Distribution] = {
+    if (groupingExpressions.isEmpty) {
+      AllTuples :: Nil
+    } else {
+      ClusteredDistribution(groupingExpressions) :: Nil
+    }
+  }
+
+  private def collectFunctions(udf: PythonUDF): (ChainedPythonFunctions, Seq[Expression]) = {
+    udf.children match {
+      case Seq(u: PythonUDF) =>
+        val (chained, children) = collectFunctions(u)
+        (ChainedPythonFunctions(chained.funcs ++ Seq(udf.func)), children)
+      case children =>
+        // There should not be any other UDFs, or the children can't be evaluated directly.
+        assert(children.forall(_.find(_.isInstanceOf[PythonUDF]).isEmpty))
+        (ChainedPythonFunctions(Seq(udf.func)), udf.children)
+    }
+  }
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+    Seq(groupingExpressions.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val inputRDD = child.execute()
+
+    val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
+    val reuseWorker = inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
+    val sessionLocalTimeZone = conf.sessionLocalTimeZone
+    val pandasRespectSessionTimeZone = conf.pandasRespectSessionTimeZone
+
+    val (pyFuncs, inputs) = udfExpressions.map(collectFunctions).unzip
+
+    val allInputs = new ArrayBuffer[Expression]
+    val dataTypes = new ArrayBuffer[DataType]
+
+    allInputs.appendAll(groupingExpressions)
+
+    val argOffsets = inputs.map { input =>
+      input.map { e =>
+        allInputs += e
+        dataTypes += e.dataType
+        allInputs.length - 1 - groupingExpressions.length
+      }.toArray
+    }.toArray
+
+    val schema = StructType(dataTypes.zipWithIndex.map { case (dt, i) =>
+      StructField(s"_$i", dt)
+    })
+
+    inputRDD.mapPartitionsInternal { iter =>
+      val grouped = if (groupingExpressions.isEmpty) {
+        Iterator((null, iter))
+      } else {
+        val groupedIter = GroupedIterator(iter, groupingExpressions, child.output)
+
+        val dropGrouping =
+          UnsafeProjection.create(allInputs.drop(groupingExpressions.length), child.output)
+
+        groupedI
```
[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r157944622

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{SparkEnv, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, AttributeSet, Expression, JoinedRow, NamedExpression, PythonUDF, SortOrder, UnsafeProjection, UnsafeRow}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution, Partitioning}
+import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+case class AggregateInPandasExec(
+    groupingExpressions: Seq[Expression],
+    udfExpressions: Seq[PythonUDF],
+    resultExpressions: Seq[NamedExpression],
+    child: SparkPlan)
+  extends UnaryExecNode {
+
+  override def output: Seq[Attribute] = resultExpressions.map(_.toAttribute)
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def producedAttributes: AttributeSet = AttributeSet(output)
+
+  override def requiredChildDistribution: Seq[Distribution] = {
+    if (groupingExpressions.isEmpty) {
+      AllTuples :: Nil
+    } else {
+      ClusteredDistribution(groupingExpressions) :: Nil
+    }
+  }
+
+  private def collectFunctions(udf: PythonUDF): (ChainedPythonFunctions, Seq[Expression]) = {
+    udf.children match {
+      case Seq(u: PythonUDF) =>
+        val (chained, children) = collectFunctions(u)
+        (ChainedPythonFunctions(chained.funcs ++ Seq(udf.func)), children)
+      case children =>
+        // There should not be any other UDFs, or the children can't be evaluated directly.
+        assert(children.forall(_.find(_.isInstanceOf[PythonUDF]).isEmpty))
+        (ChainedPythonFunctions(Seq(udf.func)), udf.children)
+    }
+  }
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+    Seq(groupingExpressions.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val inputRDD = child.execute()
+
+    val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
+    val reuseWorker = inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
+    val sessionLocalTimeZone = conf.sessionLocalTimeZone
+    val pandasRespectSessionTimeZone = conf.pandasRespectSessionTimeZone
+
+    val (pyFuncs, inputs) = udfExpressions.map(collectFunctions).unzip
+
+    val allInputs = new ArrayBuffer[Expression]
+    val dataTypes = new ArrayBuffer[DataType]
+
+    allInputs.appendAll(groupingExpressions)
--- End diff --
```

I guess we don't need to append `groupingExpressions`. Seems like they are dropped later.
[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r157938453

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ---
@@ -48,29 +48,46 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
     }.isDefined
   }

+  private def isPandasGroupAggUdf(expr: Expression): Boolean = expr match {
+    case PythonUDF(_, _, _, _, PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF) => true
+    case Alias(child, _) => isPandasGroupAggUdf(child)
+    case _ => false
+  }
+
+  private def hasPandasGroupAggUdf(agg: Aggregate): Boolean = {
+    val actualAggExpr = agg.aggregateExpressions.drop(agg.groupingExpressions.length)
+    actualAggExpr.exists(isPandasGroupAggUdf)
+  }
+
+
   private def extract(agg: Aggregate): LogicalPlan = {
     val projList = new ArrayBuffer[NamedExpression]()
     val aggExpr = new ArrayBuffer[NamedExpression]()
-    agg.aggregateExpressions.foreach { expr =>
-      if (hasPythonUdfOverAggregate(expr, agg)) {
-        // Python UDF can only be evaluated after aggregate
-        val newE = expr transformDown {
-          case e: Expression if belongAggregate(e, agg) =>
-            val alias = e match {
-              case a: NamedExpression => a
-              case o => Alias(e, "agg")()
-            }
-            aggExpr += alias
-            alias.toAttribute
+
+    if (hasPandasGroupAggUdf(agg)) {
+      Aggregate(agg.groupingExpressions, agg.aggregateExpressions, agg.child)
--- End diff --
```

Do we need to copy?
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20021 Merged build finished. Test FAILed.
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20021 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85150/ Test FAILed.
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20021 **[Test build #85150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85150/testReport)** for PR 20021 at commit [`3d44195`](https://github.com/apache/spark/commit/3d44195f48c1688d7dc5b87fd0c9f07c1535000b).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85148/ Test PASSed.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Merged build finished. Test PASSed.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19950 **[Test build #85148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85148/testReport)** for PR 19950 at commit [`604cb7d`](https://github.com/apache/spark/commit/604cb7df1d892010316c964250977c8692799666).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #85158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85158/testReport)** for PR 19977 at commit [`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).
[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/5604#discussion_r157945816

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+import org.apache.spark.sql.catalyst.errors.TreeNodeException
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.{NumericType, DataType}
+
+/**
+ * The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for
+ * Window Functions.
+ */
+sealed trait WindowSpec
+
+/**
+ * The specification for a window function.
+ * @param partitionSpec It defines the way that input rows are partitioned.
+ * @param orderSpec It defines the ordering of rows in a partition.
+ * @param frameSpecification It defines the window frame in a partition.
+ */
+case class WindowSpecDefinition(
+    partitionSpec: Seq[Expression],
+    orderSpec: Seq[SortOrder],
+    frameSpecification: WindowFrame) extends Expression with WindowSpec {
+
+  def validate: Option[String] = frameSpecification match {
+    case UnspecifiedFrame =>
+      Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " +
+        "during analysis. Please file a bug report.")
+    case frame: SpecifiedWindowFrame => frame.validate.orElse {
+      def checkValueBasedBoundaryForRangeFrame(): Option[String] = {
+        if (orderSpec.length > 1) {
+          // It is not allowed to have a value-based PRECEDING and FOLLOWING
+          // as the boundary of a Range Window Frame.
+          Some("This Range Window Frame only accepts at most one ORDER BY expression.")
+        } else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) {
+          Some("The data type of the expression in the ORDER BY clause should be a numeric type.")
+        } else {
+          None
+        }
+      }
+
+      (frame.frameType, frame.frameStart, frame.frameEnd) match {
+        case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame()
+        case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame()
+        case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame()
+        case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame()
+        case (_, _, _) => None
+      }
+    }
+  }
+
+  type EvaluatedType = Any
+
+  override def children: Seq[Expression] = partitionSpec ++ orderSpec
+
+  override lazy val resolved: Boolean =
+    childrenResolved && frameSpecification.isInstanceOf[SpecifiedWindowFrame]
+
+
+  override def toString: String = simpleString
+
+  override def eval(input: Row): EvaluatedType = throw new UnsupportedOperationException
+  override def nullable: Boolean = true
+  override def foldable: Boolean = false
+  override def dataType: DataType = throw new UnsupportedOperationException
+}
+
+/**
+ * A Window specification reference that refers to the [[WindowSpecDefinition]] defined
+ * under the name `name`.
+ */
+case class WindowSpecReference(name: String) extends WindowSpec
+
+/**
+ * The trait used to represent the type of a Window Frame.
+ */
+sealed trait FrameType
+
+/**
+ * RowFrame treats rows in a partition individually. When a [[ValuePreceding]]
+ * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered
+ * as a physical offset.
+ * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame,
+ * from the row precedes the current row to the row follows the current row.
+ */
+case object
```
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #85157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85157/testReport)** for PR 19977 at commit [`9887478`](https://github.com/apache/spark/commit/9887478064e81d0efa7d3b2d9d8abdffde9364a4).
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85147/ Test PASSed.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Merged build finished. Test PASSed.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19950 **[Test build #85147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85147/testReport)** for PR 19950 at commit [`3d1d3db`](https://github.com/apache/spark/commit/3d1d3dba1e78eb9a082fa1f6a2aa2437a9ed99a6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008 **[Test build #85156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85156/testReport)** for PR 20008 at commit [`b4c5339`](https://github.com/apache/spark/commit/b4c5339cc44eeed7e75aac30ba6b6aaf06316305).
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157943490

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }

+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --
```

Aha, sounds reasonable. I feel we should fix that in a separate PR first, though? Or should this PR include the fix, too?
[GitHub] spark pull request #20018: SPARK-22833 [Improvement] in SparkHive Scala Exam...
Github user chetkhatri commented on a diff in the pull request: https://github.com/apache/spark/pull/20018#discussion_r157942866

```
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/SparkHiveExample.scala ---
@@ -104,6 +103,60 @@ object SparkHiveExample {
     // ...
     // $example off:spark_hive$
--- End diff --
```

@srowen Can you please review this? cc @holdenk @sameeragarwal
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85155/ Test PASSed.
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20028 **[Test build #85155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85155/testReport)** for PR 20028 at commit [`5d5978e`](https://github.com/apache/spark/commit/5d5978e5c83bc2037646f40d6db23059af530a15).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20028 Merged build finished. Test PASSed.
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157942499

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }

+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
+    override protected def coerceTypes(
+        plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
--- End diff --
```

From another viewpoint, would it be a bad idea to drop this rule from `typeCoercionRules` when `spark.sql.function.concatBinaryAsString` is true? Something like:

```
def typeCoercionRules(conf: SQLConf): List[Rule[LogicalPlan]] =
  if ( /* check `spark.sql.function.concatBinaryAsString` */ ) {
    // rules with ConcatCoercion
  } else {
    // rules without ConcatCoercion
  }
```
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157942449

```
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala ---
@@ -209,9 +213,33 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
         .build()
     }.getOrElse(executorContainer)

-    new PodBuilder(executorPod)
+    val (maybeSecretsMountedPod, maybeSecretsMountedContainer) =
+      mountSecretsBootstrap.map { bootstrap =>
+        bootstrap.mountSecrets(executorPod, containerWithLimitCores)
--- End diff --
```

I created https://issues.apache.org/jira/browse/SPARK-22839.
[GitHub] spark pull request #19954: [SPARK-22757][Kubernetes] Enable use of remote de...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/19954#discussion_r157942418

```
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/MountSecretsBootstrap.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s
+
+import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, PodBuilder}
+
+/**
+ * Bootstraps a driver or executor container or an init-container with needed secrets mounted.
+ */
+private[spark] class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, String]) {
--- End diff --
```

I created https://issues.apache.org/jira/browse/SPARK-22839 to keep track of the refactoring work.
[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19975 Gentle ping! :)
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85146/ Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20015 Merged build finished. Test PASSed.
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20015 **[Test build #85146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85146/testReport)** for PR 20015 at commit [`238d7d4`](https://github.com/apache/spark/commit/238d7d470c583c910bccbca8bbcaa681b67d6025).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait TruncInstant extends BinaryExpression with ImplicitCastInputTypes `
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20021 I checked some call sites. Here is [one example](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L285) where `arguments` has `ev.value` instead of a local variable.
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157941723

```
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java ---
@@ -74,4 +74,29 @@ public static long getPrefix(byte[] bytes) {
     }
     return Arrays.copyOfRange(bytes, start, end);
   }
+
+  public static byte[] concat(byte[]... inputs) {
+    // Compute the total length of the result
+    int totalLength = 0;
+    for (int i = 0; i < inputs.length; i++) {
+      if (inputs[i] != null) {
+        totalLength += inputs[i].length;
+      } else {
+        return null;
+      }
+    }
+
+    // Allocate a new byte array, and copy the inputs one by one into it
+    final byte[] result = new byte[totalLength];
+    int offset = 0;
+    for (int i = 0; i < inputs.length; i++) {
+      int len = inputs[i].length;
--- End diff --
```

nvm, we already check for null here: https://github.com/apache/spark/pull/19977/files#diff-6df3223f9826f2b5d1d0c8e29a240ae3R82
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157941423

```
--- Diff: common/unsafe/src/main/java/src/org/apache/spark/unsafe/types/ByteArray.java ---
@@ -74,4 +74,29 @@ public static long getPrefix(byte[] bytes) {
     }
     return Arrays.copyOfRange(bytes, start, end);
   }
+
+  public static byte[] concat(byte[]... inputs) {
+    // Compute the total length of the result
+    int totalLength = 0;
+    for (int i = 0; i < inputs.length; i++) {
+      if (inputs[i] != null) {
+        totalLength += inputs[i].length;
+      } else {
+        return null;
+      }
+    }
+
+    // Allocate a new byte array, and copy the inputs one by one into it
+    final byte[] result = new byte[totalLength];
+    int offset = 0;
+    for (int i = 0; i < inputs.length; i++) {
+      int len = inputs[i].length;
--- End diff --
```

aha, I see. `UTF8String` seems to need the same null check?
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157940901

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }

+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --
```

I feel this rule is a little hacky, as it must run before `ImplicitTypeCasts`. I think we should not make `Concat` extend `ImplicitCastInputTypes`, and instead do type coercion for it manually with a custom rule.
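One hedged reading of that suggestion, sketched against the pre-PR single-parameter `Concat`; the rule name and structure here are illustrative, not the final design:

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Concat}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{BinaryType, StringType}

// Illustrative sketch: a dedicated rule decides the target type for each
// Concat on its own, instead of declaring expected input types through
// ImplicitCastInputTypes (which is applied by the generic ImplicitTypeCasts).
object ConcatCoercionSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // Skip nodes whose children are unresolved or already all strings.
    case c @ Concat(children) if children.forall(_.resolved) &&
        !children.forall(_.dataType == StringType) =>
      if (children.nonEmpty && children.forall(_.dataType == BinaryType)) {
        c // all-binary inputs: keep the concatenation binary
      } else {
        // Mixed inputs: cast everything that is not already a string.
        Concat(children.map {
          case e if e.dataType == StringType => e
          case e => Cast(e, StringType)
        })
      }
  }
}
```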
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157940623

```
--- Diff: docs/sql-programming-guide.md ---
@@ -1780,6 +1780,8 @@ options.
   - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).

+  - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types.
--- End diff --
```

ok
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157940634

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }

+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
+    override protected def coerceTypes(
+        plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
--- End diff --
```

we can add a stop condition `if isBinaryMode = false`
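A hedged sketch of that stop condition, assuming the two-parameter `Concat(children, isBinaryMode)` shape from this PR's diff (illustrative only):

```scala
// Illustrative: match only Concat nodes still in the default string mode,
// so the rule stops firing once a node has been flipped to binary mode.
override protected def coerceTypes(plan: LogicalPlan): LogicalPlan =
  plan transformExpressionsUp {
    case c @ Concat(children, false)
        if children.nonEmpty && children.forall(_.dataType == BinaryType) =>
      c.copy(isBinaryMode = true)
  }
```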
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157940412

```
--- Diff: docs/sql-programming-guide.md ---
@@ -1780,6 +1780,8 @@ options.
   - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).

+  - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types.
--- End diff --
```

also mention the config
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157940334

```
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java ---
@@ -74,4 +74,29 @@ public static long getPrefix(byte[] bytes) {
     }
     return Arrays.copyOfRange(bytes, start, end);
   }
+
+  public static byte[] concat(byte[]... inputs) {
+    // Compute the total length of the result
+    int totalLength = 0;
+    for (int i = 0; i < inputs.length; i++) {
+      if (inputs[i] != null) {
+        totalLength += inputs[i].length;
+      } else {
+        return null;
+      }
+    }
+
+    // Allocate a new byte array, and copy the inputs one by one into it
+    final byte[] result = new byte[totalLength];
+    int offset = 0;
+    for (int i = 0; i < inputs.length; i++) {
+      int len = inputs[i].length;
--- End diff --
```

null check here too?
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157939820

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -48,17 +48,26 @@ import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
       > SELECT _FUNC_('Spark', 'SQL');
        SparkSQL
   """)
-case class Concat(children: Seq[Expression]) extends Expression with ImplicitCastInputTypes {
+case class Concat(children: Seq[Expression], isBinaryMode: Boolean = false)
+  extends Expression with ImplicitCastInputTypes {

-  override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType)
-  override def dataType: DataType = StringType
+  def this(children: Seq[Expression]) = this(children, false)
--- End diff --
```

Sorry, I didn't get it. We already have the default parameter `isBinaryMode = false`.
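For background — hedged, since this is only an illustration of the language rule, not the PR's code — an auxiliary constructor can still be useful alongside a default parameter, because Java callers cannot conveniently use Scala default arguments:

```scala
// Self-contained illustration (String stands in for Expression):
// Scala callers get the default for free, but a Java caller writing
// `new Concat(children)` needs the explicit auxiliary constructor.
case class Concat(children: Seq[String], isBinaryMode: Boolean = false) {
  def this(children: Seq[String]) = this(children, false)
}

object ConcatDemo extends App {
  val a = Concat(Seq("a", "b"))              // Scala: default isBinaryMode = false
  val b = new Concat(Seq("a", "b"))          // the form a Java caller would use
  println((a.isBinaryMode, b.isBinaryMode))  // (false,false)
}
```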
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user wesm commented on the issue: https://github.com/apache/spark/pull/19884 We need to upgrade to 0.8.0 now because we changed the binary format
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user wesm commented on the issue: https://github.com/apache/spark/pull/19884 We are supporting pyarrow on Python 2.7 except for Windows
[GitHub] spark pull request #20026: [SPARK-22838][Core] Avoid unnecessary copying of ...
Github user ConeyLiu commented on a diff in the pull request: https://github.com/apache/spark/pull/20026#discussion_r157938250

```
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -208,7 +209,7 @@ private class EncryptedBlockData(
     conf: SparkConf,
     key: Array[Byte]) extends BlockData {

-  override def toInputStream(): InputStream = Channels.newInputStream(open())
+  override def toInputStream(): InputStream = new NioBufferedFileInputStream(file)
--- End diff --
```

Do you mean the memory buffer? `NioBufferedFileInputStream` has a `close` method, as you can see in org.apache.spark.io.NioBufferedFileInputStream.java:

```java
@Override
public synchronized void close() throws IOException {
  fileChannel.close();
  StorageUtils.dispose(byteBuffer);
}

@Override
protected void finalize() throws IOException {
  close();
}
```
[GitHub] spark issue #20028: [SPARK-19053][ML]Supporting multiple evaluation metrics ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20028 **[Test build #85155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85155/testReport)** for PR 20028 at commit [`5d5978e`](https://github.com/apache/spark/commit/5d5978e5c83bc2037646f40d6db23059af530a15).
[GitHub] spark issue #20027: Branch 2.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20027 Can one of the admins verify this patch?
[GitHub] spark pull request #20028: [SPARK-19053][ML]Supporting multiple evaluation m...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/20028

[SPARK-19053][ML] Supporting multiple evaluation metrics in DataFrame-based API

## What changes were proposed in this pull request?

As an initial step, the PR creates BinaryClassificationMetrics, MulticlassClassificationMetrics, and RegressionMetrics in ML. The long-term target is to reach function parity with MLlib Metrics and to provide more enhancements for the DataFrame-based API.

This PR allows the Binary/MulticlassClassification/Regression Evaluator to return a corresponding metrics instance, and users can use the metrics instance to access multiple metrics after a single pass, whereas originally Evaluator only allowed access to a single metric.

```
val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
val metrics = evaluator.getMetrics(df)
metrics.accuracy
metrics.weightedFMeasure
metrics.weightedPrecision
metrics.weightedRecall
```

To make the review easier, the current PR only includes metrics in the Evaluator to reach function parity. Plan for further development:
1. Initial API and function parity with ML Evaluators. (This PR)
2. Python API.
3. Function parity with MLlib Metrics.
4. Add requested enhancements like including weight, adding per-row metrics, and adding ranking metrics.
5. Reorganize the classification Metrics hierarchy, so Binary Classification Metrics can support metrics in MultiClassMetrics (accuracy, recall, etc.).
6. Possibly to be used in training summary.

The current implementation is still based on MLlib Metrics, which is kept completely internal and can be changed to DataFrame-based calculation when necessary.

## How was this patch tested?

New unit tests added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark mlMetrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20028.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20028

commit 5d5978e5c83bc2037646f40d6db23059af530a15
Author: Yuhao Yang
Date: 2017-12-20T02:51:19Z

    add ml metrics
[GitHub] spark issue #20027: Branch 2.2
Github user Maple-Wang commented on the issue: https://github.com/apache/spark/pull/20027

When using SparkR in the R shell, the master parameter is too old; it cannot run "spark-submit --master yarn --deploy-mode client". I installed R on all nodes. When used this way:

```r
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/2.6.1.0-129/spark2")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "yarn", sparkConfig = list(spark.driver.memory = "10g"))
```

it fails with:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, node23.nuctech.com, executor 1): java.net.SocketTimeoutException: Accept timed out
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
  at java.net.ServerSocket.implAccept(ServerSocket.java:545)
  at java.net.ServerSocket.accept(ServerSocket.java:513)
  at org.apache.spark.api.r.RRunner$.createRWorker(RRunner.scala:372)
  at org.apache.spark.api.r.RRunner.compute(RRunner.scala:69)
  at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:51)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.
```
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85154/ Test FAILed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #85154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85154/testReport)** for PR 19218 at commit [`d779ee6`](https://github.com/apache/spark/commit/d779ee6524b199b695c938bbb0c3436c73c64c87).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ParquetOptions(`
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20027: Branch 2.2
GitHub user Maple-Wang opened a pull request: https://github.com/apache/spark/pull/20027 Branch 2.2

When I use SparkR in the R shell, the master parameter seems too old and I cannot run "spark-submit --master yarn --deploy-mode client". I installed R on all nodes. When I use it this way:

```
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/usr/hdp/2.6.1.0-129/spark2")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "yarn", sparkConfig = list(spark.driver.memory = "10g"))
```

it fails with:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, node23.nuctech.com, executor 1): java.net.SocketTimeoutException: Accept timed out
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
  at java.net.ServerSocket.implAccept(ServerSocket.java:545)
  at java.net.ServerSocket.accept(ServerSocket.java:513)
  at org.apache.spark.api.r.RRunner$.createRWorker(RRunner.scala:372)
  at org.apache.spark.api.r.RRunner.compute(RRunner.scala:69)
  at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:51)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20027.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20027

commit 96c04f1edcd53798d9db5a356482248868a0a905 Author: Marcelo Vanzin Date: 2017-06-24T05:23:43Z

[SPARK-21159][CORE] Don't try to connect to launcher in standalone cluster mode. Monitoring for standalone cluster mode is not implemented (see SPARK-11033), but the same scheduler implementation is used, and if it tries to connect to the launcher it will fail. So fix the scheduler so it only tries that in client mode; cluster-mode applications will be launched correctly and will work, but monitoring through the launcher handle will not be available. Tested by running a cluster-mode app with "SparkLauncher.startApplication". Author: Marcelo Vanzin Closes #18397 from vanzin/SPARK-21159. (cherry picked from commit bfd73a7c48b87456d1b84d826e04eca938a1be64) Signed-off-by: Wenchen Fan

commit ad44ab5cb9cdaff836c7469d10b00a86a3e46adf Author: gatorsmile Date: 2017-06-24T14:35:59Z

[SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct

### What changes were proposed in this pull request?

```SQL
CREATE TABLE `tab1` (`custom_fields` ARRAY<STRUCT<id: BIGINT, value: STRING>>) USING parquet
INSERT INTO `tab1` SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'))
SELECT custom_fields.id, custom_fields.value FROM tab1
```

The above query always returns the last struct of the array, because the rule `SimplifyCasts` incorrectly rewrites the query. The underlying cause is that we always use the same `GenericInternalRow` object when doing the cast.

### How was this patch tested?

Author: gatorsmile Closes #18412 from gatorsmile/castStruct. (cherry picked from commit 2e1586f60a77ea0adb6f3f68ba74323f0c242199) Signed-off-by: Wenchen Fan

commit d8e3a4af36f85455548e82ae4acd525f5e52f322 Author: Masha Basmanova Date: 2017-06-25T05:49:35Z

[SPARK-21079][SQL] Calculate total size of a partitioned table as a sum of individual partitions

## What changes were proposed in this pull request?

The storage URI of a partitioned table may or may not point to a directory under which individual partitions are stored; in fact, individual partitions may be located in totally unrelated directories. Before this change, the ANALYZE TABLE table COMPUTE STATISTICS command calculated the total size of a table by adding up the sizes of files found under the table's storage URI. This calculation could produce 0 if partitions are stored elsewhere. This change uses the storage URIs of individual partitions to calculate the sizes of all partitions of a table and adds these up to produce the total size of the table. CC: wzhfy

## How was this patch tested?

Added unit test. Ran ANALYZE TABLE xxx COMPUTE STATISTICS on a partitioned Hive table and verified t
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #85154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85154/testReport)** for PR 19218 at commit [`d779ee6`](https://github.com/apache/spark/commit/d779ee6524b199b695c938bbb0c3436c73c64c87). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157937149

--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/concat.sql ---
```
@@ -0,0 +1,21 @@
+-- Concatenate binary inputs
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+    encode(string(id), 'utf-8') col1,
+    encode(string(id + 1), 'utf-8') col2,
+    encode(string(id + 2), 'utf-8') col3,
+    encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
+
+-- Concatenate mixed inputs between strings and binary
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+    string(id) col1,
+    string(id + 1) col2,
+    encode(string(id + 2), 'utf-8') col3,
+    encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
```
--- End diff --

ok

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
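A minimal sketch of the behavior these test cases exercise, assuming a running `SparkSession` named `spark` and the binary-preserving coercion this PR proposes (the exact result types depend on the conf discussed later in this thread):

```scala
import org.apache.spark.sql.types.{BinaryType, StringType}

// All inputs binary: the result of || stays binary under the proposed rule.
val allBinary = spark.sql(
  "SELECT encode(string(id), 'utf-8') || encode(string(id + 1), 'utf-8') AS col FROM range(10)")
assert(allBinary.schema("col").dataType == BinaryType)

// Mixed string/binary inputs: the result falls back to string.
val mixed = spark.sql(
  "SELECT string(id) || encode(string(id + 1), 'utf-8') AS col FROM range(10)")
assert(mixed.schema("col").dataType == StringType)
```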
[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20019 **[Test build #85153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85153/testReport)** for PR 20019 at commit [`87e61ce`](https://github.com/apache/spark/commit/87e61cee4c2d8aab2a95fd9123b414736de5d3f1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19218 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20019: [SPARK-22361][SQL][TEST] Add unit test for Window Frames
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20019 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19805: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19805 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157936581

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
```
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }

+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
+    override protected def coerceTypes(
+        plan: LogicalPlan): LogicalPlan = plan transformExpressionsUp {
```
--- End diff --

This is in the big batch of analyzer rules: `Batch("Resolution")`. This rule will be executed without the stop condition. cc @cloud-fan Should we do it in `Batch("Finish Analysis")`? Any better idea?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
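For readers following the discussion, a simplified sketch of what a coercion rule of this shape does; this is an illustration only, not the PR's actual body (the rule would live inside the Catalyst analysis package, and the cast logic here is an assumption):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Concat}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{BinaryType, StringType}

case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
  override protected def coerceTypes(plan: LogicalPlan): LogicalPlan =
    plan transformExpressionsUp {
      // If every input is already binary, leave the expression alone so the
      // output type stays binary; otherwise cast non-string children to string.
      case c @ Concat(children) if children.nonEmpty &&
          !children.forall(_.dataType == BinaryType) &&
          children.exists(_.dataType != StringType) =>
        c.copy(children = children.map {
          case e if e.dataType == StringType => e
          case e => Cast(e, StringType)
        })
    }
}
```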
[GitHub] spark issue #19805: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19805 LGTM Thanks! Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85151/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20026 **[Test build #85151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85151/testReport)** for PR 20026 at commit [`51c32c8`](https://github.com/apache/spark/commit/51c32c8c6cc89a3249b3ef856c41f3c238b59f4a). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20026 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20026: [SPARK-22838][Core] Avoid unnecessary copying of ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20026#discussion_r157935911

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
```
@@ -208,7 +209,7 @@ private class EncryptedBlockData(
     conf: SparkConf,
     key: Array[Byte]) extends BlockData {

-  override def toInputStream(): InputStream = Channels.newInputStream(open())
+  override def toInputStream(): InputStream = new NioBufferedFileInputStream(file)
```
--- End diff --

will `NioBufferedFileInputStream` release its memory automatically?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
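For context on the question: `NioBufferedFileInputStream` buffers reads through an off-heap (direct) ByteBuffer, so the memory is only released deterministically when the stream is closed. A minimal usage sketch (the path is hypothetical):

```scala
import java.io.File
import org.apache.spark.io.NioBufferedFileInputStream

val file = new File("/tmp/block.data")  // hypothetical block file
val in = new NioBufferedFileInputStream(file)
try {
  val firstByte = in.read()  // reads go through the internal direct buffer
} finally {
  in.close()  // frees the direct buffer; relying on GC for this is unreliable
}
```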
[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19872 **[Test build #85152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85152/testReport)** for PR 19872 at commit [`ea5d6f3`](https://github.com/apache/spark/commit/ea5d6f319aa3b1bba20ad86a51e6efb65658e3d2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157935878 --- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/concat.sql --- @@ -0,0 +1,21 @@ +-- Concatenate binary inputs --- End diff -- Add more cases? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20026 **[Test build #85151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85151/testReport)** for PR 20026 at commit [`51c32c8`](https://github.com/apache/spark/commit/51c32c8c6cc89a3249b3ef856c41f3c238b59f4a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157935899

--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/concat.sql ---
```
@@ -0,0 +1,21 @@
+-- Concatenate binary inputs
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+    encode(string(id), 'utf-8') col1,
+    encode(string(id + 1), 'utf-8') col2,
+    encode(string(id + 2), 'utf-8') col3,
+    encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
+
+-- Concatenate mixed inputs between strings and binary
+SELECT (col1 || col2 || col3 || col4) col
+FROM (
+  SELECT
+    string(id) col1,
+    string(id + 1) col2,
+    encode(string(id + 2), 'utf-8') col3,
+    encode(string(id + 3), 'utf-8') col4
+  FROM range(10)
+);
```
--- End diff --

Add more cases?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20026 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157935707

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
```
@@ -635,6 +635,11 @@ object SimplifyCaseConversionExpressions extends Rule[LogicalPlan] {

 /**
  * Combine nested [[Concat]] expressions.
+ *
+ * If `spark.sql.expression.concat.binaryMode.enabled` is true and all inputs are binary,
```
--- End diff --

also update this.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
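As an illustration of the flattening this scaladoc describes, a simplified sketch using real Catalyst expression types (not the rule's actual code):

```scala
import org.apache.spark.sql.catalyst.expressions.{Concat, Literal}

// A nested concatenation tree ...
val nested = Concat(Seq(Concat(Seq(Literal("a"), Literal("b"))), Literal("c")))
// ... is rewritten by the rule into the equivalent flat expression,
// which evaluates all inputs in a single pass:
val flattened = Concat(Seq(Literal("a"), Literal("b"), Literal("c")))
```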
[GitHub] spark issue #20026: [SPARK-22838][Core] Avoid unnecessary copying of data
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20026 cc @jiangxb1987 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20021 **[Test build #85150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85150/testReport)** for PR 20021 at commit [`3d44195`](https://github.com/apache/spark/commit/3d44195f48c1688d7dc5b87fd0c9f07c1535000b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19977#discussion_r157935282

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
```
@@ -1044,6 +1044,12 @@ object SQLConf {
       "When this conf is not set, the value from `spark.redaction.string.regex` is used.")
     .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN)

+  val CONCAT_BINARY_MODE_ENABLED = buildConf("spark.sql.expression.concat.binaryMode.enabled")
```
--- End diff --

> spark.sql.function.concatBinaryAsString

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
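If the rename suggested above were adopted, the definition would look roughly like this; a sketch using the SQLConf builder API, with the doc text and default value as assumptions:

```scala
// Hypothetical rendering of the suggested conf name; not the merged code.
val CONCAT_BINARY_AS_STRING = buildConf("spark.sql.function.concatBinaryAsString")
  .doc("When this option is set to false and all inputs are binary, " +
    "concat returns an output as binary. Otherwise, it returns as a string.")
  .booleanConf
  .createWithDefault(false)
```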
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20008 functionArgumentConversion.sql has too many combinations. Let us get rid of this one? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20015 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryErr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20014 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20014 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20015 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20015 Thanks! Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20008 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20008 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85143/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for FunctionArgumentConv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008 **[Test build #85143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85143/testReport)** for PR 20008 at commit [`16f0a99`](https://github.com/apache/spark/commit/16f0a996c694151bbc743bb49b8f1e2439c69a57). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19954 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85142/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19954 **[Test build #85142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85142/testReport)** for PR 19954 at commit [`3407d7a`](https://github.com/apache/spark/commit/3407d7af68c44b100558e49e4012a27e41b29dda). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19872: WIP: [SPARK-22274][PySpark] User-defined aggregation fun...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19872 @ramacode2014 Hi, I'm not sure why you received notifications from this PR, but I guess you can unsubscribe via the "Unsubscribe" button in the right column of this page. Sorry for the inconvenience. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org