[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #93740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93740/testReport)** for PR 21102 at commit [`c398291`](https://github.com/apache/spark/commit/c3982911d6195fc1bc1c63d72d4b3273f958accd). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1455/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93736/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21488 **[Test build #93736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93736/testReport)** for PR 21488 at commit [`aa69915`](https://github.com/apache/spark/commit/aa69915165d9aaca2bcb5d22fb2fc9467bf16826). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21103 **[Test build #93739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93739/testReport)** for PR 21103 at commit [`3639b5b`](https://github.com/apache/spark/commit/3639b5b1b460c19b7c3a787d118a0bcf35cb8e23). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1454/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21103 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r205961043 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContextImpl.scala --- @@ -39,8 +44,51 @@ private[spark] class BarrierTaskContextImpl( taskMemoryManager, localProperties, metricsSystem, taskMetrics) with BarrierTaskContext { - // TODO SPARK-24817 implement global barrier. - override def barrier(): Unit = {} + private val barrierCoordinator: RpcEndpointRef = { +val env = SparkEnv.get +RpcUtils.makeDriverRef("barrierSync", env.conf, env.rpcEnv) + } + + private val timer = new Timer("Barrier task timer for barrier() calls.") + + private var barrierEpoch = 0 + + private lazy val numTasks = localProperties.getProperty("numTasks", "0").toInt + + override def barrier(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) has entered " + + s"the global sync, current barrier epoch is $barrierEpoch.") + +val startTime = System.currentTimeMillis() +val timerTask = new TimerTask { + override def run(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) waiting " + + s"under the global sync since $startTime, has been waiting for " + + s"${(System.currentTimeMillis() - startTime) / 1000} seconds, current barrier epoch " + + s"is $barrierEpoch.") + } +} +// Log the update of global sync every 60 seconds. +timer.schedule(timerTask, 6, 6) + +try { + barrierCoordinator.askSync[Unit]( +message = RequestToSync(numTasks, stageId, stageAttemptNumber, taskAttemptId, barrierEpoch), +timeout = new RpcTimeout(31536000 /** = 3600 * 24 * 365 */ seconds, "barrierTimeout")) + barrierEpoch += 1 + logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) finished " + +"global sync successfully, waited for " + +s"${(System.currentTimeMillis() - startTime) / 1000} seconds, current barrier epoch is " + +s"$barrierEpoch.") --- End diff -- Nice catch! just updated. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93738/testReport)** for PR 21898 at commit [`766381d`](https://github.com/apache/spark/commit/766381d18174135d922a19172328268640a09f4c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1453/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r205960887 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContextImpl.scala --- @@ -39,8 +44,51 @@ private[spark] class BarrierTaskContextImpl( taskMemoryManager, localProperties, metricsSystem, taskMetrics) with BarrierTaskContext { - // TODO SPARK-24817 implement global barrier. - override def barrier(): Unit = {} + private val barrierCoordinator: RpcEndpointRef = { +val env = SparkEnv.get +RpcUtils.makeDriverRef("barrierSync", env.conf, env.rpcEnv) + } + + private val timer = new Timer("Barrier task timer for barrier() calls.") + + private var barrierEpoch = 0 + + private lazy val numTasks = localProperties.getProperty("numTasks", "0").toInt + + override def barrier(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) has entered " + + s"the global sync, current barrier epoch is $barrierEpoch.") + +val startTime = System.currentTimeMillis() +val timerTask = new TimerTask { + override def run(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) waiting " + + s"under the global sync since $startTime, has been waiting for " + + s"${(System.currentTimeMillis() - startTime) / 1000} seconds, current barrier epoch " + + s"is $barrierEpoch.") + } +} +// Log the update of global sync every 60 seconds. +timer.schedule(timerTask, 6, 6) + +try { + barrierCoordinator.askSync[Unit]( +message = RequestToSync(numTasks, stageId, stageAttemptNumber, taskAttemptId, barrierEpoch), +timeout = new RpcTimeout(31536000 /** = 3600 * 24 * 365 */ seconds, "barrierTimeout")) --- End diff -- I set a fix timeout for RPC intentionally, so users shall get a SparkException thrown by BarrierCoordinator, instead of RPCTimeoutException from the RPC framework. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r205960624 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContextImpl.scala --- @@ -39,8 +44,51 @@ private[spark] class BarrierTaskContextImpl( taskMemoryManager, localProperties, metricsSystem, taskMetrics) with BarrierTaskContext { - // TODO SPARK-24817 implement global barrier. - override def barrier(): Unit = {} + private val barrierCoordinator: RpcEndpointRef = { +val env = SparkEnv.get +RpcUtils.makeDriverRef("barrierSync", env.conf, env.rpcEnv) + } + + private val timer = new Timer("Barrier task timer for barrier() calls.") + + private var barrierEpoch = 0 + + private lazy val numTasks = localProperties.getProperty("numTasks", "0").toInt + + override def barrier(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) has entered " + + s"the global sync, current barrier epoch is $barrierEpoch.") + +val startTime = System.currentTimeMillis() +val timerTask = new TimerTask { + override def run(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) waiting " + + s"under the global sync since $startTime, has been waiting for " + + s"${(System.currentTimeMillis() - startTime) / 1000} seconds, current barrier epoch " + + s"is $barrierEpoch.") + } +} +// Log the update of global sync every 60 seconds. +timer.schedule(timerTask, 6, 6) + +try { + barrierCoordinator.askSync[Unit]( +message = RequestToSync(numTasks, stageId, stageAttemptNumber, taskAttemptId, barrierEpoch), +timeout = new RpcTimeout(31536000 /** = 3600 * 24 * 365 */ seconds, "barrierTimeout")) --- End diff -- Use `BARRIER_SYNC_TIMEOUT` here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r205960662 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContextImpl.scala --- @@ -39,8 +44,51 @@ private[spark] class BarrierTaskContextImpl( taskMemoryManager, localProperties, metricsSystem, taskMetrics) with BarrierTaskContext { - // TODO SPARK-24817 implement global barrier. - override def barrier(): Unit = {} + private val barrierCoordinator: RpcEndpointRef = { +val env = SparkEnv.get +RpcUtils.makeDriverRef("barrierSync", env.conf, env.rpcEnv) + } + + private val timer = new Timer("Barrier task timer for barrier() calls.") + + private var barrierEpoch = 0 + + private lazy val numTasks = localProperties.getProperty("numTasks", "0").toInt + + override def barrier(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) has entered " + + s"the global sync, current barrier epoch is $barrierEpoch.") + +val startTime = System.currentTimeMillis() +val timerTask = new TimerTask { + override def run(): Unit = { +logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) waiting " + + s"under the global sync since $startTime, has been waiting for " + + s"${(System.currentTimeMillis() - startTime) / 1000} seconds, current barrier epoch " + + s"is $barrierEpoch.") + } +} +// Log the update of global sync every 60 seconds. +timer.schedule(timerTask, 6, 6) + +try { + barrierCoordinator.askSync[Unit]( +message = RequestToSync(numTasks, stageId, stageAttemptNumber, taskAttemptId, barrierEpoch), +timeout = new RpcTimeout(31536000 /** = 3600 * 24 * 365 */ seconds, "barrierTimeout")) + barrierEpoch += 1 + logInfo(s"Task $taskAttemptId from Stage $stageId(Attempt $stageAttemptNumber) finished " + +"global sync successfully, waited for " + +s"${(System.currentTimeMillis() - startTime) / 1000} seconds, current barrier epoch is " + +s"$barrierEpoch.") --- End diff -- Shall we stop `timer` for this epoch here if the global sync finished successfully? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r205960688 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -27,6 +27,33 @@ trait BarrierTaskContext extends TaskContext { * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same * stage have reached this routine. + * + * This function is expected to be called by EVERY tasks in the same barrier stage in the SAME + * pattern, otherwise you may get a SparkException. Some examples of misuses listed below: + * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it + * shall lead to time out of the function call. + * rdd.barrier().mapPartitions { (iter, context) => + * if (context.partitionId() == 0) { + * // Do nothing. + * } else { + * context.barrier() + * } + * iter + * } + * + * 2. Include barrier() function in a try-catch code block, this may lead to mismatched call of + * barrier function exception. + * rdd.barrier().mapPartitions { (iter, context) => + * try { + * // Do something that might throw an Exception. + * doSomething() + * context.barrier() + * } catch { + * case e: Exception => logWarning("...", e) + * } + * context.barrier() --- End diff -- This is to demonstrate that in one task there is only one call of `barrier()` while in others there may be two calls of `barrier()`. Please refer to `BarrierTaskContextSuite`.`"throw exception if barrier() call mismatched"`. However, I'm still considering what shall be the most proper behavior for this scenario. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21898#discussion_r205960497 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -27,6 +27,33 @@ trait BarrierTaskContext extends TaskContext { * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to * MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the same * stage have reached this routine. + * + * This function is expected to be called by EVERY tasks in the same barrier stage in the SAME + * pattern, otherwise you may get a SparkException. Some examples of misuses listed below: + * 1. Only call barrier() function on a subset of all the tasks in the same barrier stage, it + * shall lead to time out of the function call. + * rdd.barrier().mapPartitions { (iter, context) => + * if (context.partitionId() == 0) { + * // Do nothing. + * } else { + * context.barrier() + * } + * iter + * } + * + * 2. Include barrier() function in a try-catch code block, this may lead to mismatched call of + * barrier function exception. + * rdd.barrier().mapPartitions { (iter, context) => + * try { + * // Do something that might throw an Exception. + * doSomething() + * context.barrier() + * } catch { + * case e: Exception => logWarning("...", e) + * } + * context.barrier() --- End diff -- I guess here should not be another `context.barrier()`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1452/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93737/testReport)** for PR 21898 at commit [`01c274b`](https://github.com/apache/spark/commit/01c274b40dfdd601b75bb5c19261fbc732c7763b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93735/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21895 **[Test build #93735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93735/testReport)** for PR 21895 at commit [`2ad5285`](https://github.com/apache/spark/commit/2ad52858ba9d17e26e6d7ea11658515af7a02a03). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21102#discussion_r205959224 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3968,3 +3964,234 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in the intersection of array1 and +array2, without duplicates. + """, + examples = """ +Examples:Fun + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(1, 3) + """, + since = "2.4.0") +case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike { + override def dataType: DataType = ArrayType(elementType, +left.dataType.asInstanceOf[ArrayType].containsNull && + right.dataType.asInstanceOf[ArrayType].containsNull) + + @transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = { +if (elementTypeSupportEquals) { + (array1, array2) => +val hs = new OpenHashSet[Any] +val hsResult = new OpenHashSet[Any] +var foundNullElement = false +var i = 0 +while (i < array2.numElements()) { + if (array2.isNullAt(i)) { +foundNullElement = true + } else { +val elem = array2.get(i, elementType) +hs.add(elem) + } + i += 1 +} +val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any] +i = 0 +while (i < array1.numElements()) { + if (array1.isNullAt(i)) { +if (foundNullElement) { + arrayBuffer += null + foundNullElement = false +} + } else { +val elem = array1.get(i, elementType) +if (hs.contains(elem) && !hsResult.contains(elem)) { + arrayBuffer += elem + hsResult.add(elem) +} + } + i += 1 +} +new GenericArrayData(arrayBuffer) +} else { + (array1, array2) => +val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any] +var alreadySeenNull = false +var i = 0 +while (i < array1.numElements()) { + var found = false + val elem1 = array1.get(i, elementType) + if (array1.isNullAt(i)) { +if (!alreadySeenNull) { + var j = 0 + while (!found && j < array2.numElements()) { +found = array2.isNullAt(j) +j += 1 + } + // array2 is scanned only once for null element + alreadySeenNull = true +} + } else { +var j = 0 +while (!found && j < array2.numElements()) { + if (!array2.isNullAt(j)) { +val elem2 = array2.get(j, elementType) +if (ordering.equiv(elem1, elem2)) { + // check whether elem1 is already stored in arrayBuffer + var foundArrayBuffer = false + var k = 0 + while (!foundArrayBuffer && k < arrayBuffer.size) { +val va = arrayBuffer(k) +foundArrayBuffer = (va != null) && ordering.equiv(va, elem1) +k += 1 + } + found = !foundArrayBuffer +} + } + j += 1 +} + } + if (found) { +arrayBuffer += elem1 + } + i += 1 +} +new GenericArrayData(arrayBuffer) +} + } + + override def nullSafeEval(input1: Any, input2: Any): Any = { +val array1 = input1.asInstanceOf[ArrayData] +val array2 = input2.asInstanceOf[ArrayData] + +evalIntersect(array1, array2) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val arrayData = classOf[ArrayData].getName +val i = ctx.freshName("i") --- End diff -- It would be good to refactor as a method from L4077 to L4124 since this part can be used among `union`, `except`, and `intersect`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r205958402 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, _) => +val newBranches = branches.foldLeft(List[(Expression, Expression)]()) { + case (newBranches, branch) => +if (newBranches.exists(_._1.semanticEquals(branch._1))) { + // If a condition in a branch is previously seen, this branch can be pruned. + // TODO: In fact, if a condition is a sub-condition of the previous one, + // TODO: it can be pruned. This is less strict and can be implemented + // TODO: by decomposing seen conditions. + newBranches --- End diff -- This seems good as the branch is useless. Removing it should simplify code and query plan. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r205958393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, _) => +val newBranches = branches.foldLeft(List[(Expression, Expression)]()) { + case (newBranches, branch) => +if (newBranches.exists(_._1.semanticEquals(branch._1))) { + // If a condition in a branch is previously seen, this branch can be pruned. + // TODO: In fact, if a condition is a sub-condition of the previous one, + // TODO: it can be pruned. This is less strict and can be implemented + // TODO: by decomposing seen conditions. + newBranches +} else if (newBranches.nonEmpty && newBranches.last._2.semanticEquals(branch._2)) { + // If the outputs of two adjacent branches are the same, two branches can be combined. + newBranches.take(newBranches.length - 1) +.:+((Or(newBranches.last._1, branch._1), newBranches.last._2)) --- End diff -- Can this provide any benefit? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1451/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21488: [SPARK-18057][SS] Update Kafka client version from 0.10....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21488 **[Test build #93736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93736/testReport)** for PR 21488 at commit [`aa69915`](https://github.com/apache/spark/commit/aa69915165d9aaca2bcb5d22fb2fc9467bf16826). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21439 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93734/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21439 **[Test build #93734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93734/testReport)** for PR 21439 at commit [`021350b`](https://github.com/apache/spark/commit/021350b29c11c19c5a1b8343553c4a531abe39a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21772 LGTM too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93731/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #93731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93731/testReport)** for PR 21608 at commit [`e727b69`](https://github.com/apache/spark/commit/e727b693e2814ef7f8ce9622bb5bff9714cdad31). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93732/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21909 **[Test build #93732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93732/testReport)** for PR 21909 at commit [`359c4fc`](https://github.com/apache/spark/commit/359c4fcbfdb4f4e77faa3977f381dc8e819e46fa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21908 good catch, LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21699 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21699 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93733/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21699 **[Test build #93733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93733/testReport)** for PR 21699 at commit [`e76e7ad`](https://github.com/apache/spark/commit/e76e7adcca6787cb334b19f8db35f3a4ec61bafc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21895 **[Test build #93735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93735/testReport)** for PR 21895 at commit [`2ad5285`](https://github.com/apache/spark/commit/2ad52858ba9d17e26e6d7ea11658515af7a02a03). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1450/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21899#discussion_r205954659 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -118,12 +119,19 @@ case class BroadcastExchangeExec( // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult // will catch this exception and re-throw the wrapped fatal throwable. case oe: OutOfMemoryError => -throw new SparkFatalException( +val sizeMessage = if (dataSize != -1) { + s"; Size of table is $dataSize" --- End diff -- You are taking into account only size of the `relation`. Probably, result of `child.executeCollectIterator()` consumes some memory too, and an user will need more memory than `dataSize`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93730/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93730/testReport)** for PR 21898 at commit [`5c5db85`](https://github.com/apache/spark/commit/5c5db85723e10a1c507c83ec2e370996202a77fe). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93727/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21908 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21908 **[Test build #93727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93727/testReport)** for PR 21908 at commit [`0bb6a5b`](https://github.com/apache/spark/commit/0bb6a5b29beb987fdaac2d595cfd63fbd752af65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21439 **[Test build #93734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93734/testReport)** for PR 21439 at commit [`021350b`](https://github.com/apache/spark/commit/021350b29c11c19c5a1b8343553c4a531abe39a2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r205953494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -101,6 +102,17 @@ class JacksonParser( } } + private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = { +val elemConverter = makeConverter(at.elementType) +(parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) { + case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter))) + case START_OBJECT if at.elementType.isInstanceOf[StructType] => --- End diff -- I am adding a comment for this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93726/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21895 **[Test build #93726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93726/testReport)** for PR 21895 at commit [`480e326`](https://github.com/apache/spark/commit/480e3268f51dbd830f5ee5013efd2315f3aff3a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21909 **[Test build #93732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93732/testReport)** for PR 21909 at commit [`359c4fc`](https://github.com/apache/spark/commit/359c4fcbfdb4f4e77faa3977f381dc8e819e46fa). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21699 **[Test build #93733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93733/testReport)** for PR 21699 at commit [`e76e7ad`](https://github.com/apache/spark/commit/e76e7adcca6787cb334b19f8db35f3a4ec61bafc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21884: [SPARK-24960][Kubernetes] explicitly expose ports on dri...
Github user adelbertc commented on the issue: https://github.com/apache/spark/pull/21884 @mccheah Done https://issues.apache.org/jira/browse/SPARK-24960 Re: Exposing ports, my understanding of Kubernetes' networking model is it is possible for a cluster to be setup such that Pod ports are closed, perhaps as a security measure. At least in our Kubernetes cluster (tested both on 1.6.x and 1.8.x) the SparkPi example failed with a connection error without this change, and succeeded after. (I've also added this information to the JIRA) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21909 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #93731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93731/testReport)** for PR 21608 at commit [`e727b69`](https://github.com/apache/spark/commit/e727b693e2814ef7f8ce9622bb5bff9714cdad31). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93729/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21909 **[Test build #93729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93729/testReport)** for PR 21909 at commit [`359c4fc`](https://github.com/apache/spark/commit/359c4fcbfdb4f4e77faa3977f381dc8e819e46fa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user mstreuhofer commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r205951170 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.python + +import java.io.File +import java.util.{Map => JMap} +import java.util.Arrays +import java.util.concurrent.atomic.AtomicInteger + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging + + +class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean) + extends Logging { + + private val virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native") + private val virtualEnvBinPath = conf.get("spark.pyspark.virtualenv.bin.path", "") + private val initPythonPackages = conf.getOption("spark.pyspark.virtualenv.packages") + private var virtualEnvName: String = _ + private var virtualPythonExec: String = _ + private val VIRTUALENV_ID = new AtomicInteger() + private var isLauncher: Boolean = false + + // used by launcher when user want to use virtualenv in pyspark shell. Launcher need this class + // to create virtualenv for driver. + def this(pythonExec: String, properties: JMap[String, String], isDriver: java.lang.Boolean) { +this(pythonExec, new SparkConf().setAll(properties.asScala), isDriver) +this.isLauncher = true + } + + /* + * Create virtualenv using native virtualenv or conda + * + */ + def setupVirtualEnv(): String = { +/* + * + * Native Virtualenv: + * - Execute command: virtualenv -p --no-site-packages + * - Execute command: python -m pip --cache-dir install -r + * + * Conda + * - Execute command: conda create --prefix --file -y + * + */ +logInfo("Start to setup virtualenv...") +logDebug("user.dir=" + System.getProperty("user.dir")) +logDebug("user.home=" + System.getProperty("user.home")) + +require(virtualEnvType == "native" || virtualEnvType == "conda", + s"VirtualEnvType: $virtualEnvType is not supported." ) +require(new File(virtualEnvBinPath).exists(), + s"VirtualEnvBinPath: $virtualEnvBinPath is not defined or doesn't exist.") +// Two scenarios of creating virtualenv: +// 1. created in yarn container. Yarn will clean it up after container is exited +// 2. created outside yarn container. Spark need to create temp directory and clean it after app +//finish. +// - driver of PySpark shell +// - driver of yarn-client mode +if (isLauncher || + (isDriver && conf.get("spark.submit.deployMode") == "client")) { + val virtualenvBasedir = Files.createTempDir() + virtualenvBasedir.deleteOnExit() + virtualEnvName = virtualenvBasedir.getAbsolutePath +} else if (isDriver && conf.get("spark.submit.deployMode") == "cluster") { + virtualEnvName = "virtualenv_driver" +} else { + // use the working directory of Executor + virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement() +} + +// Use the absolute path of requirement file in the following cases +// 1. driver of pyspark shell +// 2. driver of yarn-client mode +// otherwise just use filename as it would be downloaded to the working directory of Executor +val pysparkRequirements = + if (isLauncher || +(isDriver && conf.get("spark.submit.deployMode") == "client")) { +conf.getOption("spark.pyspark.virtualenv.requirements") + } else { + conf.getOption("spark.pyspark.virtualenv.requirements").map(_.split("/").last) + } + +val
[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...
Github user mstreuhofer commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r205950127 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.python + +import java.io.File +import java.util.{Map => JMap} +import java.util.Arrays +import java.util.concurrent.atomic.AtomicInteger + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging + + +class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean) + extends Logging { + + private val virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native") + private val virtualEnvBinPath = conf.get("spark.pyspark.virtualenv.bin.path", "") + private val initPythonPackages = conf.getOption("spark.pyspark.virtualenv.packages") + private var virtualEnvName: String = _ + private var virtualPythonExec: String = _ + private val VIRTUALENV_ID = new AtomicInteger() + private var isLauncher: Boolean = false + + // used by launcher when user want to use virtualenv in pyspark shell. Launcher need this class + // to create virtualenv for driver. + def this(pythonExec: String, properties: JMap[String, String], isDriver: java.lang.Boolean) { +this(pythonExec, new SparkConf().setAll(properties.asScala), isDriver) +this.isLauncher = true + } + + /* + * Create virtualenv using native virtualenv or conda + * + */ + def setupVirtualEnv(): String = { +/* + * + * Native Virtualenv: + * - Execute command: virtualenv -p --no-site-packages + * - Execute command: python -m pip --cache-dir install -r + * + * Conda + * - Execute command: conda create --prefix --file -y + * + */ +logInfo("Start to setup virtualenv...") +logDebug("user.dir=" + System.getProperty("user.dir")) +logDebug("user.home=" + System.getProperty("user.home")) + +require(virtualEnvType == "native" || virtualEnvType == "conda", + s"VirtualEnvType: $virtualEnvType is not supported." ) +require(new File(virtualEnvBinPath).exists(), + s"VirtualEnvBinPath: $virtualEnvBinPath is not defined or doesn't exist.") +// Two scenarios of creating virtualenv: +// 1. created in yarn container. Yarn will clean it up after container is exited +// 2. created outside yarn container. Spark need to create temp directory and clean it after app +//finish. +// - driver of PySpark shell +// - driver of yarn-client mode +if (isLauncher || + (isDriver && conf.get("spark.submit.deployMode") == "client")) { + val virtualenvBasedir = Files.createTempDir() + virtualenvBasedir.deleteOnExit() --- End diff -- the temporary directory is not being deleted on exit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21895 +CC @jerryshao --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21895: [SPARK-24948][SHS] Delegate check access permissi...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/21895#discussion_r205948923 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -973,6 +973,38 @@ private[history] object FsHistoryProvider { private[history] val CURRENT_LISTING_VERSION = 1L } +private[history] trait CachedFileSystemHelper extends Logging { + protected def fs: FileSystem + + /** + * Cache containing the result for the already checked files. + */ + // Visible for testing. + private[history] val cache = new mutable.HashMap[String, Boolean] --- End diff -- For long running history server in busy clusters (particularly where `spark.history.fs.cleaner.maxAge` is configured to be low), this Map will cause OOM. Either an LRU cache or a disk backed map with periodic cleanup (based on maxAge) might be better ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #93730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93730/testReport)** for PR 21898 at commit [`5c5db85`](https://github.com/apache/spark/commit/5c5db85723e10a1c507c83ec2e370996202a77fe). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1449/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21898 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19659: [SPARK-19668][ML] Multiple NGram sizes
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19659 @holdenk I can take this up if this is needed. Let me know. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93728/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21847 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21847 **[Test build #93728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93728/testReport)** for PR 21847 at commit [`183b684`](https://github.com/apache/spark/commit/183b6841a8a121bb07d9fe553ca213eda78eb35a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r205947535 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, _) => +val newBranches = branches.foldLeft(List[(Expression, Expression)]()) { + case (newBranches, branch) => +if (newBranches.exists(_._1.semanticEquals(branch._1))) { + // If a condition in a branch is previously seen, this branch can be pruned. + // TODO: In fact, if a condition is a sub-condition of the previous one, + // TODO: it can be pruned. This is less strict and can be implemented + // TODO: by decomposing seen conditions. + newBranches +} else if (newBranches.nonEmpty && newBranches.last._2.semanticEquals(branch._2)) { + // If the outputs of two adjacent branches are the same, two branches can be combined. + newBranches.take(newBranches.length - 1) --- End diff -- nit: `newBranches.init` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r205947481 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, _) => +val newBranches = branches.foldLeft(List[(Expression, Expression)]()) { --- End diff -- what about using `ArrayBuffer`? So we don't have to recreate the List at every iteration... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21909 **[Test build #93729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93729/testReport)** for PR 21909 at commit [`359c4fc`](https://github.com/apache/spark/commit/359c4fcbfdb4f4e77faa3977f381dc8e819e46fa). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21909 [SPARK-24959][SQL] Speed up count() for JSON and CSV ## What changes were proposed in this pull request? In the PR, I propose to skip invoking of the CSV/JSON parser per each line in the case if the required schema is empty. Added benchmarks for `count()` shows performance improvement up to **3.5 times**. Before: ``` Count a dataset with 10 columns: Best/Avg Time(ms)Rate(M/s) Per Row(ns) -- JSON count() 7676 / 7715 1.3 767.6 CSV count()3309 / 3363 3.0 330.9 ``` After: ``` Count a dataset with 10 columns: Best/Avg Time(ms)Rate(M/s) Per Row(ns) -- JSON count() 2104 / 2156 4.8 210.4 CSV count()2332 / 2386 4.3 233.2 ``` ## How was this patch tested? It was tested by `CSVSuite` and `JSONSuite` as well as on added benchmarks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 empty-schema-optimization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21909.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21909 commit bc4ce261a2d13be0a31b18f006da79b55880d409 Author: Maxim Gekk Date: 2018-07-28T15:31:20Z Added a benchmark for count() commit 91250d21d4bb451062873c59df6fe3b4669bc5ff Author: Maxim Gekk Date: 2018-07-28T15:50:15Z Added a CSV benchmark for count() commit bdc5ea540b9eb62bb28606bdeb311ce5662e4bf7 Author: Maxim Gekk Date: 2018-07-28T15:59:44Z Speed up count() commit d40f9bb229ab8ea9e2d95499ae203f7c41098bcd Author: Maxim Gekk Date: 2018-07-28T16:00:17Z Updating CSV and JSON benchmarks for count() commit abd8572497ff742ef6ea942864195be75a40ca71 Author: Maxim Gekk Date: 2018-07-28T16:23:03Z Fix benchmark's output commit 359c4fcbfdb4f4e77faa3977f381dc8e819e46fa Author: Maxim Gekk Date: 2018-07-28T16:23:44Z Uncomment other benchmarks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user holdensmagicalunicorn commented on the issue: https://github.com/apache/spark/pull/21909 @MaxGekk, thanks! I am a bot who has found some folks who might be able to help with the review:@HyukjinKwon, @gatorsmile and @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21847 **[Test build #93728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93728/testReport)** for PR 21847 at commit [`183b684`](https://github.com/apache/spark/commit/183b6841a8a121bb07d9fe553ca213eda78eb35a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21908 **[Test build #93727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93727/testReport)** for PR 21908 at commit [`0bb6a5b`](https://github.com/apache/spark/commit/0bb6a5b29beb987fdaac2d595cfd63fbd752af65). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1448/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21908 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSui...
Github user holdensmagicalunicorn commented on the issue: https://github.com/apache/spark/pull/21908 @jiangxb1987, thanks! I am a bot who has found some folks who might be able to help with the review:@squito, @kayousterhout and @mateiz --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21908: [MINOR][CORE][TEST] Fix afterEach() in TastSetMan...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/21908 [MINOR][CORE][TEST] Fix afterEach() in TastSetManagerSuite and TaskSchedulerImplSuite ## What changes were proposed in this pull request? In the `afterEach()` method of both `TastSetManagerSuite` and `TaskSchedulerImplSuite`, `super.afterEach()` shall be called at the end, because it shall stop the SparkContext. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93706/testReport/org.apache.spark.scheduler/TaskSchedulerImplSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ The test failure is caused by the above reason, the newly added `barrierCoordinator` required `rpcEnv` which has been stopped before `TaskSchedulerImpl` doing cleanup. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiangxb1987/spark afterEach Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21908.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21908 commit 0bb6a5b29beb987fdaac2d595cfd63fbd752af65 Author: Xingbo Jiang Date: 2018-07-28T16:09:07Z fix afterEach --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205946975 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { // these branches can be pruned away val (h, t) = branches.span(_._1 != TrueLiteral) CaseWhen( h :+ t.head, None) + + case e @ CaseWhen(branches, Some(elseValue)) if { --- End diff -- We can not. When no `elseValue`, all the conditions are required to evaluated before hitting the default `elseValue` which is `null`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMilli...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21901 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1447/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21901: [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and m...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21901 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21895 cc @mridulm too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21895 **[Test build #93726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93726/testReport)** for PR 21895 at commit [`480e326`](https://github.com/apache/spark/commit/480e3268f51dbd830f5ee5013efd2315f3aff3a9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21886: [SPARK-21274][SQL] Implement INTERSECT ALL clause
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93725/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org