[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21537 **[Test build #91831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91831/testReport)** for PR 21537 at commit [`b592e66`](https://github.com/apache/spark/commit/b592e66c030ba7c2d260c3be48c3b15139f40e5b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Merged build finished. Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/133/ Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Merged build finished. Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4022/ Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 retest this please.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91822/ Test FAILed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21537 **[Test build #91822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91822/testReport)** for PR 21537 at commit [`b592e66`](https://github.com/apache/spark/commit/b592e66c030ba7c2d260c3be48c3b15139f40e5b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195356119
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -805,43 +811,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
   private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = {
     from match {
       case BinaryType =>
-        (c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);"
+        (c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);"
       case DateType =>
-        (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+        (c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
           org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
       case TimestampType =>
-        val tz = ctx.addReferenceObj("timeZone", timeZone)
-        (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+        val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass)
+        (c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
           org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));"""
       case ArrayType(et, _) =>
         (c, evPrim, evNull) => {
-          val buffer = ctx.freshName("buffer")
-          val bufferClass = classOf[UTF8StringBuilder].getName
+          val buffer = ctx.freshVariable("buffer", classOf[UTF8StringBuilder])
+          val bufferClass = JavaCode.className(classOf[UTF8StringBuilder])
--- End diff --
It is fine with me to address this in another PR.
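The diff above swaps plain `s"..."` strings for a `code"..."` interpolator. To illustrate roughly how such an interpolator can be built on Scala's `StringContext`, here is a minimal sketch. `CodeBlockSketch`, `Block`, and `render` are hypothetical names. Spark's real `Block`/`JavaCode` types do considerably more; judging from the diff, for example, they also carry type information such as `classOf[UTF8StringBuilder]`.

```scala
// Hypothetical, much-simplified model of a "code block" interpolator.
// It only demonstrates the StringContext technique; it is not Spark's Block.
object CodeBlockSketch {
  // A code fragment that remembers the values spliced into it.
  final case class Block(parts: Seq[String], inputs: Seq[Any]) {
    // Interleave literal parts with the spliced inputs.
    // Note: escape sequences in the literal parts are NOT processed (unlike s"...").
    def render: String = {
      val sb = new StringBuilder(parts.head)
      inputs.zip(parts.tail).foreach { case (in, part) =>
        sb.append(in.toString).append(part)
      }
      sb.toString
    }
    override def toString: String = render
  }

  // Adds a `code"..."` interpolator alongside the built-in `s"..."`.
  implicit class CodeInterpolator(val sc: StringContext) extends AnyVal {
    def code(args: Any*): Block = Block(sc.parts, args)
  }
}
```

Unlike a plain string, the resulting `Block` still knows which values were spliced into it. This is the property that makes structured code blocks useful for later analysis of the generated code.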
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195356161
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --
I think `PartitioningCollection` is for an operator that has multiple children. `BroadcastPartitioning` is not an `Expression`.
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195355721
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --
Yes, you're right @viirya, thanks. Then I'd propose something like:
```
relation.cachedPlan.outputPartitioning match {
  case e: Expression => updateAttribute(e)
  case other => other
}
```
What do you think?
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195354829
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --
Good suggestion, thanks @mgaido91.
@viirya Do we need to consider the following?
- `PartitioningCollection` in `InMemoryTableScanExec.outputPartitioning`, which is also an `Expression`
- `PartitioningCollection` and `BroadcastPartitioning` in `ReusedExchangeExec.outputPartitioning`
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195352300
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --
Not all `Partitioning`s are `Expression`s. Only `HashPartitioning` and `RangePartitioning` are.
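The thread above turns on which `Partitioning` kinds carry attribute references that must be rewritten when a plan's output is renamed. Below is a self-contained sketch of that rewrite over a simplified, hypothetical model. The case classes only mirror Spark's names and are not Catalyst's definitions; `BroadcastPartitioning`, for instance, is reduced to an attribute-free object. The sketch includes the recursive `PartitioningCollection` case that yucai raised.

```scala
// Hypothetical stand-ins for Catalyst classes; NOT Spark's definitions.
object PartitioningRemapSketch {
  final case class Attr(name: String, exprId: Long) {
    override def toString: String = s"$name#$exprId"
  }
  final case class SortOrder(child: Attr, ascending: Boolean = true)

  sealed trait Partitioning
  final case class HashPartitioning(exprs: Seq[Attr], numPartitions: Int) extends Partitioning
  final case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)
    extends Partitioning
  final case class PartitioningCollection(partitionings: Seq[Partitioning]) extends Partitioning
  case object BroadcastPartitioning extends Partitioning // carries no attributes in this model
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Rewrite every attribute reference through `mapping` (old output -> new output).
  // Attributes missing from the mapping are left unchanged.
  def remap(p: Partitioning, mapping: Map[Attr, Attr]): Partitioning = {
    def a(x: Attr): Attr = mapping.getOrElse(x, x)
    p match {
      case HashPartitioning(exprs, n) => HashPartitioning(exprs.map(a), n)
      case RangePartitioning(ord, n) =>
        RangePartitioning(ord.map(o => o.copy(child = a(o.child))), n)
      case PartitioningCollection(ps) => PartitioningCollection(ps.map(remap(_, mapping)))
      case other => other // nothing to rewrite
    }
  }
}
```

In this model, matching on each concrete kind makes the non-attribute cases explicit. The recursion ensures that a collection's members are rewritten as well.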
[GitHub] spark pull request #16763: [SPARK-19422][ML][WIP] Cache input data in algori...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/16763
[GitHub] spark issue #16763: [SPARK-19422][ML][WIP] Cache input data in algorithms
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16763 This PR is out of date. I will close it.
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195351851
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent._
+
+import io.fabric8.kubernetes.api.model.Pod
+import javax.annotation.concurrent.GuardedBy
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+private[spark] class ExecutorPodsSnapshotsStoreImpl(subscribersExecutor: ScheduledExecutorService)
--- End diff --
Could you add a description of the class here?
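Judging from how the allocator in the same PR uses the store (`snapshotsStore.addSubscriber(podAllocationDelay) { onNewSnapshots(applicationId, _) }`, receiving a `Seq[ExecutorPodsSnapshot]`), the class appears to buffer pod-state snapshots and hand each subscriber a batch at a fixed interval on the `ScheduledExecutorService`. The following is a minimal sketch of that pattern under those assumptions only. `SnapshotsStoreSketch`, `PodsSnapshot`, `updatePod`, and `flushAll` are hypothetical names; this is not the PR's implementation.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ScheduledExecutorService, TimeUnit}
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

// Hypothetical simplified snapshot: executor id -> pod phase.
final case class PodsSnapshot(executorPods: Map[Long, String])

// Sketch only: every pod update produces a new immutable snapshot that is queued
// for each subscriber; each subscriber periodically receives all snapshots
// queued since its previous callback.
class SnapshotsStoreSketch(executor: ScheduledExecutorService) {
  private val lock = new Object
  private var current = PodsSnapshot(Map.empty)
  private val subscribers =
    ArrayBuffer.empty[(LinkedBlockingQueue[PodsSnapshot], Seq[PodsSnapshot] => Unit)]

  def addSubscriber(intervalMs: Long)(onNewSnapshots: Seq[PodsSnapshot] => Unit): Unit = {
    val queue = new LinkedBlockingQueue[PodsSnapshot]()
    lock.synchronized { subscribers += ((queue, onNewSnapshots)) }
    executor.scheduleWithFixedDelay(
      new Runnable { def run(): Unit = drain(queue, onNewSnapshots) },
      intervalMs, intervalMs, TimeUnit.MILLISECONDS)
  }

  def updatePod(execId: Long, phase: String): Unit = lock.synchronized {
    current = PodsSnapshot(current.executorPods + (execId -> phase))
    subscribers.foreach { case (q, _) => q.put(current) }
  }

  // Deliver everything queued so far, e.g. at shutdown (or from a test).
  def flushAll(): Unit = {
    val subs = lock.synchronized { subscribers.toList }
    subs.foreach { case (q, f) => drain(q, f) }
  }

  // Hand the subscriber one batch; skip the callback if nothing is queued.
  private def drain(queue: LinkedBlockingQueue[PodsSnapshot], f: Seq[PodsSnapshot] => Unit): Unit = {
    val batch = new java.util.ArrayList[PodsSnapshot]()
    queue.drainTo(batch)
    if (!batch.isEmpty) f(batch.asScala.toList)
  }
}
```

Batching snapshots per subscriber lets a slow consumer, such as the allocator, react once per interval to the latest states rather than to every individual event.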
[GitHub] spark issue #19084: [SPARK-20711][ML]MultivariateOnlineSummarizer/Summarizer...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19084 @srowen Could you please give a final review? Thanks
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195350385
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}
+
+import io.fabric8.kubernetes.api.model.PodBuilder
+import io.fabric8.kubernetes.client.KubernetesClient
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{Clock, Utils}
+
+private[spark] class ExecutorPodsAllocator(
+    conf: SparkConf,
+    executorBuilder: KubernetesExecutorBuilder,
+    kubernetesClient: KubernetesClient,
+    snapshotsStore: ExecutorPodsSnapshotsStore,
+    clock: Clock) extends Logging {
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val podAllocationSize = conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val podAllocationDelay = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val podCreationTimeout = math.max(podAllocationDelay * 5, 6)
+
+  private val kubernetesDriverPodName = conf
+    .get(KUBERNETES_DRIVER_POD_NAME)
+    .getOrElse(throw new SparkException("Must specify the driver pod name"))
+
+  private val driverPod = kubernetesClient.pods()
+    .withName(kubernetesDriverPodName)
+    .get()
+
+  // Executor IDs that have been requested from Kubernetes but have not been detected in any
+  // snapshot yet. Mapped to the timestamp when they were created.
+  private val newlyCreatedExecutors = mutable.Map.empty[Long, Long]
+
+  def start(applicationId: String): Unit = {
+    snapshotsStore.addSubscriber(podAllocationDelay) {
+      onNewSnapshots(applicationId, _)
+    }
+  }
+
+  def setTotalExpectedExecutors(total: Int): Unit = totalExpectedExecutors.set(total)
+
+  private def onNewSnapshots(applicationId: String, snapshots: Seq[ExecutorPodsSnapshot]): Unit = {
+    newlyCreatedExecutors --= snapshots.flatMap(_.executorPods.keys)
+    // For all executors we've created against the API but have not seen in a snapshot
+    // yet - check the current time. If the current time has exceeded some threshold,
+    // assume that the pod was either never created (the API server never properly
+    // handled the creation request), or the API server created the pod but we missed
+    // both the creation and deletion events. In either case, delete the missing pod
+    // if possible, and mark such a pod to be rescheduled below.
+    newlyCreatedExecutors.foreach { case (execId, timeCreated) =>
+      if (clock.getTimeMillis() - timeCreated > podCreationTimeout) {
+        logWarning(s"Executor with id $execId was not detected in the Kubernetes" +
+          s" cluster after $podCreationTimeout milliseconds despite the fact that a" +
+          " previous allocation attempt tried to create it. The executor may have been" +
+          " deleted but the application missed the deletion event.")
+        Utils.tryLogNonFatalError {
+          kubernetesClient
+            .pods()
+            .withLabel(SPARK_EXECUTOR_ID_LABEL, execId.toString)
+            .delete()
--- End diff --
Shouldn't `deleteFromSpark` be called here as well? Couldn't it be the case that the executor exists at a higher level but the K8s backend missed it?
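The timeout bookkeeping in the quoted `onNewSnapshots` can be isolated into a small, testable sketch. `PendingPodTracker` is a hypothetical helper, not part of the PR. The sketch also assumes that timed-out IDs stop being tracked once reported, which the truncated diff does not show.

```scala
import scala.collection.mutable

// Hypothetical helper mirroring the quoted logic: executor ids requested from
// Kubernetes are remembered with their request time, forgotten once a snapshot
// contains them, and reported once missing for longer than `timeoutMs`.
class PendingPodTracker(timeoutMs: Long) {
  // execId -> time (ms) at which its pod was requested
  private val newlyCreated = mutable.Map.empty[Long, Long]

  def requested(execId: Long, nowMs: Long): Unit = newlyCreated(execId) = nowMs

  /** Returns ids considered lost; a caller would delete their pods and request replacements. */
  def onSnapshot(seenIds: Iterable[Long], nowMs: Long): Seq[Long] = {
    newlyCreated --= seenIds // pods that showed up are no longer pending
    val timedOut = newlyCreated.collect {
      case (id, created) if nowMs - created > timeoutMs => id
    }.toSeq.sorted
    newlyCreated --= timedOut // assumption: stop tracking ids that are being replaced
    timedOut
  }
}
```

Passing `nowMs` explicitly plays the same role as the injected `Clock` in the allocator: the timeout path can be exercised deterministically without waiting.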
[GitHub] spark issue #19927: [SPARK-22737][ML][WIP] OVR transform optimization
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19927 @mengxr @holdenk What do you think about this? Thanks.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21563 **[Test build #91830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91830/testReport)** for PR 21563 at commit [`d6e76e3`](https://github.com/apache/spark/commit/d6e76e3ecf02ea23d6d60aff58f1228f45ba0235).
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195349283
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --
Why not just `updateAttribute(r)`? Moreover, to avoid the same issue with other cases in the future, have you considered doing something like:
```
updateAttribute(relation.cachedPlan.outputPartitioning)
```
?
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4021/ Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/132/ Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21366 @mccheah could you add a design doc for future reference, so that new contributors can better understand the rationale behind this? There is some description in the JIRA ticket, but not enough to describe the final solution.
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195348233
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --
Not sure why `RangePartitioning` wasn't included in the first place.
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21564 @cloud-fan @viirya @gatorsmile, could you help review this?
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21564 **[Test build #91829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91829/testReport)** for PR 21564 at commit [`f37139b`](https://github.com/apache/spark/commit/f37139b2d07497af9df1984e5fb7a50931efbf9a).
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21564 Can one of the admins verify this patch?
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/21564

[SPARK-24556][SQL] ReusedExchange should rewrite output partitioning also when child's partitioning is RangePartitioning

## What changes were proposed in this pull request?

Currently, ReusedExchange rewrites the output partitioning if the child's partitioning is HashPartitioning, but it does not do the same when the child's partitioning is RangePartitioning. This can sometimes introduce an extra shuffle, for example:
```
val df = Seq(1 -> "a", 3 -> "c", 2 -> "b").toDF("i", "j")
val df1 = df.as("t1")
val df2 = df.as("t2")
val t = df1.orderBy("j").join(df2.orderBy("j"), $"t1.i" === $"t2.i", "right")
t.cache.orderBy($"t2.j").explain
```
Before:
```
== Physical Plan ==
*(1) Sort [j#14 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(j#14 ASC NULLS FIRST, 200)
   +- InMemoryTableScan [i#5, j#6, i#13, j#14]
         +- InMemoryRelation [i#5, j#6, i#13, j#14], CachedRDDBuilder...
               +- *(2) BroadcastHashJoin [i#5], [i#13], RightOuter, BuildLeft
                  :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as...
                  :  +- *(1) Sort [j#6 ASC NULLS FIRST], true, 0
                  :     +- Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
                  :        +- LocalTableScan [i#5, j#6]
                  +- *(2) Sort [j#14 ASC NULLS FIRST], true, 0
                     +- ReusedExchange [i#13, j#14], Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
```
A better plan would avoid `Exchange rangepartitioning(j#14 ASC NULLS FIRST, 200)`, like:
```
== Physical Plan ==
*(1) Sort [j#14 ASC NULLS FIRST], true, 0
+- InMemoryTableScan [i#5, j#6, i#13, j#14]
      +- InMemoryRelation [i#5, j#6, i#13, j#14], CachedRDDBuilder...
            +- *(2) BroadcastHashJoin [i#5], [i#13], RightOuter, BuildLeft
               :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
               :  +- *(1) Sort [j#6 ASC NULLS FIRST], true, 0
               :     +- Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
               :        +- LocalTableScan [i#5, j#6]
               +- *(2) Sort [j#14 ASC NULLS FIRST], true, 0
                  +- ReusedExchange [i#13, j#14], Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
```

## How was this patch tested?

Added new tests.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yucai/spark SPARK-24556
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21564.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #21564
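A simplified model shows why the missing rewrite costs a shuffle in the plans above. The reused exchange still reports `rangepartitioning(j#6 ...)`, while the final sort requires an ordering on `j#14`. A check that compares attributes by identity therefore fails until the partitioning is remapped onto the reusing operator's output. The classes and the prefix-match rule below are simplifying assumptions, not Spark's exact `Distribution` logic.

```scala
// Hypothetical, simplified model; NOT Spark's Distribution/Partitioning classes.
object RangeReuseSketch {
  final case class Attr(name: String, exprId: Long)
  final case class SortOrder(child: Attr, ascending: Boolean = true)
  final case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)

  // Assumed rule: the existing range partitioning is acceptable when it agrees with
  // the required ordering on their common prefix. Attributes are compared by identity
  // (name AND exprId), so j#6 does not match j#14.
  def satisfiesOrdering(p: RangePartitioning, required: Seq[SortOrder]): Boolean = {
    val n = math.min(required.size, p.ordering.size)
    required.take(n) == p.ordering.take(n)
  }

  // Rename the partitioning's attributes onto the reusing operator's output.
  def remap(p: RangePartitioning, mapping: Map[Attr, Attr]): RangePartitioning =
    p.copy(ordering = p.ordering.map(o => o.copy(child = mapping.getOrElse(o.child, o.child))))
}
```

Without the remap, the planner in this model would see an unsatisfied requirement and insert another `Exchange`. After the remap, the existing partitioning is reused as is.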
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91828/ Test FAILed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21563 **[Test build #91828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91828/testReport)** for PR 21563 at commit [`b126bd4`](https://github.com/apache/spark/commit/b126bd4f410ab4a01bbe7a980042704ea7420c6f).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test FAILed.
[GitHub] spark pull request #21321: [SPARK-24268][SQL] Use datatype.simpleString in e...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21321#discussion_r195344259
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -208,7 +208,8 @@ class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transforme
       require(dataType.isInstanceOf[NumericType] ||
         dataType.isInstanceOf[StringType] ||
         dataType.isInstanceOf[BooleanType],
-        s"FeatureHasher requires columns to be of NumericType, BooleanType or StringType. " +
+        s"FeatureHasher requires columns to be of ${NumericType.simpleString}, " +
--- End diff --
I think this PR always rewrites references to constant types. I am not sure why you are saying it does not. If I missed some places, it is just because I didn't see them.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/131/ Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4020/ Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test PASSed.
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21562 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91825/ Test PASSed.
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21562 Merged build finished. Test PASSed.
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21562 **[Test build #91825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91825/testReport)** for PR 21562 at commit [`1945ff4`](https://github.com/apache/spark/commit/1945ff4adf3423b324c02e7b7f799cb137a385fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21563 **[Test build #91828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91828/testReport)** for PR 21563 at commit [`b126bd4`](https://github.com/apache/spark/commit/b126bd4f410ab4a01bbe7a980042704ea7420c6f).
[GitHub] spark pull request #21563: [SPARK-24557][ML] ClusteringEvaluator support arr...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/21563
[SPARK-24557][ML] ClusteringEvaluator support array input
## What changes were proposed in this pull request?
ClusteringEvaluator support array input
## How was this patch tested?
added tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark clu_eval_support_array
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21563.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #21563
commit b126bd4f410ab4a01bbe7a980042704ea7420c6f
Author: 郑瑞峰
Date: 2018-06-14T08:15:43Z
init pr
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4019/ Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21547 **[Test build #91827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91827/testReport)** for PR 21547 at commit [`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/130/ Test PASSed.
[GitHub] spark pull request #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20629#discussion_r195340437
--- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala ---
@@ -64,12 +65,12 @@ class ClusteringEvaluator @Since("2.3.0") (@Since("2.3.0") override val uid: Str
/**
* param for metric name in evaluation
- * (supports `"silhouette"` (default))
+ * (supports `"silhouette"` (default), `"kmeansCost"`)
--- End diff --
OK, I agree. Let's go this way then, thanks.
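This diff documents `"kmeansCost"` as a metric. For context, Spark documents `KMeans.computeCost`, the method that PR #20629 deprecates, as returning the sum of squared distances of points to their nearest center. The following is a minimal plain-Scala sketch of that quantity. It uses `Array[Double]` points rather than Spark vectors. `squaredDistance` and `kmeansCost` are hypothetical helper names, not Spark's implementation.

```scala
// Hypothetical helper (not Spark code): squared Euclidean distance between two points.
def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "points must have the same dimension")
  var sum = 0.0
  var i = 0
  while (i < a.length) {
    val d = a(i) - b(i)
    sum += d * d
    i += 1
  }
  sum
}

// Hypothetical helper (not Spark code): for every point, take the squared distance
// to its nearest center, then add those values up.
def kmeansCost(points: Seq[Array[Double]], centers: Seq[Array[Double]]): Double = {
  require(centers.nonEmpty, "at least one center is required")
  points.map(p => centers.map(c => squaredDistance(p, c)).min).sum
}

val points  = Seq(Array(0.0, 0.0), Array(1.0, 0.0), Array(10.0, 10.0))
val centers = Seq(Array(0.5, 0.0), Array(10.0, 10.0))
println(kmeansCost(points, centers)) // 0.25 + 0.25 + 0.0 = 0.5
```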
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21547 Jenkins, retest this please.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #91823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91823/testReport)** for PR 21561 at commit [`61b95a3`](https://github.com/apache/spark/commit/61b95a35ecea4ae21e95fb8370bc4b4525370435). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91823/ Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4018/ Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Merged build finished. Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Merged build finished. Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/129/ Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21529 **[Test build #91826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91826/testReport)** for PR 21529 at commit [`6ef4f0d`](https://github.com/apache/spark/commit/6ef4f0df7590f0da5aa900f29292ec0fe94658fb).
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195333092
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -805,43 +811,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = {
from match {
case BinaryType =>
-(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);"
+(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);"
case DateType =>
-(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
case TimestampType =>
-val tz = ctx.addReferenceObj("timeZone", timeZone)
-(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass)
+(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));"""
case ArrayType(et, _) =>
(c, evPrim, evNull) => {
- val buffer = ctx.freshName("buffer")
- val bufferClass = classOf[UTF8StringBuilder].getName
+ val buffer = ctx.freshVariable("buffer", classOf[UTF8StringBuilder])
+ val bufferClass = JavaCode.className(classOf[UTF8StringBuilder])
--- End diff --
I think we could add a class `Variable`, which `GlobalVariable` and `LocalVariable` inherit from, with a `declare` method that takes an optional parameter `initialValue`, so we can just invoke it to declare each variable. But maybe this can also be a follow-up. What do you think?
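The suggestion above could look roughly like the following sketch. All names in it (`Variable`, `GlobalVariable`, `LocalVariable`, `declare`, `modifiers`) are hypothetical and self-contained. They do not reproduce Spark's `javaCode.scala` classes, and the emitted Java text is only illustrative.

```scala
// Hypothetical sketch of the suggested hierarchy; not Spark's actual classes.
sealed abstract class Variable(val name: String, val javaType: Class[_]) {
  // Java source name of the type, e.g. "int" or "java.lang.StringBuilder".
  def typeName: String = javaType.getCanonicalName

  // Prefix emitted before the type; each subclass decides it.
  protected def modifiers: String

  // Emits a Java declaration, adding an initializer only when one is given.
  def declare(initialValue: Option[String] = None): String = {
    val init = initialValue.map(v => s" = $v").getOrElse("")
    s"$modifiers$typeName $name$init;"
  }
}

// A field of the generated class.
final class GlobalVariable(name: String, javaType: Class[_])
    extends Variable(name, javaType) {
  protected def modifiers: String = "private "
}

// A local variable inside a generated method body.
final class LocalVariable(name: String, javaType: Class[_])
    extends Variable(name, javaType) {
  protected def modifiers: String = ""
}

val count = new GlobalVariable("count", classOf[Int])
val buffer = new LocalVariable("buffer", classOf[java.lang.StringBuilder])
println(count.declare()) // private int count;
println(buffer.declare(Some("new java.lang.StringBuilder()")))
```

With `declare` on the base class, callers make one call per variable. Only the part that actually differs (here, the modifier prefix) is overridden in the subclasses.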
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195328074
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -720,31 +719,36 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
private def writeMapToStringBuilder(
kt: DataType,
vt: DataType,
- map: String,
- buffer: String,
- ctx: CodegenContext): String = {
+ map: ExprValue,
+ buffer: ExprValue,
+ ctx: CodegenContext): Block = {
def dataToStringFunc(func: String, dataType: DataType) = {
val funcName = ctx.freshName(func)
val dataToStringCode = castToStringCode(dataType, ctx)
+ val data = JavaCode.variable("data", dataType)
+ val dataStr = JavaCode.variable("dataStr", StringType)
ctx.addNewFunction(funcName,
--- End diff --
nit: probably we can return this as `inline`, so we don't have to put it everywhere we use it.
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195328479
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala ---
@@ -155,6 +170,17 @@ object Block {
val CODE_BLOCK_BUFFER_LENGTH: Int = 512
+ /**
+ * A custom string interpolator which inlines all types of input arguments into a string without
--- End diff --
nit: maybe this comment would be better placed on the `InlineBlock` class, do you agree?
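The doc comment under discussion describes an interpolator that inlines its arguments straight into a string. By contrast, the `code"..."` interpolator used throughout this PR builds a block that, roughly speaking, keeps track of its inputs. The following is a minimal Scala sketch of those two behaviours using a custom `StringContext` extension. `CodeSyntax`, `TrackedBlock`, `tracked`, and `inlined` are hypothetical names, not Spark's `Block` API.

```scala
// Hypothetical sketch; these names are illustrative, not Spark's Block API.
object CodeSyntax {
  // Result of the "tracked" interpolator: literal parts plus the original arguments.
  final case class TrackedBlock(parts: Seq[String], args: Seq[Any]) {
    // Renders by interleaving literal parts with each argument's string form
    // (escape sequences in the literal parts are left unprocessed).
    override def toString: String =
      parts.head + args.zip(parts.tail).map { case (a, p) => a.toString + p }.mkString
  }

  implicit class CodeInterpolators(sc: StringContext) {
    // Keeps the arguments, so a later pass could still inspect them.
    def tracked(args: Any*): TrackedBlock = TrackedBlock(sc.parts, args)

    // Inlines every argument's string form immediately; nothing is kept.
    def inlined(args: Any*): String = TrackedBlock(sc.parts, args).toString
  }
}

import CodeSyntax._

val evPrim = "value_1"
val c = "input_0"
val block = tracked"$evPrim = UTF8String.fromBytes($c);"
println(block.args.mkString(", ")) // value_1, input_0
println(block)                     // value_1 = UTF8String.fromBytes(input_0);
println(inlined"$evPrim = UTF8String.fromBytes($c);")
```

The sketch only contrasts keeping the inputs with discarding them. It is not a model of how Spark's real interpolators are implemented.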
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r195331403
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -805,43 +811,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = {
from match {
case BinaryType =>
-(c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);"
+(c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);"
case DateType =>
-(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
case TimestampType =>
-val tz = ctx.addReferenceObj("timeZone", timeZone)
-(c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
+val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass)
+(c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
org.apache.spark.sql.catalyst.util.DateTimeUtils.timestampToString($c, $tz));"""
case ArrayType(et, _) =>
(c, evPrim, evNull) => {
- val buffer = ctx.freshName("buffer")
- val bufferClass = classOf[UTF8StringBuilder].getName
+ val buffer = ctx.freshVariable("buffer", classOf[UTF8StringBuilder])
+ val bufferClass = JavaCode.className(classOf[UTF8StringBuilder])
--- End diff --
Now, each variable defined by `freshVariable` has a type. We can get the type or its class name from the variable (e.g. `buffer`). Therefore, it looks redundant to declare the class name of each variable again (e.g. `bufferClass`). Do we have an API to get the type of a variable, or could we define an API to get the name of its class? I ask because this pattern is very common.
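The point above is that a variable created together with its type should be able to report that type itself. The following is a minimal sketch under that assumption. `TypedVariable` and `NameGenerator` are hypothetical stand-ins: the latter only imitates the idea of `ctx.freshVariable` from the diff and is not Spark's `CodegenContext`. `className` uses `Class.getCanonicalName` so that nested classes render with dots, as Java source expects.

```scala
// Hypothetical sketch: a variable that remembers its Java type, so callers ask the
// variable for its class name instead of declaring a separate value for it.
final case class TypedVariable(name: String, javaType: Class[_]) {
  // Canonical Java name, e.g. "java.lang.StringBuilder" or "int".
  def className: String = javaType.getCanonicalName
  // Interpolating the variable into generated code yields just its name.
  override def toString: String = name
}

// Hypothetical stand-in for a fresh-name generator (not Spark's CodegenContext).
final class NameGenerator {
  private var counter = 0
  def freshVariable(prefix: String, javaType: Class[_]): TypedVariable = {
    val v = TypedVariable(s"${prefix}_$counter", javaType)
    counter += 1
    v
  }
}

val ctx = new NameGenerator
val buffer = ctx.freshVariable("buffer", classOf[java.lang.StringBuilder])
// No separate bufferClass value: the type travels with the variable.
val decl = s"${buffer.className} $buffer = new ${buffer.className}();"
println(decl) // java.lang.StringBuilder buffer_0 = new java.lang.StringBuilder();
```

With the type carried on the variable, the separate `bufferClass` value in the diff would become unnecessary. Whether Spark exposes such an accessor is not settled in this thread.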
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21562 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4017/ Test PASSed.
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21562 **[Test build #91825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91825/testReport)** for PR 21562 at commit [`1945ff4`](https://github.com/apache/spark/commit/1945ff4adf3423b324c02e7b7f799cb137a385fb).
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21562 Merged build finished. Test PASSed.
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21562 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/128/ Test PASSed.
[GitHub] spark issue #21562: [Trivial][ML] GMM unpersist RDD after training
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21562 Merged build finished. Test PASSed.
[GitHub] spark pull request #21562: [Trivial][ML] GMM unpersist RDD after training
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/21562
[Trivial][ML] GMM unpersist RDD after training
## What changes were proposed in this pull request?
unpersist `instances` after training
## How was this patch tested?
existing tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark gmm_unpersist
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21562.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #21562
commit 1945ff4adf3423b324c02e7b7f799cb137a385fb
Author: 郑瑞峰
Date: 2018-06-14T03:46:13Z
init pr
[GitHub] spark pull request #18154: [SPARK-20932][ML]CountVectorizer support handle p...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/18154
[GitHub] spark issue #18154: [SPARK-20932][ML]CountVectorizer support handle persiste...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/18154 This PR is out of date. I will close it.
[GitHub] spark pull request #20164: [SPARK-22971][ML] OneVsRestModel should use tempo...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/20164
[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/20164 This PR is out of date, so I will close it.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4016/ Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed.
[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21109 **[Test build #91824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91824/testReport)** for PR 21109 at commit [`82c194a`](https://github.com/apache/spark/commit/82c194a8a03b6cc028de303fbc07c68d6078cc2b).
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #91823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91823/testReport)** for PR 21561 at commit [`61b95a3`](https://github.com/apache/spark/commit/61b95a35ecea4ae21e95fb8370bc4b4525370435).
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/127/ Test PASSed.
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/21561
[SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/NB
## What changes were proposed in this pull request?
logNumExamples in KMeans/BiKM/GMM/AFT/NB
## How was this patch tested?
existing tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark alg_logNumExamples
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21561.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #21561
commit ec2171d77456554961558028c654293bea159cc7
Author: 郑瑞峰
Date: 2018-06-14T06:15:27Z
init pr
commit 6ec59d2c2f61ebf05136660388b6887c9d452aca
Author: 郑瑞峰
Date: 2018-06-14T06:50:42Z
add bikm
commit 61b95a35ecea4ae21e95fb8370bc4b4525370435
Author: 郑瑞峰
Date: 2018-06-14T07:00:12Z
_
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91812/ Test FAILed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Merged build finished. Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test FAILed.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91811/ Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test FAILed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91818/ Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91813/ Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test FAILed.
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21560 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91816/ Test FAILed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test FAILed.
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21560 Merged build finished. Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21379 **[Test build #91814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91814/testReport)** for PR 21379 at commit [`a3be215`](https://github.com/apache/spark/commit/a3be215755f00100be0817b2a59f1ea8a185518b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21560 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91817/ Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test FAILed.
[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21560 Merged build finished. Test FAILed.
[GitHub] spark issue #21531: [SPARK-24521][SQL][TEST] Fix ineffective test in CachedT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21531 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91819/ Test FAILed.
[GitHub] spark issue #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler options...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20641 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91820/ Test FAILed.
[GitHub] spark issue #21531: [SPARK-24521][SQL][TEST] Fix ineffective test in CachedT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21531 Merged build finished. Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91821/ Test FAILed.