[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8831 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141748157 Thanks - I'm going to merge this.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141741093

[Test build #1777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1777/console) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class Interaction(override val uid: String) extends Transformer`
  * ` final val probabilityCol: Param[String] = new Param[String](this, "probabilityCol", "Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities")`
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
  * ` require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`
  * `abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging `
  * `case class Sort(`
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141723775 [Test build #1777 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1777/consoleFull) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141723442 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141635017 LGTM - only comment is maybe we should warn in SparkConf for the core settings. But I'm ok with merging this as is (provided that tests pass either on Jenkins or locally).
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8831#discussion_r39917683

    --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleManager.scala ---
    @@ -24,7 +24,13 @@ import org.apache.spark.shuffle._
      * A ShuffleManager using hashing, that creates one output file per reduce partition on each
      * mapper (possibly reusing these across waves of tasks).
      */
    -private[spark] class HashShuffleManager(conf: SparkConf) extends ShuffleManager {
    +private[spark] class HashShuffleManager(conf: SparkConf) extends ShuffleManager with Logging {
    +
    +  if (!conf.getBoolean("spark.shuffle.spill", true)) {
    --- End diff --

how about adding this to sparkconf itself, and don't have these here?
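The review suggestion above (emit the warning from the configuration object rather than from each shuffle manager) could be sketched roughly as follows. This is a minimal, plain-Scala illustration and not the code that was merged: `SketchConf` and `SpillWarning` are hypothetical stand-ins introduced here, and only the key `spark.shuffle.spill` and its default of `true` come from the diff.

```scala
// Hypothetical stand-in for a configuration object; this is NOT Spark's SparkConf.
final class SketchConf(settings: Map[String, String]) {
  // Mirrors the getBoolean(key, default) call shape seen in the diff above.
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

// Hypothetical helper: a single place that decides whether a warning is needed.
object SpillWarning {
  val Key = "spark.shuffle.spill"

  // Returns a warning message when spilling was explicitly disabled, else None.
  def check(conf: SketchConf): Option[String] =
    if (!conf.getBoolean(Key, true))
      Some(s"$Key was set to false, but this setting is ignored; spilling stays enabled.")
    else
      None
}
```

Centralizing the check this way would mean each shuffle manager no longer needs its own `if (!conf.getBoolean(...))` block; the caller decides how to log the returned message.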
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141601050 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141601051 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42698/ Test FAILed.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141601028

[Test build #42698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42698/console) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class Interaction(override val uid: String) extends Transformer`
  * ` final val probabilityCol: Param[String] = new Param[String](this, "probabilityCol", "Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities")`
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
  * ` require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`
  * `abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging `
  * `case class Sort(`
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141584039 [Test build #42698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42698/consoleFull) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141583377 Merged build started.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141583355 Merged build triggered.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8831#discussion_r39907246

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala ---
    @@ -31,38 +31,12 @@ import org.apache.spark.{SparkEnv, InternalAccumulator, TaskContext}
     // This file defines various sort operators.
    -
    -/**
    - * Performs a sort on-heap.
    - * @param global when true performs a global sort of all partitions by shuffling the data first
    - *               if necessary.
    - */
    -case class Sort(
    -    sortOrder: Seq[SortOrder],
    -    global: Boolean,
    -    child: SparkPlan)
    -  extends UnaryNode {
    -  override def requiredChildDistribution: Seq[Distribution] =
    -    if (global) OrderedDistribution(sortOrder) :: Nil else UnspecifiedDistribution :: Nil
    -
    -  protected override def doExecute(): RDD[InternalRow] = attachTree(this, "sort") {
    -    child.execute().mapPartitions( { iterator =>
    -      val ordering = newOrdering(sortOrder, child.output)
    -      iterator.map(_.copy()).toArray.sorted(ordering).iterator
    -    }, preservesPartitioning = true)
    -  }
    -
    -  override def output: Seq[Attribute] = child.output
    -
    -  override def outputOrdering: Seq[SortOrder] = sortOrder
    -}
    -
     /**
      * Performs a sort, spilling to disk as needed.
      * @param global when true performs a global sort of all partitions by shuffling the data first
      *               if necessary.
      */
    -case class ExternalSort(
    +case class Sort(
    --- End diff --

Here, I just renamed `ExternalSort` to `Sort` and deleted the old in-memory `Sort`.
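For readers unfamiliar with the deleted operator, the core of the old in-memory `Sort` is the single expression `iterator.map(_.copy()).toArray.sorted(ordering).iterator`. The sketch below reproduces that pattern in plain Scala without Spark; `Row` and `InMemorySortSketch` are hypothetical names used only for illustration, standing in for `InternalRow` and the operator's per-partition closure.

```scala
// Hypothetical stand-in for Spark's InternalRow, used only for this illustration.
final case class Row(key: Int, value: String)

object InMemorySortSketch {
  // Copies every row, materializes the whole partition as an on-heap array,
  // and sorts it there. Memory use grows with the partition size and there
  // is no fallback to disk, which is the limitation a spilling sort avoids.
  def sortPartition(rows: Iterator[Row])(ordering: Ordering[Row]): Iterator[Row] =
    rows.map(_.copy()).toArray.sorted(ordering).iterator
}
```

A possible use, assuming rows are ordered by `key`: `InMemorySortSketch.sortPartition(rows)(Ordering.by[Row, Int](_.key))`.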
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8831#issuecomment-141583145 /cc @rxin, @marmbrus, and @davies for review and sign-off.
[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/8831

[SPARK-10710] Remove ability to disable spilling in core and SQL

It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested.

In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`.

This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark remove-ability-to-disable-spilling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8831.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8831

commit d81ef04e98565ac3fe6a97e97df7ac95fe4895a6
Author: Josh Rosen
Date: 2015-09-18T21:55:13Z

    Remove ability to set spark.shuffle.spill=false.

commit 4bce5f2e8e90b5c2e953f057b16f7ccc64df52a0
Author: Josh Rosen
Date: 2015-09-18T22:15:15Z

    Remove ability to set spark.sql.planner.externalSort=false.

commit bb053c780438848cd7fa02ab3dfb0fece1afe866
Author: Josh Rosen
Date: 2015-09-18T22:19:54Z

    Make similar changes in PySpark.