[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8831


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141748157
  
Thanks - I'm going to merge this.






[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141741093
  
  [Test build #1777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1777/console) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class Interaction(override val uid: String) extends Transformer`
  * `  final val probabilityCol: Param[String] = new Param[String](this, "probabilityCol", "Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities")`
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
  * `  require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`
  * `abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging `
  * `case class Sort(`






[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141723775
  
  [Test build #1777 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1777/consoleFull) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141723442
  
Jenkins, retest this please.






[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141635017
  
LGTM - only comment is maybe we should warn in SparkConf for the core 
settings. But I'm ok with merging this as is (provided that tests pass either 
on Jenkins or locally).






[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/8831#discussion_r39917683
  
--- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleManager.scala ---
@@ -24,7 +24,13 @@ import org.apache.spark.shuffle._
  * A ShuffleManager using hashing, that creates one output file per reduce partition on each
  * mapper (possibly reusing these across waves of tasks).
  */
-private[spark] class HashShuffleManager(conf: SparkConf) extends ShuffleManager {
+private[spark] class HashShuffleManager(conf: SparkConf) extends ShuffleManager with Logging {
+
+  if (!conf.getBoolean("spark.shuffle.spill", true)) {
--- End diff --

How about adding this check to SparkConf itself, instead of having it here?
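
One way to read this suggestion, sketched without any Spark dependency: keep a single table of settings that are no longer honored next to the configuration class and produce the warning from there, so that individual shuffle managers do not each need their own check. `SimpleConf`, `ignoredWhenFalse`, and `warningsFor` are hypothetical names used only for illustration, not Spark's API; the only detail taken from the pull request is the `spark.shuffle.spill` key.

```scala
// Hypothetical stand-in for a configuration class; this is not Spark's SparkConf.
final case class SimpleConf(settings: Map[String, String]) {
  // Returns the boolean value for `key`, or `default` when the key is unset.
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.trim.toBoolean).getOrElse(default)
}

object SimpleConf {
  // Core setting named in the pull request; setting it to false no longer has an effect.
  // The list is an assumption of this sketch and could hold further keys.
  val ignoredWhenFalse: Seq[String] = Seq("spark.shuffle.spill")

  // Returns one warning message per ignored key that was explicitly set to false.
  def warningsFor(conf: SimpleConf): Seq[String] =
    ignoredWhenFalse
      .filter(key => !conf.getBoolean(key, default = true))
      .map(key => s"$key was set to false, but this setting is ignored; spilling stays enabled.")
}
```

A caller would log each returned message once at configuration time. Returning the messages instead of logging them directly keeps the sketch easy to test.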





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141601050
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141601051
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42698/





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141601028
  
  [Test build #42698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42698/console) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class Interaction(override val uid: String) extends Transformer`
  * `  final val probabilityCol: Param[String] = new Param[String](this, "probabilityCol", "Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities")`
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
  * `  require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`
  * `abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging `
  * `case class Sort(`






[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141584039
  
  [Test build #42698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42698/consoleFull) for PR 8831 at commit [`bb053c7`](https://github.com/apache/spark/commit/bb053c780438848cd7fa02ab3dfb0fece1afe866).





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141583377
  
Merged build started.





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141583355
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8831#discussion_r39907246
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala ---
@@ -31,38 +31,12 @@ import org.apache.spark.{SparkEnv, InternalAccumulator, TaskContext}
 // This file defines various sort operators.
 

 
-
-/**
- * Performs a sort on-heap.
- * @param global when true performs a global sort of all partitions by shuffling the data first
- *               if necessary.
- */
-case class Sort(
-    sortOrder: Seq[SortOrder],
-    global: Boolean,
-    child: SparkPlan)
-  extends UnaryNode {
-  override def requiredChildDistribution: Seq[Distribution] =
-    if (global) OrderedDistribution(sortOrder) :: Nil else UnspecifiedDistribution :: Nil
-
-  protected override def doExecute(): RDD[InternalRow] = attachTree(this, "sort") {
-    child.execute().mapPartitions( { iterator =>
-      val ordering = newOrdering(sortOrder, child.output)
-      iterator.map(_.copy()).toArray.sorted(ordering).iterator
-    }, preservesPartitioning = true)
-  }
-
-  override def output: Seq[Attribute] = child.output
-
-  override def outputOrdering: Seq[SortOrder] = sortOrder
-}
-
 /**
  * Performs a sort, spilling to disk as needed.
  * @param global when true performs a global sort of all partitions by shuffling the data first
  *               if necessary.
  */
-case class ExternalSort(
+case class Sort(
--- End diff --

Here, I just renamed `ExternalSort` to `Sort` and deleted the old in-memory 
`Sort`.
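
For context, the deleted operator sorted each partition by materializing it with `toArray`, so its memory use grew with the partition size. The sketch below illustrates the general spill-and-merge idea behind a spilling sort in plain Scala. It is not Spark's implementation: `ExternalSortSketch`, `externalSort`, `maxInMemory`, and the one-integer-per-line temp-file format are all assumptions made for illustration.

```scala
import java.io.{BufferedReader, File, FileReader, PrintWriter}
import scala.collection.mutable

object ExternalSortSketch {
  // Sorts one in-memory chunk and writes it to a temp file, one value per line.
  private def spill(chunk: Seq[Int]): File = {
    val file = File.createTempFile("sorted-run-", ".txt")
    file.deleteOnExit()
    val out = new PrintWriter(file)
    try chunk.sorted.foreach(v => out.println(v)) finally out.close()
    file
  }

  // Sorts `input` while buffering at most `maxInMemory` input values per chunk.
  // The result is collected into a Vector only to keep the sketch easy to test;
  // a real operator would stream the merged values instead.
  def externalSort(input: Iterator[Int], maxInMemory: Int): Vector[Int] = {
    require(maxInMemory > 0, "maxInMemory must be positive")
    // Phase 1: cut the input into chunks, sort each chunk, and spill it to disk.
    val runs = input.grouped(maxInMemory).map(spill).toVector
    val readers = runs.map(f => new BufferedReader(new FileReader(f)))
    try {
      // Phase 2: k-way merge with a min-heap of (value, run index) pairs.
      val minFirst = Ordering.by[(Int, Int), Int](_._1).reverse
      val heap = mutable.PriorityQueue.empty[(Int, Int)](minFirst)
      // Pulls the next value of run `i` into the heap, if the run has one left.
      def refill(i: Int): Unit = {
        val line = readers(i).readLine()
        if (line != null) heap.enqueue((line.toInt, i))
      }
      readers.indices.foreach(refill)
      val result = Vector.newBuilder[Int]
      while (heap.nonEmpty) {
        val (value, i) = heap.dequeue()
        result += value
        refill(i)
      }
      result.result()
    } finally {
      readers.foreach(_.close())
      runs.foreach(_.delete())
    }
  }
}
```

During the merge only one value per spilled run is held on the heap, which is what bounds memory regardless of the total input size.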





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8831#issuecomment-141583145
  
/cc @rxin, @marmbrus, and @davies for review and sign-off.





[GitHub] spark pull request: [SPARK-10710] Remove ability to disable spilli...

2015-09-18 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/8831

[SPARK-10710] Remove ability to disable spilling in core and SQL

It does not make much sense to set `spark.shuffle.spill` or 
`spark.sql.planner.externalSort` to false: I believe that these configurations 
were initially added as "escape hatches" to guard against bugs in the external 
operators, but these operators are now mature and well-tested. In addition, 
these configurations are not handled in a consistent way anymore: SQL's 
Tungsten codepath ignores these configurations and will continue to use 
spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager 
does not respect `spark.shuffle.spill=false`.

This pull request removes these configurations, adds warnings at the 
appropriate places, and deletes a large amount of code which was only used in 
code paths that did not support spilling.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark remove-ability-to-disable-spilling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8831.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8831


commit d81ef04e98565ac3fe6a97e97df7ac95fe4895a6
Author: Josh Rosen 
Date:   2015-09-18T21:55:13Z

Remove ability to set spark.shuffle.spill=false.

commit 4bce5f2e8e90b5c2e953f057b16f7ccc64df52a0
Author: Josh Rosen 
Date:   2015-09-18T22:15:15Z

Remove ability to set spark.sql.planner.externalSort=false.

commit bb053c780438848cd7fa02ab3dfb0fece1afe866
Author: Josh Rosen 
Date:   2015-09-18T22:19:54Z

Make similar changes in PySpark.



