[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-25 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221727020
  
Thanks a lot for merging this!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12060





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-25 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221705604
  
I merged this into master (not 2.0, since it's a performance problem rather 
than a correctness problem, and this isn't a regression).  Thanks @ueshin!





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221119609
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59153/
Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221119608
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221119452
  
**[Test build #59153 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59153/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221094081
  
**[Test build #59153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59153/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221092911
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221086278
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221086282
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59133/
Test FAILed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221085961
  
**[Test build #59133 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59133/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-221021338
  
**[Test build #59133 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59133/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219608365
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219608366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58659/
Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219608257
  
**[Test build #58659 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58659/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219594426
  
**[Test build #58659 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58659/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219594429
  
@kayousterhout Thank you for your review.
I updated the comments and pushed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219582446
  
This LGTM with the small comment changes I suggested. @markhamstra any 
objections to this? Mark / @rxin thoughts on merging it into the 2.0 branch?





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63443356
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -752,23 +751,20 @@ class DAGScheduler(
 submitStage(stage)
   }
 }
-submitWaitingStages()
   }
 
   /**
* Check for waiting stages which are now eligible for resubmission.
-   * Ordinarily run on every iteration of the event loop.
+   * Ordinarily run after the parent stage completed successfully.
*/
-  private def submitWaitingStages() {
-// TODO: We might want to run this less often, when we are sure that something has become
-// runnable that wasn't before.
+  private def submitWaitingChildStages(parent: Stage) {
 logTrace("Checking for newly runnable parent stages")
--- End diff --

Can you update this to say s"Checking if any dependencies of $parent are 
now runnable"





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63443272
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -752,23 +751,20 @@ class DAGScheduler(
 submitStage(stage)
   }
 }
-submitWaitingStages()
   }
 
   /**
* Check for waiting stages which are now eligible for resubmission.
-   * Ordinarily run on every iteration of the event loop.
+   * Ordinarily run after the parent stage completed successfully.
--- End diff --

Can you update this to say something like:

"Submits stages that depend on the given parent stage. Called when the 
parent stage completes successfully."
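
For readers of this thread, the behaviour described by that suggested comment can be sketched roughly as follows. This is a simplified toy model, not the code in the PR: `Stage`, `ToyScheduler`, `finished`, `submitted` and `onStageCompleted` are hypothetical names introduced only for illustration, and the real `DAGScheduler` tracks far more state.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a scheduler stage: an id plus its direct parents.
final case class Stage(id: Int, parents: List[Stage])

// Toy scheduler used only to illustrate the idea; not Spark's DAGScheduler.
class ToyScheduler {
  val waitingStages = mutable.HashSet.empty[Stage]
  val finished = mutable.HashSet.empty[Stage]
  val submitted = mutable.ArrayBuffer.empty[Int]

  // Runs the stage if every parent has finished; otherwise parks it again.
  def submitStage(stage: Stage): Unit = {
    if (stage.parents.forall(finished.contains)) {
      submitted += stage.id
    } else {
      waitingStages += stage
    }
  }

  // Submits only the waiting stages that depend on the given parent,
  // instead of rescanning every waiting stage on each event.
  def submitWaitingChildStages(parent: Stage): Unit = {
    val childStages = waitingStages.filter(_.parents.contains(parent)).toList
    waitingStages --= childStages
    childStages.sortBy(_.id).foreach(submitStage)
  }

  // Called when a parent stage completes successfully.
  def onStageCompleted(parent: Stage): Unit = {
    finished += parent
    submitWaitingChildStages(parent)
  }
}
```

In this sketch, completing one stage touches only its own children; unrelated waiting stages stay parked until their own parent finishes.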





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63443044
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1357,7 +1345,6 @@ class DAGScheduler(
   logDebug("Additional executor lost message for " + execId +
"(epoch " + currentEpoch + ")")
 }
--- End diff --

This appears to be a non-issue, because we handle lost shuffle output 
separately, when we get a FetchFailure from a task that tries to fetch the 
output.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63442702
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1357,7 +1345,6 @@ class DAGScheduler(
   logDebug("Additional executor lost message for " + execId +
"(epoch " + currentEpoch + ")")
 }
--- End diff --

Is it necessary to submit some newly-waiting stages here (e.g., if shuffle 
output was lost for a map stage, so now that map stage needs to be re-run)?





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread zzcclp
Github user zzcclp commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218947787
  
@markhamstra, thanks for your explanation.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218945792
  
@zzcclp Not likely.  This PR shouldn't produce any different results, but 
rather produces the same results faster.  We're typically very conservative 
with patch-level releases, so the optimization work for this PR will almost 
certainly only appear in the Spark 2.x series.  That's not too far off.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread zzcclp
Github user zzcclp commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218942375
  
Good PR. Is there a plan to merge it into branch-1.6?





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218935031
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58529/
Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218935030
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218934898
  
**[Test build #58529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58529/consoleFull)**
 for PR 12060 at commit 
[`0c0d9ed`](https://github.com/apache/spark/commit/0c0d9edaac8fb6e012232ba99efb9198a1aa556d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218922089
  
**[Test build #58529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58529/consoleFull)**
 for PR 12060 at commit 
[`0c0d9ed`](https://github.com/apache/spark/commit/0c0d9edaac8fb6e012232ba99efb9198a1aa556d).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203734186
  
@maropu I see. I'll close #11720.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203727839
  
**[Test build #54582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54582/consoleFull)**
 for PR 12060 at commit 
[`f1407c0`](https://github.com/apache/spark/commit/f1407c0bb302355f7f06aad9ece00541063bde6e).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r57995458
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1247,7 +1252,7 @@ class DAGScheduler(
 }
   }
 
-  // Note: newly runnable stages will be submitted below when we submit waiting stages
+  submitWaitingChildStages(shuffleStage)
--- End diff --

@markhamstra Thank you for your review.
Definitely we can move this into else branch.
I'll modify it.
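
The placement question discussed here can be illustrated with a small sketch. It is a toy model under stated assumptions, not the PR's code: `ShuffleStage`, `StageCompletion`, `Action` and `onAllTasksReported` are hypothetical names, and "available" is modelled simply as "every partition has registered an output".

```scala
import scala.collection.mutable

// Hypothetical model of a shuffle map stage: it is available once
// every partition has a registered output.
final class ShuffleStage(val id: Int, numPartitions: Int) {
  private val outputs = mutable.Set.empty[Int]
  def addOutput(partition: Int): Unit = outputs += partition
  def isAvailable: Boolean = outputs.size == numPartitions
}

object StageCompletion {
  sealed trait Action
  case object ResubmitStage extends Action
  case object SubmitChildren extends Action

  // Children are submitted only in the else branch, that is, only when
  // the parent's output is complete; otherwise the parent itself is rerun.
  def onAllTasksReported(stage: ShuffleStage): Action = {
    if (!stage.isAvailable) ResubmitStage else SubmitChildren
  }
}
```

Under this model, submitting children while the parent is still missing output would achieve nothing, because they would find a missing parent and go straight back to waiting.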





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r57948394
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1247,7 +1252,7 @@ class DAGScheduler(
 }
   }
 
-  // Note: newly runnable stages will be submitted below when we submit waiting stages
+  submitWaitingChildStages(shuffleStage)
--- End diff --

Should this be done when !shuffleStage.isAvailable and we have resubmitted 
the shuffleStage, or only within the else branch?





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r57948177
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1252,7 +1252,7 @@ class DAGScheduler(
 }
   }
 
-  submitWaitingStages()
+  submitWaitingChildStages(shuffleStage)
--- End diff --

Should this be done when `!shuffleStage.isAvailable` and we have 
resubmitted the `shuffleStage`, or only within the `else` branch?





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203510443
  
If so, it'd be better to close #11720; the benefit of the correct stage graphs 
fixed in #11720 seems clear in this PR.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203507196
  
Yes, that's right.
#11720 is needed to find child stages correctly.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203474530
  
It seems that this optimization requires the correct stage graphs that 
#11720 fixes. Is that right?





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203396173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54509/





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203396172
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203395952
  
**[Test build #54509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54509/consoleFull)** for PR 12060 at commit [`a304235`](https://github.com/apache/spark/commit/a304235c4b086469aa5b5ac8a7d2f0d25addc86f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203349685
  
This PR is based on #11720, so please check it first.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-203349785
  
**[Test build #54509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54509/consoleFull)** for PR 12060 at commit [`a304235`](https://github.com/apache/spark/commit/a304235c4b086469aa5b5ac8a7d2f0d25addc86f).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-03-30 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/12060

[SPARK-14269][SCHEDULER] Eliminate unnecessary submitStage() call.

## What changes were proposed in this pull request?

Currently, `submitStage()` is called for every waiting stage on each 
iteration of the `DAGScheduler` event loop, but most of these calls are 
unnecessary because the events that trigger them do not change any stage's status.
We only need to try submitting waiting stages when their parent stages have 
completed successfully.

Eliminating the unnecessary calls improves `DAGScheduler` performance.
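
The idea described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the code from `DAGScheduler`: `ToyStage`, `ToyScheduler`, `countAttempts`, and the `submitAttempts` counter are hypothetical names invented for the example, and the "old" strategy is only triggered on stage completion here, whereas the description says the real one ran on every event-loop iteration.

```scala
import scala.collection.mutable

object SubmitSketch {
  // Hypothetical stand-in for a scheduler stage; not Spark's Stage class.
  final case class ToyStage(id: Int, parents: List[ToyStage] = Nil)

  // Toy scheduler that counts submission attempts so both strategies can be compared.
  class ToyScheduler {
    val waiting = mutable.LinkedHashSet.empty[ToyStage]
    val running = mutable.LinkedHashSet.empty[ToyStage]
    val finished = mutable.Set.empty[ToyStage]
    var submitAttempts = 0

    // Run the stage if every parent has finished; otherwise park it in `waiting`.
    def submitStage(stage: ToyStage): Unit = {
      submitAttempts += 1
      if (stage.parents.forall(finished.contains)) running += stage
      else waiting += stage
    }

    // Old strategy: retry every waiting stage, whichever stage just completed.
    def submitWaitingStages(): Unit = {
      val all = waiting.toList
      waiting.clear()
      all.foreach(submitStage)
    }

    // New strategy: retry only stages that list `parent` as a direct parent.
    def submitWaitingChildStages(parent: ToyStage): Unit = {
      val children = waiting.filter(_.parents.contains(parent)).toList
      waiting --= children
      children.sortBy(_.id).foreach(submitStage)
    }

    def markFinished(stage: ToyStage): Unit = {
      running -= stage
      finished += stage
    }
  }

  // Build a linear chain of four stages and count calls to submitStage.
  def countAttempts(childOnly: Boolean): Int = {
    val s0 = ToyStage(0)
    val s1 = ToyStage(1, List(s0))
    val s2 = ToyStage(2, List(s1))
    val s3 = ToyStage(3, List(s2))
    val scheduler = new ToyScheduler
    List(s0, s1, s2, s3).foreach(scheduler.submitStage)
    for (done <- List(s0, s1, s2)) {
      scheduler.markFinished(done)
      if (childOnly) scheduler.submitWaitingChildStages(done)
      else scheduler.submitWaitingStages()
    }
    scheduler.submitAttempts
  }
}
```

For this four-stage chain, `SubmitSketch.countAttempts(childOnly = false)` makes 10 submission attempts and `SubmitSketch.countAttempts(childOnly = true)` makes 7. The gap widens with the number of waiting stages, because the old strategy retries all of them after each completion.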

## How was this patch tested?

Added some checks, ran the existing tests, and verified the change against our own project.

We have a project with about 2000 stages that is bottlenecked by `DAGScheduler`.

Before this patch, almost all of the execution time in the `Driver` process was 
spent in `submitStage()` on the `dag-scheduler-event-loop` thread. After 
this patch, performance improved as follows:


| | total execution time | `dag-scheduler-event-loop` thread time | `submitStage()` |
|---|---:|---:|---:|
| Before | 760 sec | 710 sec | 667 sec |
| After | 440 sec | 14 sec | 10 sec |
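
To make the reported timings easier to compare, the following sketch computes the before/after ratio for each column. The numbers are copied from the measurements above; `SpeedupSketch`, `timings`, and `speedup` are hypothetical names used only for this illustration.

```scala
object SpeedupSketch {
  // (measurement, seconds before the patch, seconds after the patch)
  val timings: List[(String, Double, Double)] = List(
    ("total execution time", 760.0, 440.0),
    ("dag-scheduler-event-loop thread time", 710.0, 14.0),
    ("submitStage()", 667.0, 10.0)
  )

  // Speedup factor: how many times faster the "after" run is.
  def speedup(before: Double, after: Double): Double = before / after

  // One (measurement, speedup factor) pair per column of the table.
  def speedups: List[(String, Double)] =
    timings.map { case (name, before, after) => (name, speedup(before, after)) }
}
```

With these figures the ratios come out to roughly 1.7x for total execution time, about 51x for the event-loop thread, and about 67x for `submitStage()`.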



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-14269

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12060


commit 9a1724de0287b5ca41e30f3d3401fd721a2e1520
Author: Takuya UESHIN 
Date:   2016-03-15T02:21:09Z

Add a test to check if the stage graph is properly built.

commit f8b7910ecb52a5954de091ed79d5de9c19ba2744
Author: Takuya UESHIN 
Date:   2016-03-15T02:22:42Z

Make DAGScheduler.getAncestorShuffleDependencies() return in topological 
order to ensure building ancestor stages first.

commit 0ea3fc838f689729794b6ea3aaf0b88a339ec20c
Author: Takuya UESHIN 
Date:   2016-03-16T02:04:45Z

Refactor getAncestorShuffleDependencies.

commit 697b32208262b3c1c10bc2cae43b891c7970811d
Author: Takuya UESHIN 
Date:   2016-03-16T12:55:50Z

Fix topological sort.

commit d6d3c34e0e8387ce6390babba3df2464a8b2b4a1
Author: Takuya UESHIN 
Date:   2016-03-17T12:21:32Z

Merge branch 'master' into issues/SPARK-13902

commit 1636531c65912bbfb68e4c669690a9f9107d9cd1
Author: Takuya UESHIN 
Date:   2016-03-28T07:01:27Z

Add assertion to check not to overwrite illegally.

commit 92e9f4484b09f65829f6e9300042cc2b57979278
Author: Takuya UESHIN 
Date:   2016-03-28T07:19:09Z

Modify to mitigate adds extra push&pop.

commit 4b412f5e73ca9cf5ab2de1a51f6c30f01286e89a
Author: Takuya UESHIN 
Date:   2016-03-28T07:48:42Z

Modify comment.

commit 8fb9a149a03543a35c2a08c79edc53d49f66b5c2
Author: Takuya UESHIN 
Date:   2016-03-28T08:11:37Z

Add a comment to explain what the test is doing.

commit e2cfeaf3ef5a7291a235bbcbb968d88959e52e93
Author: Takuya UESHIN 
Date:   2016-03-29T03:22:36Z

Revert "Add assertion to check not to overwrite illegally."

This reverts commit 1636531c65912bbfb68e4c669690a9f9107d9cd1.

commit e3c0de33290aaccdd826d5ca38b87ace73a01fb5
Author: Takuya UESHIN 
Date:   2016-03-11T05:45:24Z

Eliminate unnecessary `submitWaitingStages()` call.

commit b73eaac805dca779fd7635f63fdd12c78e634509
Author: Takuya UESHIN 
Date:   2016-03-30T09:29:33Z

Merge branch 'issues/SPARK-13902' into issues/SPARK-14269

commit 88c4bc1dd1c36b456432de2c895054799ff97a20
Author: Takuya UESHIN 
Date:   2016-03-30T08:47:20Z

Add some checks.

commit a304235c4b086469aa5b5ac8a7d2f0d25addc86f
Author: Takuya UESHIN 
Date:   2016-03-15T03:19:03Z

Try to submit only child stages of the completed stage.



