[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15268935
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
 {
   // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
   var totalCoreCount = new AtomicInteger(0)
-  var totalExpectedExecutors = new AtomicInteger(0)
+  var totalExecutors = new AtomicInteger(0)
+  var totalExpectedResources = new AtomicInteger(0)
   val conf = scheduler.sc.conf
   private val timeout = AkkaUtils.askTimeout(conf)
   private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
-  // Submit tasks only after (registered executors / total expected executors)
+  // Submit tasks only after (registered resources / total expected resources)
   // is equal to at least this value, that is double between 0 and 1.
-  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
+  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
   if (minRegisteredRatio > 1) minRegisteredRatio = 1
-  // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
+  // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
--- End diff --

Ah, I see -- sorry.  Looks like this is what we want? `// Submit tasks 
after maxRegisteredWaitingTime milliseconds if minRegisteredRatio has not yet 
been reached`
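
For readers following the thread, the rule in the suggested comment can be sketched roughly as below. This is a minimal illustration, not the Spark source: the object name, the method signature, and the explicit clock parameters are assumptions made so the snippet stands alone; only `minRegisteredRatio` and `maxRegisteredWaitingTime` come from the discussion.

```scala
// Hypothetical sketch, not the Spark source. Only minRegisteredRatio and
// maxRegisteredWaitingTime are names taken from the thread.
object ReadinessSketch {
  // Cap the configured ratio at 1, mirroring the `> 1` guard in the diff.
  def capRatio(ratio: Double): Double = math.min(1.0, ratio)

  // Ready when enough resources have registered, or when the maximum
  // waiting time has elapsed although the ratio has not yet been reached.
  def isReady(
      registered: Int,
      expected: Int,
      minRegisteredRatio: Double,
      createTimeMs: Long,
      nowMs: Long,
      maxRegisteredWaitingTimeMs: Long): Boolean = {
    val sufficient = registered >= expected * capRatio(minRegisteredRatio)
    val waitedLongEnough = nowMs - createTimeMs >= maxRegisteredWaitingTimeMs
    sufficient || waitedLongEnough
  }
}
```

Passing the timestamps in as arguments, rather than reading the system clock, is purely a choice for this sketch so that both branches are easy to exercise.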


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15268932
  
--- Diff: docs/configuration.md ---
@@ -707,21 +707,22 @@ Apart from these, the following properties are also available, and may be useful
   
 
 
-  spark.scheduler.minRegisteredExecutorsRatio
+  spark.scheduler.minRegisteredResourcesRatio
   0
   
-The minimum ratio of registered executors (registered executors / total expected executors)
+The minimum ratio of registered resources (registered resources / total expected resources)
+(resources are executors in yarn mode, CPU cores in standalone and mesos mode)
--- End diff --

Thanks @tgravescs 




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15268755
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -108,4 +108,8 @@ private[spark] class SparkDeploySchedulerBackend(
 logInfo("Executor %s removed: %s".format(fullId, message))
 removeExecutor(fullId.split("/")(1), reason.toString)
   }
+
+  override def checkRegisteredResources(): Boolean = {
--- End diff --

good, thanks @markhamstra 




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread li-zhihui
Github user li-zhihui commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15268735
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
 {
   // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
   var totalCoreCount = new AtomicInteger(0)
-  var totalExpectedExecutors = new AtomicInteger(0)
+  var totalExecutors = new AtomicInteger(0)
+  var totalExpectedResources = new AtomicInteger(0)
   val conf = scheduler.sc.conf
   private val timeout = AkkaUtils.askTimeout(conf)
   private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
-  // Submit tasks only after (registered executors / total expected executors)
+  // Submit tasks only after (registered resources / total expected resources)
   // is equal to at least this value, that is double between 0 and 1.
-  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
+  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
   if (minRegisteredRatio > 1) minRegisteredRatio = 1
-  // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
+  // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
--- End diff --

Thanks @markhamstra, but I think the code means: submit tasks after the
time (milliseconds) if minRegisteredRatio has not been reached.




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1525#issuecomment-49825654
  
QA tests have started for PR 1525. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17006/consoleFull




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/1525#issuecomment-49825514
  
Jenkins, test this please




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/1525#issuecomment-49825479
  
Can you please also file a JIRA for this?




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15268342
  
--- Diff: docs/configuration.md ---
@@ -707,21 +707,22 @@ Apart from these, the following properties are also available, and may be useful
   
 
 
-  spark.scheduler.minRegisteredExecutorsRatio
+  spark.scheduler.minRegisteredResourcesRatio
   0
   
-The minimum ratio of registered executors (registered executors / total expected executors)
+The minimum ratio of registered resources (registered resources / total expected resources)
+(resources are executors in yarn mode, CPU cores in standalone and mesos mode)
--- End diff --

nit, but mesos isn't covered yet.




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15242315
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -108,4 +108,8 @@ private[spark] class SparkDeploySchedulerBackend(
 logInfo("Executor %s removed: %s".format(fullId, message))
 removeExecutor(fullId.split("/")(1), reason.toString)
   }
+
+  override def checkRegisteredResources(): Boolean = {
--- End diff --

or `sufficientResourcesRegistered`




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15242266
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -108,4 +108,8 @@ private[spark] class SparkDeploySchedulerBackend(
 logInfo("Executor %s removed: %s".format(fullId, message))
 removeExecutor(fullId.split("/")(1), reason.toString)
   }
+
+  override def checkRegisteredResources(): Boolean = {
--- End diff --

I'd prefer the name to indicate what condition is being checked, so 
something like `sufficientRegisteredResources`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1525#discussion_r15240513
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
 {
   // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
   var totalCoreCount = new AtomicInteger(0)
-  var totalExpectedExecutors = new AtomicInteger(0)
+  var totalExecutors = new AtomicInteger(0)
+  var totalExpectedResources = new AtomicInteger(0)
   val conf = scheduler.sc.conf
   private val timeout = AkkaUtils.askTimeout(conf)
   private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
-  // Submit tasks only after (registered executors / total expected executors)
+  // Submit tasks only after (registered resources / total expected resources)
   // is equal to at least this value, that is double between 0 and 1.
-  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
+  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
   if (minRegisteredRatio > 1) minRegisteredRatio = 1
-  // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
+  // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
--- End diff --

// Submit tasks time(milliseconds) after minRegisteredRatio is reached




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread li-zhihui
Github user li-zhihui commented on the pull request:

https://github.com/apache/spark/pull/1525#issuecomment-49714878
  
@kayousterhout @tgravescs




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1525#issuecomment-49714817
  
Can one of the admins verify this patch?




[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...

2014-07-22 Thread li-zhihui
GitHub user li-zhihui opened a pull request:

https://github.com/apache/spark/pull/1525

Fix race condition at SchedulerBackend.isReady in standalone mode

In SPARK-1946 (PR #900), the configuration
spark.scheduler.minRegisteredExecutorsRatio was introduced.
However, in standalone mode, there is a race condition where isReady() can
return true because totalExpectedExecutors has not been correctly set.

Because the number of expected executors is uncertain in standalone mode,
this PR uses CPU cores (--total-executor-cores) as the expected resources
to judge whether the SchedulerBackend is ready.
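
As a rough illustration of the idea described above (not the code in this PR), a core-based check might look like the sketch below. The class name, constructor parameters, and `registerExecutor` helper are hypothetical; `totalExpectedCores` stands in for the value passed via --total-executor-cores, and `sufficientResourcesRegistered` borrows a name suggested in review.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of a standalone-mode readiness check; not the Spark source.
// `totalExpectedCores` stands in for the value of --total-executor-cores.
class StandaloneReadinessSketch(totalExpectedCores: Int, minRegisteredRatio: Double) {
  // Cores registered so far; incremented as executors register.
  val totalCoreCount = new AtomicInteger(0)

  def registerExecutor(cores: Int): Unit = {
    totalCoreCount.addAndGet(cores)
    ()
  }

  // The expected total is known up front, so the check cannot pass merely
  // because an expected count has not been set yet.
  def sufficientResourcesRegistered(): Boolean =
    totalCoreCount.get() >= totalExpectedCores * minRegisteredRatio
}
```

With 16 expected cores and a ratio of 0.5, for example, this check would only pass once at least 8 cores have registered. With the default ratio of 0 it passes immediately, which matches the documented default.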

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/li-zhihui/spark fixre4s

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1525


commit 8b54316c77d086ea3454419ebba92003707bbd76
Author: li-zhihui 
Date:   2014-07-22T08:15:40Z

Fix race condition at SchedulerBackend.isReady in standalone mode



