[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15268935

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
     {
       // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
       var totalCoreCount = new AtomicInteger(0)
    -  var totalExpectedExecutors = new AtomicInteger(0)
    +  var totalExecutors = new AtomicInteger(0)
    +  var totalExpectedResources = new AtomicInteger(0)
       val conf = scheduler.sc.conf
       private val timeout = AkkaUtils.askTimeout(conf)
       private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
    -  // Submit tasks only after (registered executors / total expected executors)
    +  // Submit tasks only after (registered resources / total expected resources)
       // is equal to at least this value, that is double between 0 and 1.
    -  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
    +  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
       if (minRegisteredRatio > 1) minRegisteredRatio = 1
    -  // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
    +  // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
    --- End diff --

    Ah, I see -- sorry. Looks like this is what we want? `// Submit tasks after maxRegisteredWaitingTime milliseconds if minRegisteredRatio has not yet been reached`

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user li-zhihui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15268932

    --- Diff: docs/configuration.md ---
    @@ -707,21 +707,22 @@ Apart from these, the following properties are also available, and may be useful
    - spark.scheduler.minRegisteredExecutorsRatio
    + spark.scheduler.minRegisteredResourcesRatio
      0
    -The minimum ratio of registered executors (registered executors / total expected executors)
    +The minimum ratio of registered resources (registered resources / total expected resources)
    +(resources are executors in yarn mode, CPU cores in standalone and mesos mode)
    --- End diff --

    Thanks @tgravescs
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user li-zhihui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15268755

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
    @@ -108,4 +108,8 @@ private[spark] class SparkDeploySchedulerBackend(
         logInfo("Executor %s removed: %s".format(fullId, message))
         removeExecutor(fullId.split("/")(1), reason.toString)
       }
    +
    +  override def checkRegisteredResources(): Boolean = {
    --- End diff --

    good, thanks @markhamstra
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user li-zhihui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15268735

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
     {
       // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
       var totalCoreCount = new AtomicInteger(0)
    -  var totalExpectedExecutors = new AtomicInteger(0)
    +  var totalExecutors = new AtomicInteger(0)
    +  var totalExpectedResources = new AtomicInteger(0)
       val conf = scheduler.sc.conf
       private val timeout = AkkaUtils.askTimeout(conf)
       private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
    -  // Submit tasks only after (registered executors / total expected executors)
    +  // Submit tasks only after (registered resources / total expected resources)
       // is equal to at least this value, that is double between 0 and 1.
    -  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
    +  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
       if (minRegisteredRatio > 1) minRegisteredRatio = 1
    -  // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
    +  // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
    --- End diff --

    Thanks @markhamstra, but I think the code means: submit tasks once the time (in milliseconds) has elapsed, even if minRegisteredRatio has not been reached.
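
The semantics described in this comment (start submitting tasks either when the ratio is met or when the waiting time runs out, whichever comes first) can be sketched in a few lines of Scala. This is only an illustration under stated assumptions, not the code from the pull request: the object name `ReadyOrTimeoutSketch`, the method `isReady` with explicit time parameters, and the sample values are all hypothetical. Only the upper clamp on the ratio mirrors the diff quoted above.

```scala
// Hypothetical, simplified model of "ratio reached OR waited long enough".
// Not Spark's actual implementation; names and parameters are assumptions.
object ReadyOrTimeoutSketch {
  // Mirrors `if (minRegisteredRatio > 1) minRegisteredRatio = 1` from the diff.
  def clampRatio(ratio: Double): Double = math.min(ratio, 1.0)

  def isReady(
      registered: Int,     // resources registered so far
      expected: Int,       // total resources expected
      minRatio: Double,    // configured minimum ratio
      createTimeMs: Long,  // when the backend was created
      nowMs: Long,         // current time
      maxWaitMs: Long      // maximum time to wait for the ratio
  ): Boolean = {
    val ratioReached = registered >= expected * clampRatio(minRatio)
    val waitedLongEnough = nowMs - createTimeMs >= maxWaitMs
    // Either condition is enough to start submitting tasks.
    ratioReached || waitedLongEnough
  }

  def main(args: Array[String]): Unit = {
    // Ratio not reached and still within the waiting window: not ready.
    println(isReady(4, 16, 0.8, createTimeMs = 0L, nowMs = 1000L, maxWaitMs = 30000L))
    // Ratio not reached but the waiting window has elapsed: ready.
    println(isReady(4, 16, 0.8, createTimeMs = 0L, nowMs = 30000L, maxWaitMs = 30000L))
  }
}
```

Passing the clock values in as parameters keeps the function pure, which makes both branches easy to check without sleeping.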
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1525#issuecomment-49825654

    QA tests have started for PR 1525. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17006/consoleFull
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/1525#issuecomment-49825514

    Jenkins, test this please
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/1525#issuecomment-49825479

    can you please also file a jira for this
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15268342

    --- Diff: docs/configuration.md ---
    @@ -707,21 +707,22 @@ Apart from these, the following properties are also available, and may be useful
    - spark.scheduler.minRegisteredExecutorsRatio
    + spark.scheduler.minRegisteredResourcesRatio
      0
    -The minimum ratio of registered executors (registered executors / total expected executors)
    +The minimum ratio of registered resources (registered resources / total expected resources)
    +(resources are executors in yarn mode, CPU cores in standalone and mesos mode)
    --- End diff --

    nit, but mesos isn't covered yet.
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15242315

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
    @@ -108,4 +108,8 @@ private[spark] class SparkDeploySchedulerBackend(
         logInfo("Executor %s removed: %s".format(fullId, message))
         removeExecutor(fullId.split("/")(1), reason.toString)
       }
    +
    +  override def checkRegisteredResources(): Boolean = {
    --- End diff --

    or `sufficientResourcesRegistered`
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15242266

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
    @@ -108,4 +108,8 @@ private[spark] class SparkDeploySchedulerBackend(
         logInfo("Executor %s removed: %s".format(fullId, message))
         removeExecutor(fullId.split("/")(1), reason.toString)
       }
    +
    +  override def checkRegisteredResources(): Boolean = {
    --- End diff --

    I'd prefer the name to indicate what condition is being checked, so something like `sufficientRegisteredResources`.
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1525#discussion_r15240513

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -47,19 +47,19 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: A
     {
       // Use an atomic variable to track total number of cores in the cluster for simplicity and speed
       var totalCoreCount = new AtomicInteger(0)
    -  var totalExpectedExecutors = new AtomicInteger(0)
    +  var totalExecutors = new AtomicInteger(0)
    +  var totalExpectedResources = new AtomicInteger(0)
       val conf = scheduler.sc.conf
       private val timeout = AkkaUtils.askTimeout(conf)
       private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf)
    -  // Submit tasks only after (registered executors / total expected executors)
    +  // Submit tasks only after (registered resources / total expected resources)
       // is equal to at least this value, that is double between 0 and 1.
    -  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0)
    +  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)
       if (minRegisteredRatio > 1) minRegisteredRatio = 1
    -  // Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the time(milliseconds).
    +  // Whatever minRegisteredRatio is arrived, submit tasks after the time(milliseconds).
    --- End diff --

    // Submit tasks time(milliseconds) after minRegisteredRatio is reached
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user li-zhihui commented on the pull request:

    https://github.com/apache/spark/pull/1525#issuecomment-49714878

    @kayousterhout @tgravescs
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1525#issuecomment-49714817

    Can one of the admins verify this patch?
[GitHub] spark pull request: Fix race condition at SchedulerBackend.isReady...
GitHub user li-zhihui opened a pull request:

    https://github.com/apache/spark/pull/1525

    Fix race condition at SchedulerBackend.isReady in standalone mode

    In SPARK-1946 (PR #900), the configuration spark.scheduler.minRegisteredExecutorsRatio was introduced. However, in standalone mode there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.

    Because the number of expected executors is uncertain in standalone mode, this PR tries to use CPU cores (--total-executor-cores) as the expected resources to judge whether the SchedulerBackend is ready.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/li-zhihui/spark fixre4s

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1525

----
commit 8b54316c77d086ea3454419ebba92003707bbd76
Author: li-zhihui
Date:   2014-07-22T08:15:40Z

    Fix race condition at SchedulerBackend.isReady in standalone mode

----
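
To make the race described above concrete, here is a minimal Scala sketch of a ratio-based readiness check. It is an illustration only, not the code from this pull request: the object name `ReadinessRatioSketch`, the three-argument signature, and the sample numbers (16 cores, ratio 0.8) are assumptions, and the exact way the expected total ends up unset in Spark is not modeled. The method name follows the `sufficientResourcesRegistered` suggestion made in review. The sketch shows why an expected total that is still 0 makes the check pass trivially, and why a total known up front (such as a requested core count) does not.

```scala
// Hypothetical, simplified model of the readiness check; not Spark's actual code.
object ReadinessRatioSketch {
  // True once registered resources reach the required fraction of the expected total.
  def sufficientResourcesRegistered(registered: Int, expected: Int, minRatio: Double): Boolean =
    registered >= expected * minRatio

  def main(args: Array[String]): Unit = {
    val ratio = 0.8
    // Expected total not yet set (still 0): passes even though nothing has registered.
    println(sufficientResourcesRegistered(registered = 0, expected = 0, minRatio = ratio))
    // Expected total known up front (assumed 16 cores): the check holds back...
    println(sufficientResourcesRegistered(registered = 4, expected = 16, minRatio = ratio))
    // ...until enough cores have registered (13 >= 12.8).
    println(sufficientResourcesRegistered(registered = 13, expected = 16, minRatio = ratio))
  }
}
```

With an expected total of 0 the comparison degenerates to `0 >= 0`, which is why a value that is fixed before any executor registers is a safer denominator.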