[GitHub] spark pull request #17741: SNAP-1420
Github user hbhanawat closed the pull request at: https://github.com/apache/spark/pull/17741
[GitHub] spark pull request #17741: SNAP-1420
GitHub user hbhanawat opened a pull request: https://github.com/apache/spark/pull/17741

SNAP-1420

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SnappyDataInc/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17741.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17741

commit 35d832d42643f0bcfa8a775587841ce5c537ea5b
Author: Vivek Bhaskar
Date:   2016-11-25T09:43:36Z

    Helper classes for DataSerializable implementation.

commit d4e1c7044ced8c66c257c14976569dc6661fcf5f
Author: Vivek Bhaskar
Date:   2016-11-29T09:06:15Z

    Revert "Helper classes for DataSerializable implementation."

    This reverts commit 35d832d42643f0bcfa8a775587841ce5c537ea5b.
[GitHub] spark pull request #14202: [SPARK-16230] [CORE] CoarseGrainedExecutorBackend...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/14202#discussion_r71058847

--- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala ---
@@ -147,7 +148,10 @@ private[spark] class CoarseGrainedExecutorBackend(
    * executor exits differently. For e.g. when an executor goes down,
    * back-end may not want to take the parent process down.
    */
-  protected def exitExecutor(code: Int): Unit = System.exit(code)
+  protected def exitExecutor(code: Int, reason: String, throwable: Throwable = null) = {
--- End diff --

Looks good.
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-215310532

Hmm. @vanzin I think you have a point. There are a few things that could be done, but I'm not sure they would simplify the code without reducing its flexibility. I will think more on it and get back.
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-215069609

@rxin @vanzin Can we merge this now?
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-215043226

test this please
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-215009140

The build failed again with an unrelated, sporadic error.
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-215009160

test this please
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-214969001

test this please
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-214606466

@vanzin @rxin Thanks for commenting. I have incorporated the review comments, apart from the masterURL comment. Regarding the masterURL being part of the API: scheduler and backend creation may depend on the masterURL, so it is better to keep it as part of the API.
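To make that rationale concrete, here is a small, self-contained sketch of why creation-time code may need the master URL. All names and the `mycm://` URL scheme are hypothetical, not from this PR; it only illustrates the argument that a backend's construction can depend on information carried in the URL.

```scala
// Hypothetical: a cluster manager whose master URL ("mycm://host:port") encodes
// the coordinator address that the scheduler backend must connect to, so the
// URL is needed when the backend is created, not only in canCreate().
object MasterUrlExample {

  /** Parse "mycm://host:port" into the coordinator's host and port. */
  def coordinatorAddress(masterURL: String): (String, Int) = {
    require(masterURL.startsWith("mycm://"), s"Unexpected master URL: $masterURL")
    val Array(host, port) = masterURL.stripPrefix("mycm://").split(":", 2)
    (host, port.toInt)
  }

  def main(args: Array[String]): Unit = {
    // A createSchedulerBackend(sc, masterURL, scheduler) implementation would do
    // something like this before constructing its backend.
    val (host, port) = coordinatorAddress("mycm://10.0.0.5:7077")
    println(s"backend would connect to $host:$port")
  }
}
```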
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-214129927

Looks like the failing test is related to this JIRA: SPARK-13693. I have reopened it.
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-214023638

@rxin, your comments made sense and I have made the respective changes. Please review.
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12641#issuecomment-213805039

@rxin Please take a look.
[GitHub] spark pull request: [SPARK-14729][Scheduler] Refactored YARN sched...
GitHub user hbhanawat opened a pull request: https://github.com/apache/spark/pull/12641

[SPARK-14729][Scheduler] Refactored YARN scheduler creation code to use newly added ExternalClusterManager

## What changes were proposed in this pull request?

With the addition of the ExternalClusterManager (ECM) interface in PR #11723, any cluster manager can now be integrated with Spark. It was suggested in the ExternalClusterManager PR that one of the existing cluster managers should start using the new interface, to ensure that the API is correct. Ideally, all the existing cluster managers should eventually use the ECM interface, but as a first step YARN will now use it. This PR refactors the YARN code from the SparkContext.createTaskScheduler function into YarnClusterManager, which implements the ECM interface.

## How was this patch tested?

Since this is a refactoring, no new tests have been added. Existing tests have been run. Basic manual testing with YARN was done too.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hbhanawat/spark yarnClusterMgr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12641.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12641

commit 643f8d2686e260053c71ded68e46227d2d82aba9
Author: Hemant Bhanawat
Date:   2016-04-23T18:46:11Z

    With the addition of ExternalClusterManager(ECM) interface in PR #11723, any cluster manager can now be
    integrated with Spark. It was suggested in ExternalClusterManager PR that one of the existing cluster managers
    should start using the new interface to ensure that the API is correct. Ideally, all the existing cluster managers
    should eventually use the ECM interface but as a first step yarn will now use the ECM interface. This PR refactors
    YARN code from SparkContext.createTaskScheduler function into YarnClusterManager that implements ECM interface.
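For context, the shape such a YarnClusterManager takes under the ECM interface is roughly the sketch below. The YARN scheduler and backend class names (YarnScheduler, YarnClusterScheduler, YarnClientSchedulerBackend, YarnClusterSchedulerBackend) are Spark's existing classes that SparkContext.createTaskScheduler used to instantiate; the deploy-mode dispatch and exact constructor shapes are a best-effort reconstruction, not a quote of the merged patch.

```scala
package org.apache.spark.scheduler.cluster

import org.apache.spark.{SparkContext, SparkException}
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}

/** Sketch: YARN scheduler creation moved behind the ExternalClusterManager interface. */
private[spark] class YarnClusterManager extends ExternalClusterManager {

  override def canCreate(masterURL: String): Boolean = masterURL == "yarn"

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = {
    sc.deployMode match {
      case "cluster" => new YarnClusterScheduler(sc)
      case "client" => new YarnScheduler(sc)
      case _ => throw new SparkException(s"Unknown deploy mode '${sc.deployMode}' for YARN")
    }
  }

  override def createSchedulerBackend(sc: SparkContext, masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    sc.deployMode match {
      case "cluster" =>
        new YarnClusterSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
      case "client" =>
        new YarnClientSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc)
      case _ =>
        throw new SparkException(s"Unknown deploy mode '${sc.deployMode}' for YARN")
    }
  }

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit = {
    // Wire the backend into the scheduler, as SparkContext.createTaskScheduler did before.
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
  }
}
```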
[GitHub] spark pull request: [SPARK-13904] Add exit code parameter to exitE...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/12457#issuecomment-211860555

Looks good to me.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-210971481

@rxin I will open another JIRA and a PR to do this. Thanks for the review.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-210861458

@rxin how do I get this retested by Jenkins? There were a few issues with Jenkins when I checked in my last changes, and now it is not retesting it.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-210783316

test this please
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-208235009

@rxin can you please review this PR?
[GitHub] spark pull request: [SPARK-14042][CORE] Add custom coalescer suppo...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11865#issuecomment-205145273

@nezihyigitbasi, do you plan to add something similar for the DF/DS API?
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-202692781

@tejasapatil if you are done with the review, can we ask @rxin to have a look at this?
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-200961545

@rxin @tejasapatil The previous build failure is not related to my check-in; it looks like some other issue, as other build requests failed with the same exception. In case it fails again, how do I restart it?
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57365175

--- Diff: core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.storage.BlockManagerId
+
+class ExternalClusterManagerSuite extends SparkFunSuite with LocalSparkContext
+{
+  test("launch of backend and scheduler") {
+    val conf = new SparkConf().setMaster("myclusterManager").
+      setAppName("testcm").set("spark.driver.allowMultipleContexts", "true")
+    sc = new SparkContext(conf)
+    // check if the scheduler components are created
+    assert(sc.schedulerBackend.isInstanceOf[FakeSchedulerBackend])
+    assert(sc.taskScheduler.isInstanceOf[DummyTaskScheduler])
+  }
+}
+
+class DummyExternalClusterManager extends ExternalClusterManager {
+
+  def canCreate(masterURL: String): Boolean = masterURL == "myclusterManager"
+
+  def createTaskScheduler(sc: SparkContext,
+      masterURL: String): TaskScheduler = new DummyTaskScheduler
+
+  def createSchedulerBackend(sc: SparkContext,
+      masterURL: String,
+      scheduler: TaskScheduler): SchedulerBackend =
+    new FakeSchedulerBackend()
+
+  def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit = {}
+
+}
+
+class DummyTaskScheduler extends TaskScheduler {
--- End diff --

Done
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57365126

--- Diff: core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.storage.BlockManagerId
+
+class ExternalClusterManagerSuite extends SparkFunSuite with LocalSparkContext
+{
+  test("launch of backend and scheduler") {
+    val conf = new SparkConf().setMaster("myclusterManager").
+      setAppName("testcm").set("spark.driver.allowMultipleContexts", "true")
+    sc = new SparkContext(conf)
+    // check if the scheduler components are created
+    assert(sc.schedulerBackend.isInstanceOf[FakeSchedulerBackend])
--- End diff --

I too missed it completely. I think it wasn't a great idea in the first place, from a maintenance perspective, to reuse FakeSchedulerBackend from some other class. I am going ahead with your option 2.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57365152

--- Diff: core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.storage.BlockManagerId
+
+class ExternalClusterManagerSuite extends SparkFunSuite with LocalSparkContext
+{
+  test("launch of backend and scheduler") {
+    val conf = new SparkConf().setMaster("myclusterManager").
+      setAppName("testcm").set("spark.driver.allowMultipleContexts", "true")
+    sc = new SparkContext(conf)
+    // check if the scheduler components are created
+    assert(sc.schedulerBackend.isInstanceOf[FakeSchedulerBackend])
+    assert(sc.taskScheduler.isInstanceOf[DummyTaskScheduler])
+  }
+}
+
+class DummyExternalClusterManager extends ExternalClusterManager {
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57364628

--- Diff: dev/.rat-excludes ---
@@ -98,3 +98,4 @@ LZ4BlockInputStream.java
 spark-deps-.*
 .*csv
 .*tsv
+org.apache.spark.scheduler.ExternalClusterManager
--- End diff --

I believe there is some confusion. There is a file named org.apache.spark.scheduler.ExternalClusterManager in a test folder that is used by the ServiceLoader to launch the dummy external cluster manager. This file does not (and cannot) have a header and hence it needs to be excluded.
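For readers unfamiliar with the mechanism: that resource file is a standard java.util.ServiceLoader provider-configuration file; its name is the fully qualified interface name and its single line names the implementation class to load. Below is a rough sketch of the discovery side only, not the exact SparkContext change (the package declaration is needed because the trait is private[spark], and the object name is made up for illustration).

```scala
package org.apache.spark.scheduler

import java.util.ServiceLoader
import scala.collection.JavaConverters._

object ClusterManagerDiscovery {
  // core/src/test/resources/META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
  // contains one line naming the implementation class to instantiate, e.g. the
  // dummy manager registered for the unit test in this PR.
  def find(masterURL: String): Option[ExternalClusterManager] = {
    val loader = Thread.currentThread().getContextClassLoader
    ServiceLoader.load(classOf[ExternalClusterManager], loader)
      .asScala
      .find(_.canCreate(masterURL))   // pick the manager that claims this master URL
  }
}
```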
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309581

--- Diff: dev/.rat-excludes ---
@@ -98,3 +98,4 @@ LZ4BlockInputStream.java
 spark-deps-.*
 .*csv
 .*tsv
+org.apache.spark.scheduler.ExternalClusterManager
--- End diff --

Rat throws an error for files that do not have the Apache license header. Hence excluding this file.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309531

--- Diff: core/src/test/resources/META-INF/services/org.apache.spark.scheduler.ExternalClusterManager ---
@@ -0,0 +1 @@
+org.apache.spark.scheduler.CheckExternalClusterManager
--- End diff --

To instantiate the dummy cluster manager in the test.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309422

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ExternalClusterManager.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * A cluster manager interface to plugin external scheduler.
+ *
+ */
+@DeveloperApi
+private[spark] trait ExternalClusterManager {
+
+  /**
+   * Check if this cluster manager instance can create scheduler components
+   * for a certain master URL.
+   * @param masterURL the master URL
+   * @return True if the cluster manager can create scheduler backend/
+   */
+  def canCreate(masterURL : String): Boolean
+
+  /**
+   * Create a task scheduler instance for the given SparkContext
+   * @param sc SparkContext
+   * @return TaskScheduler that will be responsible for task handling
+   */
+  def createTaskScheduler (sc: SparkContext): TaskScheduler
+
+  /**
+   * Create a scheduler backend for the given SparkContext and scheduler. This is
+   * called after task scheduler is created using [[ExternalClusterManager.createTaskScheduler()]].
+   * @param sc SparkContext
+   * @param scheduler TaskScheduler that will be used with the scheduler backend.
+   * @return SchedulerBackend that works with a TaskScheduler
+   */
+  def createSchedulerBackend (sc: SparkContext, scheduler: TaskScheduler): SchedulerBackend
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309392

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ExternalClusterManager.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * A cluster manager interface to plugin external scheduler.
+ *
+ */
+@DeveloperApi
+private[spark] trait ExternalClusterManager {
+
+  /**
+   * Check if this cluster manager instance can create scheduler components
+   * for a certain master URL.
+   * @param masterURL the master URL
+   * @return True if the cluster manager can create scheduler backend/
+   */
+  def canCreate(masterURL : String): Boolean
+
+  /**
+   * Create a task scheduler instance for the given SparkContext
+   * @param sc SparkContext
+   * @return TaskScheduler that will be responsible for task handling
+   */
+  def createTaskScheduler (sc: SparkContext): TaskScheduler
--- End diff --

I have added the master url. In hindsight, I think it makes sense.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309305

--- Diff: core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.executor.TaskMetrics
+import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.storage.BlockManagerId
+
+class ExternalClusterManagerSuite extends SparkFunSuite with LocalSparkContext
+{
+  test("launch of backend and scheduler") {
+    val conf = new SparkConf().setMaster("myclusterManager").
+      setAppName("testcm").set("spark.driver.allowMultipleContexts", "true")
+    sc = new SparkContext(conf)
+    // check if the scheduler components are created
+    assert(sc.schedulerBackend.isInstanceOf[FakeSchedulerBackend])
+    assert(sc.taskScheduler.isInstanceOf[FakeScheduler])
+  }
+}
+
+class CheckExternalClusterManager extends ExternalClusterManager {
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309258

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -149,7 +149,14 @@ private[spark] class Executor(
       tr.kill(interruptThread)
     }
   }
-
+  def killAllTasks (interruptThread: Boolean) : Unit = {
+    // kill all the running tasks
+    for (taskRunner <- runningTasks.values().asScala) {
+      if (taskRunner != null) {
+        taskRunner.kill(interruptThread)
+      }
+    }
+  }
--- End diff --

Done
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309289

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ExternalClusterManager.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * A cluster manager interface to plugin external scheduler.
+ *
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57309250

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -149,7 +149,14 @@ private[spark] class Executor(
       tr.kill(interruptThread)
     }
   }
-
+  def killAllTasks (interruptThread: Boolean) : Unit = {
--- End diff --

1. Done
2. We are targeting a use case wherein executors are launched inside another running process, and when an executor goes down it should not take the parent process down. In such cases, executors should kill their running tasks when they go down. Given that runningTasks is a private val, we need a method that can be called from the executor backend.
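The pattern this enables, sketched below with stand-in classes (these are not Spark's actual classes; the real change makes the exit path an overridable method on CoarseGrainedExecutorBackend and exposes killAllTasks on Executor): the default backend exits the JVM on failure, while an embedded backend only kills the executor's tasks and keeps the host process alive.

```scala
// Stand-in for the task-killing capability this PR adds to the executor.
trait TaskKilling {
  def killAllTasks(interruptThread: Boolean): Unit
}

// Default behaviour, mirroring the existing code path: exit the JVM on failure.
class DefaultBackend(protected val executor: TaskKilling) {
  protected def exitExecutor(code: Int): Unit = System.exit(code)

  def onDriverDisconnected(): Unit = exitExecutor(1)
}

// An executor embedded in a long-lived host process (e.g. a datastore JVM):
// on failure, kill the running tasks but keep the parent process alive.
class EmbeddedBackend(exec: TaskKilling) extends DefaultBackend(exec) {
  override protected def exitExecutor(code: Int): Unit = {
    exec.killAllTasks(interruptThread = true)
    // no System.exit here: the host process continues running
  }
}
```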
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on a diff in the pull request: https://github.com/apache/spark/pull/11723#discussion_r57308967

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2443,8 +2443,34 @@ object SparkContext extends Logging {
           "in the form mesos://zk://host:port. Current Master URL will stop working in Spark 2.0.")
         createTaskScheduler(sc, "mesos://" + zkUrl, deployMode)
-      case _ =>
-        throw new SparkException("Could not parse Master URL: '" + master + "'")
+      case masterUrl =>
+        val cm = getClusterManager(masterUrl) match {
+          case Some(clusterMgr) => clusterMgr
+          case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
+        }
+        try {
+          val scheduler = cm.createTaskScheduler(sc)
+          val backend = cm.createSchedulerBackend(sc, scheduler)
+          cm.initialize(scheduler, backend)
+          (backend, scheduler)
+        } catch {
+          case e: Exception => {
+            throw new SparkException("External scheduler cannot be instantiated", e)
+          }
+        }
+    }
+  }
+
+  private def getClusterManager(url: String): Option[ExternalClusterManager] = {
+    val loader = Utils.getContextOrSparkClassLoader
+    val serviceLoader = ServiceLoader.load(classOf[ExternalClusterManager], loader)
+
+    serviceLoader.asScala.filter(_.canCreate(url)).toList match {
+      // exactly one registered manager
+      case head :: Nil => Some(head)
+      case Nil => None
+      case multipleMgrs => sys.error(s"Multiple Cluster Managers registered " +
--- End diff --

Done.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-200329683

@rxin Any update? Any changes needed from my side?
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-198313907

@rxin I have completed the changes. Please review.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-197804928

@rxin ok, I get it. I will make ExternalClusterManager private[spark] and mark it as a DeveloperApi. I hope that suffices.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user hbhanawat commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-197221049

@rxin Thanks for commenting. Spark was designed to be agnostic to the underlying cluster manager (as long as it can acquire executor processes, and these can communicate with each other). Since Spark is now being used in newer and different use cases, there is a need to allow other cluster managers to manage Spark components. One such use case is embedding Spark components like the executor and driver inside another process, which may be a datastore; this allows co-location of data and processing. Another use case is using Spark like an application server (you might have heard about spark-jobserver). Spark's current design can handle such use cases if the cluster manager supports it. Hence, IMO, it is meaningful to allow plugging in new cluster managers. From a code perspective, I think that even the creation of the TaskScheduler and SchedulerBackend for YARN/Mesos/local mode should be done through a similar interface.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
GitHub user hbhanawat opened a pull request: https://github.com/apache/spark/pull/11723

[SPARK-13904][Scheduler] Add support for pluggable cluster manager

## What changes were proposed in this pull request?

This commit adds support for a pluggable cluster manager, and also allows a cluster manager to clean up tasks without taking the parent process down.

To plug in a new external cluster manager, the ExternalClusterManager trait should be implemented. It returns the task scheduler and scheduler backend that will be used by SparkContext to schedule tasks. An external cluster manager is registered using the java.util.ServiceLoader mechanism (this mechanism is also used to register data sources like parquet, json, jdbc etc.). This allows auto-loading of implementations of the ExternalClusterManager interface.

Currently, when a driver fails, executors exit using System.exit. This does not bode well for cluster managers that would like to reuse the parent process of an executor. Hence:
1. System.exit is moved to a function that can be overridden in subclasses of CoarseGrainedExecutorBackend.
2. Added functionality for killing all the running tasks in an executor.

## How was this patch tested?

ExternalClusterManagerSuite.scala was added to test this patch.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hbhanawat/spark pluggableScheduler

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11723.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11723

commit 800834f24ad1f0c4a68d8d49f600db6570d100ef
Author: Hemant Bhanawat
Date:   2016-03-15T09:00:30Z

    This commit adds support for pluggable cluster manager. And also allows a cluster manager to clean up tasks
    without taking the parent process down.

    To plug a new external cluster manager, ExternalClusterManager trait should be implemented. It returns task
    scheduler and backend scheduler that will be used by SparkContext to schedule tasks. An external cluster manager
    is registered using the java.util.ServiceLoader mechanism (This mechanism is also being used to register data
    sources like parquet, json, jdbc etc.). This allows auto-loading implementations of ExternalClusterManager
    interface.

    Currently, when a driver fails, executors exit using system.exit. This does not bode well for cluster managers
    that would like to reuse the parent process of an executor. Hence,
    1. Moving system.exit to a function that can be overriden in subclasses of CoarseGrainedExecutorBackend.
    2. Added functionality of killing all the running tasks in an executor.
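To show the moving parts end to end, here is a sketch of how a registered manager gets picked up. The master URL and app name mirror this PR's unit test; actually running it requires the dummy manager and its META-INF/services registration file on the classpath, and the object name here is illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Registration: the manager's jar ships a provider-configuration file
//   META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
// whose single line names the implementing class.
//
// Usage: a master URL that none of the built-in patterns recognise makes
// SparkContext ask each registered ExternalClusterManager via canCreate().
object PluggableClusterManagerUsage {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("myclusterManager") // matched by the dummy manager's canCreate()
      .setAppName("testcm")
    val sc = new SparkContext(conf)
    // The matching manager's createTaskScheduler/createSchedulerBackend/initialize
    // are used to build the scheduling components for this context.
    sc.stop()
  }
}
```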