[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951

Actually, @kayousterhout - I'm not entirely sure what semantics you expect for maxShares in general. Maybe a worked example would help: suppose I have a pool X with 5 running tasks from TaskSet A and a maxShares of 7, and pool X is a child of pool Y, which has a maxShares of 8. I want to schedule another task from TaskSet A: should the scheduler allow it or not? Do you need to know how many executors are currently running (and hence the maximum number of tasks that could be run)?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
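One way to pin down the semantics being asked about is a toy model. This is an illustrative sketch only, not Spark's actual `Pool` class (the names `Pool`, `can_schedule` and `launch` are made up here): a running task counts against its own pool and every ancestor pool, and a new task is allowed only while all of them have headroom.

```python
# Toy model of hierarchical maxShares semantics (illustrative only,
# not Spark's Pool implementation): a running task counts against its
# own pool and every ancestor pool.

class Pool:
    def __init__(self, name, max_shares, parent=None):
        self.name = name
        self.max_shares = max_shares
        self.parent = parent
        self.running = 0  # tasks running in this pool or any descendant

def can_schedule(pool):
    # A new task is allowed only if this pool and all of its ancestors
    # still have spare capacity.
    node = pool
    while node is not None:
        if node.running >= node.max_shares:
            return False
        node = node.parent
    return True

def launch(pool):
    # Book-keep a launched task against the whole ancestor chain.
    node = pool
    while node is not None:
        node.running += 1
        node = node.parent

# The worked example from the comment: Y has maxShares 8, its child X
# has maxShares 7, and X already has 5 running tasks.
y = Pool("Y", 8)
x = Pool("X", 7, parent=y)
for _ in range(5):
    launch(x)

print(can_schedule(x))  # True: 5 < 7 and 5 < 8, so a sixth task fits
```

Under this reading the answer to the worked example would be "yes"; it would become "no" once X reaches 7 running tasks, or once other children of Y push Y's total to 8.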
[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951

@kayousterhout minShares is a configuration parameter for the fair scheduler algorithm only - what would the semantics of a maxShares setting be for the FIFO algorithm?
[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951

ping?
[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951

Hi @kayousterhout - I've renamed all references to `maxRunningTasks` and updated the Markdown documentation in the repo. Is this OK? Thanks!
[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951

@squito is this OK?
[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools
Github user njwhite commented on the issue: https://github.com/apache/spark/pull/12951

@squito thanks - I've expanded the `Scheduler respects maxRunningTasks setting of its pool` test to cover the cases you mention (and a couple of others).
[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...
Github user njwhite commented on the pull request: https://github.com/apache/spark/pull/12951#issuecomment-97641

Thanks @markhamstra! The Jenkins build failed because a single test, `ExternalAppendOnlyMapSuite#spilling with compression`, failed. It seems unrelated (and passes locally for me) - are there known issues with it?
[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...
Github user njwhite commented on the pull request: https://github.com/apache/spark/pull/12951#issuecomment-222175368

Thanks @squito; I've renamed the setting to `maxRunningTasks` and added the tests you asked for to `TaskSetManagerSuite`. I've also added support (and tests) for configuring the parent pool in the XML config file, as that came in useful.
[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...
Github user njwhite commented on the pull request: https://github.com/apache/spark/pull/12951#issuecomment-220555299

Thanks for the review @squito - I've commented on the JIRA about why this feature would be useful. As for the implementation: maybe "maxShare" is the wrong word, as the change doesn't relate to the fair scheduler at all. Instead it limits the maximum number of tasks a `Schedulable` can have running at any one time. It really is just a one-line change: `resourceOffer` now won't accept any more resources (i.e. won't run any of its tasks) if the calculated current value of `maxShares` means there isn't any free space in the pool. The `maxShares` method just returns the maximum number of tasks allowed in the pool, minus the number of tasks currently running in the pool. You can see the propagation of the `maxShares` limit in the assertions I added to the `PoolSuite` test.
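The described gate can be sketched in a few lines. This is an illustrative Python model with made-up names (`free_capacity`, `resource_offer`), not the actual Scala change:

```python
# Minimal sketch of the described check (illustrative only, not the
# actual Scala change): resource_offer declines work when the pool's
# free capacity reaches zero.

class Pool:
    def __init__(self, max_running_tasks):
        self.max_running_tasks = max_running_tasks
        self.running_tasks = 0

    def free_capacity(self):
        # The "maxShares" value as described in the comment: the
        # configured maximum minus the number of tasks currently running.
        return self.max_running_tasks - self.running_tasks

def resource_offer(pool, task):
    # Decline the offer outright if the pool is already full.
    if pool.free_capacity() <= 0:
        return None
    pool.running_tasks += 1
    return task

pool = Pool(max_running_tasks=2)
print(resource_offer(pool, "t1"))  # accepted
print(resource_offer(pool, "t2"))  # accepted
print(resource_offer(pool, "t3"))  # None: the pool is full
```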
[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...
Github user njwhite commented on the pull request: https://github.com/apache/spark/pull/12951#issuecomment-217498019

@HyukjinKwon I've run `run-tests` and fixed all the style issues. Could you take another look? Thanks!
[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...
GitHub user njwhite opened a pull request: https://github.com/apache/spark/pull/12951

[SPARK-15176][Core] Add maxShares setting to Pools

## What changes were proposed in this pull request?

Help guarantee resource availability by (hierarchically) limiting the number of tasks a given pool can run. The maximum number of tasks for a given pool can be configured in the allocation XML file, and child pools are limited to at most the number of tasks of their parent.

## How was this patch tested?

Existing unit tests run, and new unit tests added for the new functionality.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/njwhite/spark feature/pool

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12951.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12951

commit 1b046be04d18e19be33078e2b9f5e26ac8e6aa67
Author: Nick White <nwh...@palantir.com>
Date: 2016-05-06T09:42:18Z

[SPARK-15176][Core] Add maxShares setting to Pools

Help guarantee resource availability by (hierarchically) limiting the number of tasks a given pool can run.
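For context, pool properties are declared in Spark's fair-scheduler allocation file. A pool using the proposed cap might look like the sketch below; the `maxRunningTasks` element is the hypothetical setting this PR proposes, while `schedulingMode`, `weight` and `minShare` are the standard elements:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
    <!-- Proposed by this PR: cap the number of tasks this pool (and,
         transitively, its child pools) may run concurrently -->
    <maxRunningTasks>16</maxRunningTasks>
  </pool>
</allocations>
```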
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
Github user njwhite commented on the pull request: https://github.com/apache/spark/pull/12620#issuecomment-214210492

@davies I'm using this to plug in the "dill" serializer, as it can pickle more things (and allows more fine-grained control) than the cloud-pickle serializer. What about making that the default for functions?
[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...
GitHub user njwhite opened a pull request: https://github.com/apache/spark/pull/12620

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

## What changes were proposed in this pull request?

Store the serializer that we should use to serialize RDD transformation functions on the SparkContext, defaulting to a CloudPickleSerializer if not given. Allow a user to change this serializer when first constructing the SparkContext.

## How was this patch tested?

Unit tests and manual integration tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/njwhite/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12620.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12620

commit 0a3a7c168c6b262671a14c02c16aec3207ce9ee0
Author: Nick White <nwh...@palantir.com>
Date: 2016-04-22T20:53:20Z

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

Store the serializer that we should use to serialize RDD transformation functions on the SparkContext, defaulting to a CloudPickleSerializer if not given. Allow a user to change this serializer when first constructing the SparkContext.
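A quick illustration of why the function serializer matters (plain Python, independent of Spark): the stdlib pickle module cannot serialize lambdas at all, which is why PySpark defaults to a cloudpickle-based serializer and why swapping in an alternative such as dill can be useful.

```python
# Show that stdlib pickle rejects lambdas: an ordinary function is
# pickled by reference to its importable name, and a lambda has none.
import pickle

def picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

square = lambda x: x * x
print(picklable(len))     # True: built-ins pickle by qualified name
print(picklable(square))  # False: '<lambda>' cannot be looked up
```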