[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-10-13 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
Actually, @kayousterhout - I'm not entirely sure what semantics you expect for 
maxShares in general. Maybe a worked example would help: say I have a pool X 
with 5 running tasks from Taskset A and a maxShares of 7, and pool X is a child 
of pool Y, which has a maxShares of 8. I want to schedule another task from 
Taskset A - should the scheduler allow it or not? Do you need to know how many 
executors are currently running (and hence the maximum number of tasks that 
could be run)?
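One possible reading of these semantics, as a toy Python sketch. The class and
method names are hypothetical, and charging a child's running tasks against its
ancestors is an assumption; this is not Spark's actual Scala scheduler code. It
also deliberately ignores executor capacity, which is the open question above.

```python
class Pool:
    """Toy model of a pool with a hierarchical cap on running tasks."""

    def __init__(self, name, max_shares, parent=None):
        self.name = name
        self.max_shares = max_shares  # cap on concurrently running tasks
        self.running = 0              # running tasks charged to this pool
        self.parent = parent

    def free_slots(self):
        # A pool's free space is bounded by every ancestor's free space.
        own = self.max_shares - self.running
        return own if self.parent is None else min(own, self.parent.free_slots())


# The worked example above: X (maxShares 7, 5 running) under Y (maxShares 8).
# Assumption: Y's running count includes the tasks running in its child X.
y = Pool("Y", max_shares=8)
x = Pool("X", max_shares=7, parent=y)
x.running = 5
y.running = 5

print(x.free_slots())  # min(7 - 5, 8 - 5) = 2, so this reading would allow it
```

Under this reading the scheduler would admit the extra task from Taskset A,
without ever consulting the number of live executors.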


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-10-13 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
@kayousterhout minShares is a configuration parameter for the fair 
scheduler algorithm only - what would the semantics of a maxShares setting for 
the FIFO algorithm be?





[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-07-15 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
ping?





[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-07-04 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
Hi @kayousterhout - I've renamed all references to use `maxRunningTasks` and 
updated the Markdown documentation in the repo. Is this OK? Thanks -





[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-06-13 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
@squito is this OK?





[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-06-06 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
@squito thanks - I've expanded the `Scheduler respects maxRunningTasks 
setting of its pool` test to cover the cases you mention (and a couple of 
others).





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-28 Thread njwhite
Github user njwhite commented on the pull request:

https://github.com/apache/spark/pull/12951#issuecomment-97641
  
Thanks @markhamstra! The Jenkins build failed because a single test, 
`ExternalAppendOnlyMapSuite#spilling with compression`, failed. It seems 
unrelated (and passes locally for me) - are there known issues with it?





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-27 Thread njwhite
Github user njwhite commented on the pull request:

https://github.com/apache/spark/pull/12951#issuecomment-222175368
  
Thanks @squito; I've renamed the setting to `maxRunningTasks` and added the 
tests you asked for to `TaskSetManagerSuite`. I've also added support (& tests) 
for configuring the parent pool in the XML config file, as that came in useful.





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-20 Thread njwhite
Github user njwhite commented on the pull request:

https://github.com/apache/spark/pull/12951#issuecomment-220555299
  
Thanks for the review @squito - I've commented on the JIRA about why this 
feature would be useful. As for the implementation - maybe "maxShare" is the 
wrong word, as the change doesn't relate to the fair scheduler at all. Instead 
it limits the maximum number of tasks a `Schedulable` can have running at any 
one time. It really is just a one-line change: `resourceOffer` now won't 
accept any more resources (i.e. won't run any of its tasks) if the current 
value of `maxShares` shows there isn't any free space in the pool. The 
`maxShares` method just returns the maximum number of tasks allowed in the 
pool minus the number of tasks currently running in it. You can see the 
propagation of the `maxShares` limit in the assertions I added to the 
`PoolSuite` test.
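A rough Python paraphrase of the check described above. The actual change is in
Spark's Scala core; the dataclass and function names here are illustrative
stand-ins only.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    max_running_tasks: int  # the configured cap discussed in this thread
    running_tasks: int      # tasks currently running in the pool


def max_shares(pool):
    # As described above: remaining capacity = cap minus tasks already running.
    return pool.max_running_tasks - pool.running_tasks


def resource_offer(pool, offer):
    # Decline the offer outright when the pool has no free space left;
    # otherwise accept it (the tuple stands in for launching a task).
    if max_shares(pool) <= 0:
        return None
    return ("launch", offer)


full = Pool(max_running_tasks=2, running_tasks=2)
free = Pool(max_running_tasks=2, running_tasks=1)
print(resource_offer(full, "exec-0"))  # None: pool is at its limit
print(resource_offer(free, "exec-0"))  # ('launch', 'exec-0')
```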





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread njwhite
Github user njwhite commented on the pull request:

https://github.com/apache/spark/pull/12951#issuecomment-217498019
  
@HyukjinKwon I've run `run-tests` and fixed all the style issues. Could you 
take another look? Thanks -





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread njwhite
GitHub user njwhite opened a pull request:

https://github.com/apache/spark/pull/12951

[SPARK-15176][Core] Add maxShares setting to Pools

## What changes were proposed in this pull request?

Help guarantee resource availability by (hierarchically) limiting the number 
of tasks a given pool can run. The maximum number of tasks for a given pool can 
be configured in the allocation XML file, and child pools are limited to at 
most the number of tasks of their parent.
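A hypothetical allocation-file fragment showing the shape this could take. The
`maxShares` and `parent` element names are assumptions pieced together from this
thread (the setting is later renamed `maxRunningTasks`), not a released Spark
schema.

```xml
<?xml version="1.0"?>
<!-- Illustrative only: element names are assumptions, not Spark's schema. -->
<allocations>
  <pool name="parentPool">
    <maxShares>8</maxShares>
  </pool>
  <pool name="childPool">
    <parent>parentPool</parent>
    <!-- Children may not exceed their parent's cap, so the effective
         limit here is min(7, 8) = 7 concurrently running tasks. -->
    <maxShares>7</maxShares>
  </pool>
</allocations>
```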

## How was this patch tested?

Existing unit tests were run, and new unit tests were added for this 
functionality.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/njwhite/spark feature/pool

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12951


commit 1b046be04d18e19be33078e2b9f5e26ac8e6aa67
Author: Nick White <nwh...@palantir.com>
Date:   2016-05-06T09:42:18Z

[SPARK-15176][Core] Add maxShares setting to Pools

Help guarantee resource availability by (hierarchically) limiting the number of
tasks a given pool can run.







[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-25 Thread njwhite
Github user njwhite commented on the pull request:

https://github.com/apache/spark/pull/12620#issuecomment-214210492
  
@davies I'm using this to plug in the "dill" serializer, as it can pickle more 
things (and allows more fine-grained control) than the cloud-pickle serializer. 
What about making that the default for functions?
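The limitation being worked around here shows up with the standard `pickle`
module alone (dill itself is left out of this sketch so it stays stdlib-only):

```python
import pickle

# Stock pickle serializes functions by reference (module + qualified name),
# so an anonymous lambda has no importable name and cannot be pickled.
# Serializers such as cloudpickle and dill instead capture the function's
# code and closure, which is why the serializer choice matters here.
double = lambda x: 2 * x

try:
    pickle.dumps(double)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

print(picklable)  # False: plain pickle refuses the lambda
```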





[GitHub] spark pull request: [SPARK-14859][PYSPARK] Make Lambda Serializer ...

2016-04-22 Thread njwhite
GitHub user njwhite opened a pull request:

https://github.com/apache/spark/pull/12620

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

## What changes were proposed in this pull request?

Store the serializer that we should use to serialize RDD transformation
functions on the SparkContext, defaulting to a CloudPickleSerializer if not
given. Allow a user to change this serializer when first constructing the
SparkContext.
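The shape of the change can be sketched in plain Python. `Context` and its
constructor parameter are hypothetical stand-ins, not PySpark's actual API, and
stdlib `pickle` stands in for the `CloudPickleSerializer` default.

```python
import pickle


class Context:
    """Toy analogue of the proposal: the serializer used for user functions
    is injectable, with a default used when none is supplied."""

    def __init__(self, func_serializer=None):
        # Anything exposing dumps()/loads() works as a serializer here.
        self.func_serializer = func_serializer or pickle

    def dumps(self, obj):
        return self.func_serializer.dumps(obj)

    def loads(self, payload):
        return self.func_serializer.loads(payload)


class UpperCaseSerializer:
    """A deliberately trivial drop-in serializer, just to show the seam."""

    @staticmethod
    def dumps(obj):
        return str(obj).upper().encode()

    @staticmethod
    def loads(payload):
        return payload.decode()


default_ctx = Context()                      # falls back to the default
custom_ctx = Context(UpperCaseSerializer())  # user-supplied at construction

print(default_ctx.loads(default_ctx.dumps([1, 2])))  # [1, 2]
print(custom_ctx.dumps("spark"))                     # b'SPARK'
```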

## How was this patch tested?

Unit tests and manual integration tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/njwhite/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12620


commit 0a3a7c168c6b262671a14c02c16aec3207ce9ee0
Author: Nick White <nwh...@palantir.com>
Date:   2016-04-22T20:53:20Z

[SPARK-14859][PYSPARK] Make Lambda Serializer Configurable

Store the serializer that we should use to serialize RDD transformation
functions on the SparkContext, defaulting to a CloudPickleSerializer if not
given. Allow a user to change this serializer when first constructing the
SparkContext.



