[jira] [Commented] (SPARK-15176) Job Scheduling Within Application Suffers from Priority Inversion

2016-05-20 Thread Nick White (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293009#comment-15293009
 ] 

Nick White commented on SPARK-15176:


[~kayousterhout] [~irashid] We use Spark to serve interactive queries submitted 
by end-users. The data the queries run on is refreshed periodically, and 
there's a high IO cost to reading it (as it lives in S3).

We're using the linked PR to support two pools: one serves user queries (and so 
always needs hardware resources available for responsiveness), and the other 
loads new data into memory as cached RDDs and performs some basic indexing. 
When the new data is fully cached, it's swapped in as the set of RDDs the 
"query" pool runs against, so users see no degradation in performance: their 
queries never hit uncached data.

Under the existing scheduler implementation, we've seen tasks from the caching 
& indexing pool use up all the hardware resources; when a user query arrives, 
its tasks have to wait for indexing tasks to finish before they can start 
executing (at which point the fair scheduler ensures both the query and the 
indexing job make progress).
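To make the starvation concrete, here is a minimal, self-contained sketch in plain Python (not Spark code; the function name and the tick-based core model are illustrative assumptions): a scheduler that hands each freed core to the waiting pool, but never preempts a running task. However early pool B's job arrives, its first task cannot start before the first pool-A task completes.

```python
import heapq

def first_b_start(n_cores, n_a_tasks, a_len, b_arrival):
    """Tick at which pool B's first task starts under a no-preemption
    scheduler. Pool A submits n_a_tasks tasks (a_len ticks each) at tick 0;
    pool B submits a single task at tick b_arrival. A freed core goes to
    pool B as soon as B is waiting, but running tasks are never interrupted."""
    free_at = [0] * n_cores          # tick at which each core next frees up
    heapq.heapify(free_at)
    for _ in range(n_a_tasks):
        t = heapq.heappop(free_at)   # earliest core to become free
        if t >= b_arrival:
            return t                 # B is waiting; fair scheduling gives it the core
        heapq.heappush(free_at, t + a_len)  # B not here yet: core runs another A task
    # Pool A ran out of tasks before B arrived; B starts as soon as it can.
    return max(heapq.heappop(free_at), b_arrival)

# 8 cores, 1000 A tasks of 60 ticks each; B arrives at tick 1 but cannot
# start until the first A task finishes at tick 60.
print(first_b_start(8, 1000, 60, 1))  # -> 60
```

With M >> N long tasks in pool A, pool B's wait is bounded only by the A task durations, which matches the unbounded-denial-of-service behaviour described in the issue.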

> Job Scheduling Within Application Suffers from Priority Inversion
> -
>
> Key: SPARK-15176
> URL: https://issues.apache.org/jira/browse/SPARK-15176
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.6.1
>Reporter: Nick White
>
> Say I have two pools, and N cores in my cluster:
> * I submit a job to one, which has M >> N tasks
> * N of the M tasks are scheduled
> * I submit a job to the second pool - but none of its tasks get scheduled 
> until a task from the other pool finishes!
> This can lead to unbounded denial-of-service for the second pool - regardless 
> of `minShare` or `weight` settings. Ideally Spark would support a pre-emption 
> mechanism, or an upper bound on a pool's resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15176) Job Scheduling Within Application Suffers from Priority Inversion

2016-05-06 Thread Nick White (JIRA)
Nick White created SPARK-15176:
--

 Summary: Job Scheduling Within Application Suffers from Priority 
Inversion
 Key: SPARK-15176
 URL: https://issues.apache.org/jira/browse/SPARK-15176
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 1.6.1
Reporter: Nick White


Say I have two pools, and N cores in my cluster:
* I submit a job to one, which has M >> N tasks
* N of the M tasks are scheduled
* I submit a job to the second pool - but none of its tasks get scheduled until 
a task from the other pool finishes!

This can lead to unbounded denial-of-service for the second pool - regardless 
of `minShare` or `weight` settings. Ideally Spark would support a pre-emption 
mechanism, or an upper bound on a pool's resource usage.
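For reference, pools like these are declared in the fair scheduler's allocation file (pointed at by spark.scheduler.allocation.file); the pool names and values below are illustrative, not from the issue:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="query">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>4</minShare>
  </pool>
  <pool name="indexing">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

A job is routed to a pool by setting sc.setLocalProperty("spark.scheduler.pool", "query") on the submitting thread. As the issue points out, minShare and weight only influence which pool receives a core once it frees up; neither caps a pool's usage nor preempts its running tasks.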






[jira] [Commented] (SPARK-14859) [PYSPARK] Make Lambda Serializer Configurable

2016-04-22 Thread Nick White (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254665#comment-15254665
 ] 

Nick White commented on SPARK-14859:


I've got a PR for this here: https://github.com/apache/spark/pull/12620

> [PYSPARK] Make Lambda Serializer Configurable
> -
>
> Key: SPARK-14859
> URL: https://issues.apache.org/jira/browse/SPARK-14859
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Nick White
>
> Currently, lambdas (e.g. those passed to RDD.map) are serialized via a 
> hardcoded reference to CloudPickleSerializer. The serializer should be 
> configurable, as these lambdas may contain complex objects that need custom 
> serialization.






[jira] [Created] (SPARK-14859) [PYSPARK] Make Lambda Serializer Configurable

2016-04-22 Thread Nick White (JIRA)
Nick White created SPARK-14859:
--

 Summary: [PYSPARK] Make Lambda Serializer Configurable
 Key: SPARK-14859
 URL: https://issues.apache.org/jira/browse/SPARK-14859
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 2.0.0
Reporter: Nick White


Currently, lambdas (e.g. those passed to RDD.map) are serialized via a 
hardcoded reference to CloudPickleSerializer. The serializer should be 
configurable, as these lambdas may contain complex objects that need custom 
serialization.
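As background on why a closure serializer is needed at all, here is a minimal stdlib-only sketch (plain Python, not PySpark code): the built-in pickle module serializes functions by reference (module plus qualified name), so a lambda cannot be pickled at all — which is why PySpark uses cloudpickle's by-value serialization, and why choosing a different serializer requires a configuration point.

```python
import pickle

# Stdlib pickle stores functions by reference (module + qualified name).
# A lambda has no importable name, so pickling it fails outright.
double = lambda x: 2 * x

try:
    pickle.dumps(double)
    serializable = True
except (pickle.PicklingError, AttributeError):
    serializable = False

print(serializable)  # False: by-value serialization (cloudpickle) is needed
```

cloudpickle sidesteps this by serializing the function's code object and captured environment by value, but any complex objects captured in the closure still get pickled with cloudpickle's defaults — hence the request to make the serializer pluggable.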


