(whoops, redundant sentence in that first paragraph)

On Fri, Sep 18, 2015 at 8:36 AM, Philip Weaver <philip.wea...@gmail.com> wrote:
> Here's a specific example of what I want to do. My Spark application is
> running with total-executor-cores=8. A request comes in, it spawns a thread
> to handle that request, and starts a job. That job should use only 4 cores,
> not all 8 of the cores available to the cluster. When the first job is
> scheduled, it should take only 4 cores, not all 8 of the cores that are
> available to the driver.
>
> Is there any way to accomplish this? This is on Mesos.
>
> In order to support the use cases described in
> https://spark.apache.org/docs/latest/job-scheduling.html, where a Spark
> application runs for a long time and handles requests from multiple users,
> I believe what I'm asking about is a very important feature. One of the
> goals is to get lower latency for each request, but if the first request
> takes all resources and we can't guarantee any free resources for the
> second request, then that defeats the purpose. Does that make sense?
>
> Thanks in advance for any advice you can provide!
>
> - Philip
>
> On Sat, Sep 12, 2015 at 10:40 PM, Philip Weaver <philip.wea...@gmail.com>
> wrote:
>
>> I'm playing around with dynamic allocation in spark-1.5.0, with the FAIR
>> scheduler, so I can define a long-running application capable of executing
>> multiple simultaneous Spark jobs.
>>
>> The kind of jobs that I'm running do not benefit from more than 4 cores,
>> but I want my application to be able to take several times that in order to
>> run multiple jobs at the same time.
>>
>> I suppose my question is more basic: How can I limit the number of cores
>> used to load an RDD or DataFrame? I can immediately repartition or coalesce
>> my RDD or DataFrame to 4 partitions after I load it, but that doesn't stop
>> Spark from using more cores to load it.
>>
>> Does what I'm trying to accomplish make sense, and is there any way
>> to do it?
>>
>> - Philip
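
For reference, here is a rough sketch of the FAIR-scheduler-pool setup described on that job-scheduling page, with one job submitted per request-handling thread. The pool name ("requests"), the allocation-file path, and the input paths below are all made up for illustration; they are not from the thread above. Note also that pools only control fair sharing between concurrent jobs (weight/minShare), they do not impose a hard per-job core cap, so this does not by itself solve the "4 of 8 cores" case:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: pool name, allocation-file path, and input paths are
    // illustrative, not taken from the original thread.
    object SharedAppSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("long-running-shared-app")
          .set("spark.scheduler.mode", "FAIR")
          // fairscheduler.xml defines the pools, e.g.:
          //   <allocations>
          //     <pool name="requests">
          //       <schedulingMode>FAIR</schedulingMode>
          //       <weight>1</weight>
          //       <minShare>2</minShare>
          //     </pool>
          //   </allocations>
          .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
        val sc = new SparkContext(conf)

        // One thread per incoming request; jobs submitted from a thread go
        // into the pool set via the thread-local property below.
        def handleRequest(id: Int): Thread = new Thread(new Runnable {
          def run(): Unit = {
            sc.setLocalProperty("spark.scheduler.pool", "requests")
            val total = sc.textFile(s"/data/request-$id")  // illustrative input
              .repartition(4)  // caps downstream parallelism, not the load itself
              .map(_.length.toLong)
              .reduce(_ + _)
            println(s"request $id => $total")
          }
        })

        val threads = (1 to 2).map(handleRequest)
        threads.foreach(_.start())
        threads.foreach(_.join())
        sc.stop()
      }
    }

As the second quoted message points out, repartition(4) only limits the parallelism of stages after the load; the number of tasks in the initial load is driven by the input splits, which is exactly the gap being asked about.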