The reason the existing dynamic allocation does not work out of the box for Spark Streaming is that the heuristic used to decide when to scale up/down is not the right one for micro-batch workloads. It works well for typical batch workloads. However, you can use the underlying developer API to add/remove executors and implement your own scaling logic.
1. Use SparkContext.requestExecutors and SparkContext.killExecutor.
2. Use a StreamingListener to get the scheduling delay and processing
times, and use those to decide when to request or kill executors. (A
minimal sketch of this approach appears after the quoted thread below.)

TD

On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen <duc.was.h...@gmail.com> wrote:

> Dean,
>
> Thanks for the reply. I'm searching (via the Spark mailing list archive
> and Google) and can't find the previous thread you mentioned. I've
> stumbled upon a few, but they may not be the thread you're referring to.
> I'm very interested in reading that discussion, and any links/keywords
> would be greatly appreciated.
>
> I can see it's a non-trivial problem to solve for every use case in
> streaming and thus not yet supported in general. However, I think (maybe
> naively) it can be solved for specific use cases. If I use the available
> features to create a fault-tolerant design - i.e. failures/dead nodes can
> occur on master nodes, the driver node, or executor nodes without data
> loss, and at-least-once semantics are acceptable - then can't I safely
> scale down in streaming by killing executors? If this is not true, then
> are we saying that streaming is not fault tolerant?
>
> I know it won't be as simple as setting a config like
> spark.dynamicAllocation.enabled=true and magically we'll have elastic
> streaming, but I'm interested in whether anyone else has attempted to
> solve this for their specific use case with extra coding involved?
> Pitfalls? Thoughts?
>
> thanks,
> Duc
>
> On Wed, Nov 11, 2015 at 8:36 AM, Dean Wampler <deanwamp...@gmail.com>
> wrote:
>
>> Dynamic allocation doesn't work yet with Spark Streaming in any cluster
>> scenario. There was a previous thread on this topic which discusses the
>> issues that need to be resolved.
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen <duc.was.h...@gmail.com>
>> wrote:
>>
>>> I'm trying to get Spark Streaming to scale its number of executors up
>>> and down within Mesos based on workload. It's not scaling down. I'm
>>> using Spark 1.5.1, reading from Kafka using the direct (receiver-less)
>>> approach.
>>>
>>> Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287,
>>> with the right configuration I have a simple example working with the
>>> spark-shell connected to a Mesos cluster. By working I mean the number
>>> of executors scales up/down based on workload. However, the spark-shell
>>> is not a streaming example.
>>>
>>> What is the status of dynamic resource allocation with Spark Streaming
>>> on Mesos? Is it supported at all? Or supported but with some caveats to
>>> ensure no data loss?
>>>
>>> thanks,
>>> Duc
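---

A minimal sketch of steps 1 and 2 above, assuming Spark 1.5-era APIs and
the receiver-less Kafka setup Duc described (no receivers to lose when an
executor is killed). The class names, thresholds, and minExecutors floor
are illustrative choices, not Spark APIs. requestExecutors and
killExecutor are @DeveloperApi methods and are no-ops (with a warning) on
scheduler backends that don't support them; this also assumes
spark.dynamicAllocation.enabled is left false, so this listener is the
only thing adding/removing executors.

import scala.collection.mutable

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded,
  SparkListenerExecutorRemoved}
import org.apache.spark.streaming.scheduler.{StreamingListener,
  StreamingListenerBatchCompleted}

// Tracks live executor IDs so killExecutor has a valid target.
class ExecutorTracker extends SparkListener {
  val executorIds = mutable.Set.empty[String]
  override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
    executorIds.synchronized { executorIds += e.executorId }
  override def onExecutorRemoved(e: SparkListenerExecutorRemoved): Unit =
    executorIds.synchronized { executorIds -= e.executorId }
}

// Requests an executor when batches queue up behind the batch interval,
// and releases one when there is ample headroom, down to a floor.
class ElasticAllocationListener(
    sc: SparkContext,
    tracker: ExecutorTracker,
    batchIntervalMs: Long,
    minExecutors: Int) extends StreamingListener {

  override def onBatchCompleted(
      batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    // schedulingDelay: how long the batch waited before it started;
    // processingDelay: how long the batch took to process.
    val schedulingDelay = info.schedulingDelay.getOrElse(0L)
    val processingTime = info.processingDelay.getOrElse(0L)

    if (schedulingDelay > batchIntervalMs) {
      // Batches are waiting longer than one interval: scale up by one.
      sc.requestExecutors(1)
    } else if (schedulingDelay == 0L && processingTime < batchIntervalMs / 2) {
      // No queueing and finishing in under half an interval: scale down.
      val ids = tracker.executorIds.synchronized { tracker.executorIds.toSeq }
      if (ids.size > minExecutors) {
        sc.killExecutor(ids.head)
      }
    }
  }
}

Wiring it up, for a StreamingContext ssc with a 10-second batch interval
and a floor of 2 executors (again, illustrative values):

val tracker = new ExecutorTracker
ssc.sparkContext.addSparkListener(tracker)
ssc.addStreamingListener(
  new ElasticAllocationListener(ssc.sparkContext, tracker, 10000L, 2))

Scaling by one executor per completed batch keeps the logic conservative;
a real policy would probably smooth the delays over several batches before
acting.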