Awesome, thanks for the tip!
On Wed, Nov 11, 2015 at 2:25 PM, Tathagata Das <t...@databricks.com> wrote:

> The reason the existing dynamic allocation does not work out of the box
> for Spark Streaming is that the heuristic used for deciding when to
> scale up/down is not the right one for micro-batch workloads. It works
> great for typical batch workloads. However, you can use the underlying
> developer API to add/remove executors and implement your own scaling
> logic:
>
> 1. Use SparkContext.requestExecutors and SparkContext.killExecutors.
>
> 2. Use a StreamingListener to get the scheduling delay and processing
> times, and use those to request or kill executors.
>
> TD
>
> On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen <duc.was.h...@gmail.com> wrote:
>
>> Dean,
>>
>> Thanks for the reply. I'm searching (via the Spark mailing list archive
>> and Google) and can't find the previous thread you mentioned. I've
>> stumbled upon a few, but they may not be the thread you're referring to.
>> I'm very interested in reading that discussion, and any links/keywords
>> would be greatly appreciated.
>>
>> I can see it's a non-trivial problem to solve for every streaming use
>> case, and thus not yet supported in general. However, I think (maybe
>> naively) it can be solved for specific use cases. If I use the available
>> features to create a fault-tolerant design - i.e. failures/dead nodes
>> can occur on master nodes, the driver node, or executor nodes without
>> data loss, and "at-least-once" semantics are acceptable - then can't I
>> safely scale down in streaming by killing executors? If this is not
>> true, then are we saying that streaming is not fault tolerant?
>>
>> I know it won't be as simple as setting a config like
>> spark.dynamicAllocation.enabled=true and magically having elastic
>> streaming, but I'm interested in whether anyone else has attempted to
>> solve this for their specific use case, with extra coding involved.
>> Pitfalls? Thoughts?
>>
>> thanks,
>> Duc
>>
>> On Wed, Nov 11, 2015 at 8:36 AM, Dean Wampler <deanwamp...@gmail.com> wrote:
>>
>>> Dynamic allocation doesn't work yet with Spark Streaming in any
>>> cluster scenario. There was a previous thread on this topic which
>>> discusses the issues that need to be resolved.
>>>
>>> Dean Wampler, Ph.D.
>>> Author: Programming Scala, 2nd Edition
>>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>>> Typesafe <http://typesafe.com>
>>> @deanwampler <http://twitter.com/deanwampler>
>>> http://polyglotprogramming.com
>>>
>>> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen <duc.was.h...@gmail.com> wrote:
>>>
>>>> I'm trying to get Spark Streaming to scale up/down its number of
>>>> executors within Mesos based on workload. It's not scaling down. I'm
>>>> using Spark 1.5.1, reading from Kafka using the direct
>>>> (receiver-less) approach.
>>>>
>>>> Based on this ticket, https://issues.apache.org/jira/browse/SPARK-6287,
>>>> with the right configuration I have a simple example working with the
>>>> spark-shell connected to a Mesos cluster. By "working" I mean the
>>>> number of executors scales up/down based on workload. However, the
>>>> spark-shell is not a streaming example.
>>>>
>>>> What is the status of dynamic resource allocation with Spark
>>>> Streaming on Mesos? Is it supported at all? Or supported but with
>>>> some caveats to ensure no data loss?
>>>>
>>>> thanks,
>>>> Duc
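---

For readers landing on this thread later: TD's two suggestions (a StreamingListener for scheduling/processing delays, plus the executor request/kill developer API) can be combined into a small scaling policy. The sketch below is illustrative only - the object name, thresholds, and +1/-1/0 return convention are assumptions, not anything prescribed in this thread - and the decision logic is deliberately kept free of Spark dependencies so it can be unit-tested without a cluster.

```scala
// Minimal scale-up/scale-down policy for a micro-batch workload.
// All names and thresholds here are illustrative assumptions.
object ScalingPolicy {
  // Returns +1 (request an executor), -1 (release an executor), or 0 (hold),
  // based on how the last batch's total latency compares to the batch interval.
  def decide(schedulingDelayMs: Long, processingMs: Long, batchIntervalMs: Long): Int = {
    val totalMs = schedulingDelayMs + processingMs
    if (totalMs > batchIntervalMs) 1            // batches are falling behind: add capacity
    else if (totalMs < batchIntervalMs / 2) -1  // ample headroom: shed capacity
    else 0                                      // within bounds: do nothing
  }
}
```

To wire this up, extend `StreamingListener`, override `onBatchCompleted`, feed `batchInfo.schedulingDelay` and `batchInfo.processingDelay` into the policy, and act on the result with `SparkContext.requestExecutors` / `SparkContext.killExecutors` (the latter needs executor IDs, which you can track from `SparkListener` executor-added/removed events). Register the listener via `StreamingContext.addStreamingListener`. Whether scaling down is then actually safe depends on the fault-tolerance setup Duc describes - checkpointing plus the direct Kafka approach with at-least-once semantics.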