K8s is a different story; please take a look at the "Future Work" part of the doc.
On Fri, May 24, 2019 at 9:40 PM Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:

> Btw, the heuristics for batch mode
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289)
> vs. streaming
> (https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L91-L92)
> are different. In batch mode you care about numRunningOrPendingTasks, while
> for streaming you care about the ratio averageBatchProcTime.toDouble /
> batchDurationMs, so there are concerns beyond scaling down when idle.
>
> Here is a scenario where things might not work for batch dynamic allocation
> with SS. I start with a query that reads x Kafka partitions; the incoming
> data rate is low and all tasks (1 per partition) are running, since there
> are enough resources anyway.
> At some point the data per partition increases (maxOffsetsPerTrigger is
> high enough), so processing takes more time. AFAIK SS waits for a batch to
> finish before running the next one (it waits for the trigger to finish,
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L46).
> In this case I suspect there is no scaling up with batch dynamic
> allocation, as there are no pending tasks; only the processing time
> changed. Here I think the streaming dynamic allocation heuristics are
> better.
> If I'm not mistaken, the batch mode heuristics could work if you have
> multiple streaming queries and there are batches waiting (using fair
> scheduling etc.).
>
> PS. This has been discussed on the list in the past, though not in depth
> (https://mail-archives.apache.org/mod_mbox/spark-user/201708.mbox/%3c1503626484779-29104.p...@n3.nabble.com%3E).
>
> On Fri, May 24, 2019 at 9:22 PM Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>
>> I am on k8s, where there is no support yet AFAIK; there is WIP on the
>> shuffle service. So from your experience there are no issues with using
>> the batch dynamic allocation version, like there were before with
>> DStreams as described in the related JIRA?
>>
>> On Fri, 24 May 2019, 8:28 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>
>>> It scales down with YARN. Not sure how you've tested.
>>>
>>> On Fri, 24 May 2019, 19:10 Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>>>
>>>> Yes, nothing happens. In this case it could propagate info to the
>>>> resource manager to scale down the number of executors, no? Just a
>>>> thought.
>>>>
>>>> On Fri, 24 May 2019, 7:17 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>
>>>>> Structured Streaming works differently. If no data arrives, no tasks
>>>>> are executed (I just had a case in this area).
>>>>>
>>>>> BR,
>>>>> G
>>>>>
>>>>> On Fri, 24 May 2019, 18:14 Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Some time ago, streaming dynamic allocation was added to DStreams
>>>>>> (https://issues.apache.org/jira/browse/SPARK-12133) to address the
>>>>>> issues with the batch-based one. Should this be ported to Structured
>>>>>> Streaming? Thoughts?
>>>>>> AFAIK there is no support for it in SS.
>>>>>>
>>>>>> Best,
>>>>>> Stavros
>
> --
> Stavros Kontopoulos
> Principal Engineer
> Lightbend Platform <https://www.lightbend.com/lightbend-platform>
> mob: +30 6977967274
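To make the difference between the batch and streaming heuristics concrete, here is a minimal Python sketch of the Kafka scenario described in the thread. It is a simplified model, not Spark's code: `batch_executors_needed` and `streaming_decision` are hypothetical helpers that only approximate the linked Scala logic. The 0.9 / 0.3 scale-up / scale-down thresholds are assumed to mirror the DStreams defaults (`spark.streaming.dynamicAllocation.scalingUpRatio` / `scalingDownRatio`). The trigger interval is assumed to play the role of `batchDurationMs`, and the partition count, slots per executor, and timings are made-up numbers.

```python
import math


def batch_executors_needed(pending_tasks: int, running_tasks: int,
                           tasks_per_executor: int,
                           allocation_ratio: float = 1.0) -> int:
    """Hypothetical, simplified model of the batch heuristic.

    The executor target depends only on how many tasks are running or
    pending; how long those tasks take plays no role.
    """
    num_running_or_pending = pending_tasks + running_tasks
    return math.ceil(num_running_or_pending * allocation_ratio / tasks_per_executor)


def streaming_decision(avg_batch_proc_time_ms: float, batch_duration_ms: float,
                       scaling_up_ratio: float = 0.9,
                       scaling_down_ratio: float = 0.3) -> tuple:
    """Hypothetical, simplified model of the DStreams ratio heuristic.

    Returns ("up", n), ("down", 1) or ("hold", 0). The 0.9 / 0.3 defaults
    are assumed to mirror the DStreams scaling ratios.
    """
    ratio = avg_batch_proc_time_ms / batch_duration_ms
    if ratio >= scaling_up_ratio:
        # Round half up (like Scala's math.round for positive values),
        # and always ask for at least one extra executor.
        return ("up", max(math.floor(ratio + 0.5), 1))
    if ratio <= scaling_down_ratio:
        return ("down", 1)
    return ("hold", 0)


# Assumed scenario: 8 Kafka partitions -> 8 tasks per micro-batch,
# 4 task slots per executor, 10 s trigger interval.
for label, proc_ms in [("low data", 2_000), ("high data", 15_000)]:
    # No pending tasks in either case: every partition's task is already running.
    batch = batch_executors_needed(pending_tasks=0, running_tasks=8, tasks_per_executor=4)
    streaming = streaming_decision(proc_ms, 10_000)
    print(f"{label}: batch target={batch}, streaming decision={streaming}")
```

With these numbers the batch target stays at 2 executors in both cases, because the task count never changes, while the ratio-based rule scales down at a ratio of 0.2 and asks for 2 more executors at a ratio of 1.5. The sketch leaves out the idle-timeout path (`spark.dynamicAllocation.executorIdleTimeout`) through which batch mode releases executors when no tasks run.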