Sure im not talking about k8s here. The discussion is about the heuristics and their drawbacks.
Στις Δευ, 27 Μαΐ 2019, 2:04 μ.μ. ο χρήστης Gabor Somogyi < gabor.g.somo...@gmail.com> έγραψε: > K8s is a different story, please take a look at the doc "Future Work" part. > > On Fri, May 24, 2019 at 9:40 PM Stavros Kontopoulos < > stavros.kontopou...@lightbend.com> wrote: > >> Btw the heuristics for batch mode ( >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289) >> vs >> streaming ( >> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L91-L92) >> are different. In batch mode you care about the numRunningOrPendingTasks >> while >> for streaming about the ratio: averageBatchProcTime.toDouble / >> batchDurationMs so there are some concerns beyond scaling down when >> idle. >> A scenario things might now work for batch dynamic allocation with SS is >> as follows. I start with a query that reads x kafka partitions and the data >> arriving is low and all tasks (1 per partition) are running since there are >> enough resources anyway. >> At some point the data increases per partition (maxOffsetsPerTrigger is >> high enough) and so processing takes more time. AFAIK SS will wait for a >> batch to finish before running the next (waits for the trigger to finish, >> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L46 >> ). >> In this case I suspect there is no scaling up with the batch dynamic >> allocation mode as there are no pending tasks, only processing time >> changed. In this case the streaming dynamic heuristics I think are better. >> Batch mode heuristics could work, if not mistaken, if you have multiple >> streaming queries and there are batches waiting (using fair-scheduling etc). >> >> PS. this has been discussed, not in depth, in the past on the list ( >> https://mail-archives.apache.org/mod_mbox/spark-user/201708.mbox/%3c1503626484779-29104.p...@n3.nabble.com%3E >> ) >> >> >> >> >> On Fri, May 24, 2019 at 9:22 PM Stavros Kontopoulos < >> stavros.kontopou...@lightbend.com> wrote: >> >>> I am on k8s where there is no support yet afaik, there is wip wrt the >>> shuffle service. So from your experience there are no issues with using the >>> batch dynamic allocation version like there was before with dstreams as >>> described in the related jira? >>> >>> Στις Παρ, 24 Μαΐ 2019, 8:28 μ.μ. ο χρήστης Gabor Somogyi < >>> gabor.g.somo...@gmail.com> έγραψε: >>> >>>> It scales down with yarn. Not sure how you've tested. >>>> >>>> On Fri, 24 May 2019, 19:10 Stavros Kontopoulos, < >>>> stavros.kontopou...@lightbend.com> wrote: >>>> >>>>> Yes nothing happens. In this case it could propagate info to the >>>>> resource manager to scale down the number of executors no? Just a thought. >>>>> >>>>> Στις Παρ, 24 Μαΐ 2019, 7:17 μ.μ. ο χρήστης Gabor Somogyi < >>>>> gabor.g.somo...@gmail.com> έγραψε: >>>>> >>>>>> Structured Streaming works differently. If no data arrives no tasks >>>>>> are executed (just had a case in this area). >>>>>> >>>>>> BR, >>>>>> G >>>>>> >>>>>> >>>>>> On Fri, 24 May 2019, 18:14 Stavros Kontopoulos, < >>>>>> stavros.kontopou...@lightbend.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Some while ago the streaming dynamic allocation part was added in >>>>>>> DStreams(https://issues.apache.org/jira/browse/SPARK-12133) to >>>>>>> improve the issues with the batch based one. Should this be ported >>>>>>> to structured streaming? Thoughts? >>>>>>> AFAIK there is no support in SS for it. >>>>>>> >>>>>>> Best, >>>>>>> Stavros >>>>>>> >>>>>>> >> >> -- >> Stavros Kontopoulos >> *Principal Engineer* >> *Lightbend Platform <https://www.lightbend.com/lightbend-platform>* >> *mob: **+30 6977967274 <+30+6977967274>* >> >>