[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553867#comment-17553867 ]
Ramiz Mehran edited comment on SPARK-24815 at 6/14/22 3:05 AM:
---------------------------------------------------------------

Guys, is this thread still alive? I think dynamic scaling for SSS (Spark Structured Streaming) should follow the approach already used by Spark Streaming (DStreams) itself. The "processing/batch duration ratio" logic makes sense and removes any other dependency from the calculation. Also, the ratio should be computed as a moving average, and the number of batches in that moving average should be configurable.

was (Author: JIRAUSER290918):
Guys, is this thread still alive? I think SSS for structure-streaming should be taken from spark-streaming itself. The logic of "processing/batch duration ratio" makes sense and removes any other dependency from the calculation. Also, there should be a moving average to calculate and this moving average batch count can be configurable.

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing
> containers to match the actual workload. On multi-tenant clusters, it ensures
> that a Spark job takes no more resources than necessary. In cloud
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured
> streaming job, the batch dynamic allocation algorithm kicks in. It requests
> more executors if the task backlog reaches a certain size, and removes
> executors if they are idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a
> particular implementation in SparkContext.scala (this should be a separate
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the
> batch algorithm. Eventually, continuous processing might need its own
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when
> Core's dynamic allocation is enabled.
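The moving-average "processing/batch duration ratio" proposed in the comment can be made concrete with a small sketch. This is a minimal Python illustration, not Spark code: the class `RatioBasedScaler`, its parameters (`window`, `scale_up_ratio`, `scale_down_ratio`, `min_executors`, `max_executors`) and the 0.9/0.3 thresholds are hypothetical values chosen for illustration. It is only loosely modeled on the idea of comparing average batch processing time with the batch interval, as the comment describes for Spark Streaming.

```python
from collections import deque


class RatioBasedScaler:
    """Hypothetical policy: compare a moving average of
    (processing time / batch interval) against scale-up/scale-down thresholds."""

    def __init__(self, window=5, scale_up_ratio=0.9, scale_down_ratio=0.3,
                 min_executors=1, max_executors=20):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.scale_up_ratio = scale_up_ratio
        self.scale_down_ratio = scale_down_ratio
        self.min_executors = min_executors
        self.max_executors = max_executors
        # deque(maxlen=...) discards the oldest ratio once full, so it always
        # holds the ratios of the last `window` batches (the configurable count).
        self.ratios = deque(maxlen=window)

    def record_batch(self, processing_ms, batch_interval_ms):
        """Record one completed micro-batch."""
        if batch_interval_ms <= 0:
            raise ValueError("batch interval must be positive")
        self.ratios.append(processing_ms / batch_interval_ms)

    def moving_average(self):
        return sum(self.ratios) / len(self.ratios) if self.ratios else None

    def target_executors(self, current):
        """Return the executor count to request, given the current count."""
        # Wait for a full window so a single slow batch cannot trigger scaling.
        if len(self.ratios) < self.ratios.maxlen:
            return current
        avg = self.moving_average()
        if avg >= self.scale_up_ratio:
            # Batches barely finish (or overrun) within the interval: add
            # roughly `avg` executors, and at least one.
            target = current + max(1, round(avg))
        elif avg <= self.scale_down_ratio:
            # Executors are mostly idle between batches: release one at a time.
            target = current - 1
        else:
            target = current
        # Keep the result inside the configured bounds.
        return max(self.min_executors, min(self.max_executors, target))


if __name__ == "__main__":
    scaler = RatioBasedScaler(window=3)
    for processing_ms in (9_500, 11_000, 12_000):
        scaler.record_batch(processing_ms, batch_interval_ms=10_000)
    print(scaler.target_executors(current=4))  # prints 5
```

Averaging over several batches is what makes the ratio usable as a signal: a single batch that spikes above the interval does not add executors, while a sustained overrun does. Scaling down one executor at a time is a deliberately conservative choice in this sketch.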
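Points 1 to 3 of the issue describe an architecture: a pluggable allocation policy, a streaming policy separate from the batch one, and a warning when a streaming job runs under the batch policy. A minimal Python sketch of that shape follows. Every name in it (`AllocationPolicy`, `AllocationState`, `BacklogIdlePolicy`, `ProcessingRatioPolicy`, `POLICIES`, `load_policy`) is hypothetical and does not exist in Spark. The batch policy is a heavy simplification: Core's real algorithm, for example, ramps requests up exponentially over successive backlog intervals.

```python
import math
import warnings
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class AllocationState:
    """Hypothetical inputs an allocation policy could read."""
    current_executors: int
    pending_tasks: int = 0                              # task backlog (batch signal)
    idle_seconds: list = field(default_factory=list)    # idle time per executor
    avg_batch_ratio: float = 0.0                         # avg processing time / batch interval


class AllocationPolicy(ABC):
    """Pluggable policy interface (hypothetical, not an existing Spark API)."""
    supports_streaming = False

    @abstractmethod
    def target_executors(self, state: AllocationState) -> int:
        ...


class BacklogIdlePolicy(AllocationPolicy):
    """Simplified batch behaviour: grow with the task backlog, shrink by the
    number of executors idle for at least `idle_timeout_s` seconds."""

    def __init__(self, tasks_per_executor=4, idle_timeout_s=60):
        self.tasks_per_executor = tasks_per_executor
        self.idle_timeout_s = idle_timeout_s

    def target_executors(self, state):
        if state.pending_tasks > 0:
            return state.current_executors + math.ceil(
                state.pending_tasks / self.tasks_per_executor)
        idle = sum(1 for s in state.idle_seconds if s >= self.idle_timeout_s)
        return max(0, state.current_executors - idle)


class ProcessingRatioPolicy(AllocationPolicy):
    """Streaming behaviour: scale on the processing/batch duration ratio."""
    supports_streaming = True

    def __init__(self, up=0.9, down=0.3):
        self.up, self.down = up, down

    def target_executors(self, state):
        if state.avg_batch_ratio >= self.up:
            return state.current_executors + max(1, round(state.avg_batch_ratio))
        if state.avg_batch_ratio <= self.down:
            return max(1, state.current_executors - 1)
        return state.current_executors


# Hypothetical registry that a configuration value could select from.
POLICIES = {"batch": BacklogIdlePolicy, "streaming-ratio": ProcessingRatioPolicy}


def load_policy(name, is_streaming_job):
    policy = POLICIES[name]()
    if is_streaming_job and not policy.supports_streaming:
        # Point 3: warn instead of silently applying the batch algorithm.
        warnings.warn(f"allocation policy {name!r} is not designed for streaming jobs")
    return policy
```

For example, `load_policy("batch", is_streaming_job=True)` returns the batch policy but emits a `UserWarning`, while `load_policy("streaming-ratio", is_streaming_job=True)` loads silently. Keeping the policy behind one interface is what would let continuous processing get its own implementation later without touching the scheduler.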