[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Palaniappan updated SPARK-24815:
----------------------------------------
    Description: 
For batch jobs, dynamic allocation is very useful for adding and removing 
containers to match the actual workload. On multi-tenant clusters, it ensures 
that a Spark job is taking no more resources than necessary. In cloud 
environments, it enables autoscaling.

However, if you set spark.dynamicAllocation.enabled=true and run a structured 
streaming job, the batch dynamic allocation algorithm kicks in: it requests 
more executors when tasks have been backlogged for longer than a timeout, and 
removes executors that have been idle for a certain period of time.
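
For concreteness, here is a minimal sketch of a streaming job with Core's 
dynamic allocation turned on. The property values are illustrative; the 
backlog and idle timeouts are the knobs behind the behavior described above:

{code:scala}
import org.apache.spark.sql.SparkSession

// A structured streaming session with Core's batch-oriented dynamic
// allocation enabled. The backlog timeout drives scale-up; the idle
// timeout drives scale-down.
val spark = SparkSession.builder()
  .appName("streaming-with-core-dynamic-allocation")
  .config("spark.dynamicAllocation.enabled", "true")
  // An external shuffle service is required so executors can be removed
  // without losing their shuffle files.
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  // Request more executors once tasks have been backlogged this long.
  .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // Remove an executor once it has been idle this long.
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()
{code}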

Quick thoughts:

1) Dynamic allocation should be pluggable, rather than hardcoded to a 
particular implementation in SparkContext.scala (this should be a separate 
JIRA).

2) We should add a structured streaming algorithm that's separate from the 
batch algorithm (see the rough sketch after this list). Eventually, continuous 
processing might need its own algorithm.

3) Spark should print a warning if you run a structured streaming job while 
Core's dynamic allocation is enabled.
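
As a rough illustration of (2), a streaming-specific policy could key off 
micro-batch progress rather than the task backlog, along the lines of the 
processing-time-to-batch-interval ratio used by SPARK-12133. This is only a 
sketch: the class name and thresholds are invented, and the scale-down 
bookkeeping is omitted. SparkContext.requestExecutors/killExecutors are the 
existing developer APIs an implementation could build on.

{code:scala}
import org.apache.spark.SparkContext
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical policy: compare how long each micro-batch took to the
// trigger interval and scale accordingly. Names and thresholds are
// illustrative, not a proposed API.
class RatioBasedScaler(sc: SparkContext, triggerIntervalMs: Long)
    extends StreamingQueryListener {

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // Wall-clock duration of the last micro-batch, as reported by the engine.
    val batchMs = event.progress.durationMs.get("triggerExecution")
    if (batchMs != null) {
      val ratio = batchMs.toDouble / triggerIntervalMs
      if (ratio > 0.9) {
        // Batches nearly fill (or exceed) the trigger interval: the query
        // is falling behind, so ask for one more executor.
        sc.requestExecutors(1)
      } else if (ratio < 0.3) {
        // Plenty of headroom: a real implementation would pick an idle
        // executor and call sc.killExecutors(Seq(executorId)); the
        // executor-id bookkeeping is omitted in this sketch.
      }
    }
  }
}
{code}

Such a listener would be registered with 
spark.streams.addListener(new RatioBasedScaler(spark.sparkContext, 10000)), 
and it would also be a natural place to emit the warning in (3).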

  was:
Dynamic allocation is very useful for adding and removing containers to match 
the actual workload. On multi-tenant clusters, it ensures that a Spark job is 
taking no more resources than necessary. In cloud environments, it enables 
autoscaling.

However, if you set spark.dynamicAllocation.enabled=true and run a structured 
streaming job, Core's dynamic allocation algorithm kicks in. It requests 
executors when the task backlog reaches a certain size, and removes executors 
that have been idle for a certain period of time.

This does not make sense for streaming jobs, as outlined in 
https://issues.apache.org/jira/browse/SPARK-12133, which introduced dynamic 
allocation for the old streaming API.

First, Spark should print a warning if you run a structured streaming job 
while Core's dynamic allocation is enabled.

Second, structured streaming should support dynamic allocation itself. It 
would be convenient if it used the same set of properties as Core's dynamic 
allocation, but I don't have a strong opinion on that.

If somebody can give me pointers on how to add dynamic allocation support, I'd 
be happy to take a stab.


> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in: it requests 
> more executors when tasks have been backlogged for longer than a timeout, and 
> removes executors that have been idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job while 
> Core's dynamic allocation is enabled.



