[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821433#comment-17821433
 ] 

Mich Talebzadeh commented on SPARK-24815:
-----------------------------------------

Some thoughts on this, if I may.

This enhancement request provides a solid foundation for improving dynamic 
allocation in Structured Streaming. Adding more specific details, outlining 
potential benefits, and addressing potential challenges can further strengthen 
the proposal and increase its chances of being implemented.

So these are my thoughts:



- Pluggable Dynamic Allocation: This suggestion reflects good design 
principles, allowing for flexibility and future improvements. We should 
elaborate on the benefits of a pluggable approach, such as customization and 
integration with external resource management tools.

- Separate Algorithm for Structured Streaming: This is crucial for adapting 
allocation strategies to the unique nature of streaming workloads. Outlining 
how a separate algorithm might differ from its batch counterpart would also 
be useful.

- Warning for Enabled Core Dynamic Allocation: This is a valuable warning to 
prevent accidental misuse and raise awareness among users. Also consider 
suggesting the warning level (e.g. info, warning, error) and potential content 
to provide clarity.

- Challenges and Trade-offs: It would help to briefly mention the potential 
challenges or trade-offs of implementing these proposals. Pointing to relevant 
discussions, resources, or alternative approaches could strengthen the 
enhancement request.
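
As a rough illustration of the first two points, here is a minimal sketch of 
what a pluggable policy with a streaming-specific implementation might look 
like. Everything in it is an assumption made for illustration: the names 
AllocationPolicy, StreamingRatioPolicy and BatchStats are hypothetical, the 
thresholds are arbitrary, and none of this is an existing Spark API. The idea 
is simply that a streaming policy could react to how long a micro-batch takes 
relative to its trigger interval, rather than to task backlog and idle time.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BatchStats:
    """Hypothetical per-micro-batch metrics (not a Spark class)."""
    processing_secs: float  # time spent processing the last micro-batch
    trigger_secs: float     # configured trigger interval, assumed > 0


class AllocationPolicy(ABC):
    """Hypothetical plug-in point: decides how many executors to ask for."""

    @abstractmethod
    def target_executors(self, current: int, stats: BatchStats) -> int:
        """Return the desired executor count given the current one."""


class StreamingRatioPolicy(AllocationPolicy):
    """Illustrative streaming policy driven by processing time / trigger interval."""

    def __init__(self, scale_up=0.9, scale_down=0.5,
                 min_executors=1, max_executors=10):
        # All thresholds and bounds are illustrative defaults, not Spark settings.
        self.scale_up = scale_up
        self.scale_down = scale_down
        self.min_executors = min_executors
        self.max_executors = max_executors

    def target_executors(self, current: int, stats: BatchStats) -> int:
        ratio = stats.processing_secs / stats.trigger_secs
        if ratio > self.scale_up:
            target = current + 1   # batches nearly overrun the trigger: grow
        elif ratio < self.scale_down:
            target = current - 1   # plenty of headroom: shrink
        else:
            target = current       # within the comfortable band: hold
        # Clamp to the configured bounds.
        return max(self.min_executors, min(self.max_executors, target))


policy = StreamingRatioPolicy()
print(policy.target_executors(4, BatchStats(processing_secs=9.5, trigger_secs=10.0)))
```

Because the decision sits behind an abstract interface, a batch policy, a 
streaming policy, or one backed by an external resource manager could be 
swapped in without touching the caller, which is the main benefit of the 
pluggable design.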

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>              Labels: pull-request-available
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled
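
For point 3 of the description, a check along the following lines could 
produce the warning. This is only a sketch under stated assumptions: the 
configuration is modelled as a plain dict, the function name 
warn_if_batch_dynamic_allocation and the message wording are hypothetical, 
and the choice of the WARNING log level is just one option. The only name 
taken from the issue itself is the spark.dynamicAllocation.enabled key.

```python
import logging

logger = logging.getLogger("streaming_allocation_check")

DYN_ALLOC_KEY = "spark.dynamicAllocation.enabled"  # key quoted in the issue


def warn_if_batch_dynamic_allocation(conf: dict, is_streaming: bool) -> bool:
    """Log a warning and return True when a streaming job runs with the
    batch-oriented dynamic allocation switched on.

    `conf` is a plain dict standing in for the job configuration
    (an assumption for this sketch, not a Spark object).
    """
    enabled = str(conf.get(DYN_ALLOC_KEY, "false")).strip().lower() == "true"
    if is_streaming and enabled:
        logger.warning(
            "%s=true: the batch dynamic allocation algorithm will be used "
            "for this structured streaming job, which may scale executors "
            "in ways that do not suit streaming workloads.",
            DYN_ALLOC_KEY,
        )
        return True
    return False


logging.basicConfig(level=logging.WARNING)
warn_if_batch_dynamic_allocation({DYN_ALLOC_KEY: "true"}, is_streaming=True)
```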



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
