Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

via GitHub Tue, 23 Apr 2024 16:42:40 -0700


pkotikalapudi commented on code in PR #42352:
URL: https://github.com/apache/spark/pull/42352#discussion_r1577025511



##########
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:
##########
@@ -96,6 +96,30 @@ import org.apache.spark.util.{Clock, SystemClock, 
ThreadUtils, Utils}
  *     If an executor with caching data blocks has been idle for more than 
this duration,
  *     the executor will be removed
  *
+ * Dynamic resource allocation is also extended to work for structured 
streaming use case.
+ * (micro-batch paradigm).
+ * For it to work we would still need the above configs + few additional 
configs.
+ *
+ * For executor allocation, In traditional DRA target number of executors are 
added based on the

Review Comment:
   From my understanding in batch queries each stage can  have varied resource 
requirements depending upon what it does. So DRA has `schedulerBacklogTimeout` 
to figure out when it should ask for more resources ([more on 
it](schedulerBacklogTimeout)). So the 
[pendingTasks](https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289)
 are determined by [the pending tasks of current 
stage](https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L883).
 I have [modified 
it](https://github.com/apache/spark/pull/42352/files#diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1003)
 to consider the pending tasks of other stages as well because structured 
streaming deals with micro-batches and we want to scale out if the there are 
still other stages pending in the same micro-batch.
   
   for eg:
   
   with current DRA code, if config 
`spark.dynamicAllocation.schedulerBacklogTimeout` is set to 6 seconds and we 
use that for structured streaming job where a micro-batch consists of 4 stages 
which will run at max for 5 seconds each. 
   
   Then it wouldn't scale out even if 20 seconds pass because it is just 
5+5+5+5 = 30seconds.
   
   But the above mentioned changes I have done, while running the second stage 
on the 6th second it figures out that other stages in the micro-batch are 
pending so it scale-out appropriately.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

Reply via email to