pkotikalapudi commented on code in PR #42352: URL: https://github.com/apache/spark/pull/42352#discussion_r1577025511
##########
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:
##########
@@ -96,6 +96,30 @@ import org.apache.spark.util.{Clock, SystemClock, ThreadUtils, Utils}
  * If an executor with caching data blocks has been idle for more than this duration,
  * the executor will be removed
  *
+ * Dynamic resource allocation is also extended to work for structured streaming use case.
+ * (micro-batch paradigm).
+ * For it to work we would still need the above configs + few additional configs.
+ *
+ * For executor allocation, In traditional DRA target number of executors are added based on the

Review Comment:
From my understanding, in batch queries each stage can have different resource requirements depending on what it does, so DRA uses `schedulerBacklogTimeout` to decide when it should ask for more resources ([more on it](schedulerBacklogTimeout)). The [pendingTasks](https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289) are therefore determined by [the pending tasks of the current stage](https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L883).

I have [modified it](https://github.com/apache/spark/pull/42352/files#diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1003) to consider the pending tasks of other stages as well, because structured streaming deals with micro-batches and we want to scale out if other stages are still pending in the same micro-batch.

For example, with the current DRA code, suppose `spark.dynamicAllocation.schedulerBacklogTimeout` is set to 6 seconds and we use it for a structured streaming job where a micro-batch consists of 4 stages, each running for at most 5 seconds. It would not scale out even though 5+5+5+5 = 20 seconds pass, because no single stage stays backlogged for the full 6 seconds.
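To make the arithmetic concrete, here is a toy sketch of the two behaviours. It is not the actual `ExecutorAllocationManager` logic: it assumes stages run strictly one after another and that the backlog timer restarts whenever the current stage's tasks drain. The names `BacklogTimeoutSketch`, `firstScaleOutSecond`, and `countOtherPendingStages` are made up for illustration.

```scala
object BacklogTimeoutSketch {

  /** Toy model only (an assumption, not Spark's implementation).
    * Returns the second at which a scale-out request would fire, if any.
    *
    * @param stageDurations          run time of each stage in one micro-batch, in seconds
    * @param backlogTimeoutSeconds   stands in for spark.dynamicAllocation.schedulerBacklogTimeout
    * @param countOtherPendingStages false: timer restarts when the current stage finishes;
    *                                true: timer keeps running while later stages are pending
    */
  def firstScaleOutSecond(
      stageDurations: List[Int],
      backlogTimeoutSeconds: Int,
      countOtherPendingStages: Boolean): Option[Int] = {

    @annotation.tailrec
    def loop(remaining: List[Int], now: Int, backlogStart: Int): Option[Int] =
      remaining match {
        case Nil => None // micro-batch finished without the timer expiring
        case duration :: rest =>
          val stageEnd = now + duration
          val deadline = backlogStart + backlogTimeoutSeconds
          if (deadline < stageEnd) {
            // Timer expired while tasks were still pending: request executors.
            Some(deadline)
          } else {
            // Keep the timer only if we look at later stages and some remain.
            val keepTimer = countOtherPendingStages && rest.nonEmpty
            loop(rest, stageEnd, if (keepTimer) backlogStart else stageEnd)
          }
      }

    loop(stageDurations, now = 0, backlogStart = 0)
  }

  def main(args: Array[String]): Unit = {
    val stages = List(5, 5, 5, 5) // 4 stages, 5 seconds each
    // Per-stage backlog: never reaches 6 seconds, so no scale-out.
    println(firstScaleOutSecond(stages, 6, countOtherPendingStages = false)) // None
    // Backlog across the micro-batch: fires at second 6, during the second stage.
    println(firstScaleOutSecond(stages, 6, countOtherPendingStages = true)) // Some(6)
  }
}
```

In this model the only difference between the two modes is whether the timer is reset at a stage boundary, which is the point of the example above.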
But with the changes I have made, at the 6th second, while the second stage is running, it detects that other stages in the micro-batch are still pending, so it scales out appropriately.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org