Olwn opened a new pull request #29578:
URL: https://github.com/apache/spark/pull/29578
### What changes were proposed in this pull request?

Currently, dstream.getOrCompute runs in JobGenerator, which has a single-threaded event loop. This pull request moves that call to JobScheduler.

### Why are the changes needed?

Some of our Spark applications develop a batch creation delay after running for some time. For instance, the batch for 10:03 is submitted at 10:06, so the latest batch shown in the Spark UI lags behind the current time. The affected applications have one thing in common: they run RDD actions inside dstream.transform. Those actions are executed in dstream.compute, which is called by JobGenerator. Because JobGenerator processes events on a single-threaded event loop, any synchronous (blocking) operation in compute blocks event processing, including the generation of later batches.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added two tests:
1. ForEachDStreamSuite, to make sure batch execution doesn't block batch submission.
2. JobSchedulerSuite, to make sure all jobs in a batch can be associated with their BatchTime and displayed in the Spark UI.

### JIRAs

https://issues.apache.org/jira/browse/SPARK-32734
https://issues.apache.org/jira/browse/SPARK-32735
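The blocking behavior described above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Spark code: BatchLoop, postGenerate and secondBatchSubmittedWhileFirstBlocked are illustrative names, a single-thread executor stands in for JobGenerator's event loop, a cached thread pool stands in for JobScheduler's job executor, and a latch-blocked task plays the role of an RDD action inside dstream.transform.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of batch generation on a single-threaded event loop (not Spark's API). */
public class BatchLoopDemo {

    static final class BatchLoop {
        // One thread, standing in for JobGenerator's event loop.
        private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // Worker pool, standing in for JobScheduler's job executor.
        private final ExecutorService jobPool = Executors.newCachedThreadPool();
        private final boolean offloadCompute;

        BatchLoop(boolean offloadCompute) {
            this.offloadCompute = offloadCompute;
        }

        /** Posts a "generate batch" event; computeWork models an RDD action inside transform. */
        void postGenerate(Runnable computeWork, CountDownLatch submitted) {
            eventLoop.execute(() -> {
                if (offloadCompute) {
                    // Hand the compute work to the pool; the event loop is free for the next event.
                    jobPool.execute(computeWork);
                } else {
                    // Run the compute work inline; the event loop is stuck until it returns.
                    computeWork.run();
                }
                submitted.countDown(); // the batch counts as submitted
            });
        }

        void shutdown() {
            eventLoop.shutdownNow();
            jobPool.shutdownNow();
        }
    }

    /** Returns true if batch 2 is submitted while batch 1's compute work is still blocked. */
    static boolean secondBatchSubmittedWhileFirstBlocked(boolean offload) throws InterruptedException {
        BatchLoop loop = new BatchLoop(offload);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch batch2Submitted = new CountDownLatch(1);

        // Batch 1 blocks until we release it, like a long-running action in transform.
        Runnable slowAction = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        loop.postGenerate(slowAction, new CountDownLatch(1));
        loop.postGenerate(() -> { }, batch2Submitted);

        // Wait a bounded time for batch 2 while batch 1 is still blocked.
        boolean submitted = batch2Submitted.await(1, TimeUnit.SECONDS);
        release.countDown();
        loop.shutdown();
        return submitted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inline:  " + secondBatchSubmittedWhileFirstBlocked(false));
        System.out.println("offload: " + secondBatchSubmittedWhileFirstBlocked(true));
    }
}
```

Running main prints `inline:  false` and `offload: true`: with inline compute, batch 2 cannot be submitted until batch 1's action finishes, whereas offloading keeps the event loop free. One consequence of offloading, which the JobSchedulerSuite test appears to target, is that compute work no longer runs on the event-loop thread, so whatever associates jobs with their batch must still hold on the worker threads.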