Olwn opened a new pull request #29578:
URL: https://github.com/apache/spark/pull/29578


   ### What changes were proposed in this pull request?
   Currently, dstream.getOrCompute runs in JobGenerator, which processes events 
on a single-threaded event loop.
   This pull request moves that call to JobScheduler.
   
   
   ### Why are the changes needed?
   Some of our Spark applications develop a batch creation delay after running 
for some time. For instance, the 10:03 batch is submitted at 10:06, so the 
latest batch shown in the Spark UI lags behind the current time.
   These applications have one thing in common: they run RDD actions inside 
dstream.transform. Those actions are executed in dstream.compute, which is 
called by JobGenerator. Because JobGenerator runs a single-threaded event loop, 
any blocking (synchronous) operation there stalls event processing.
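   
   The growing delay can be illustrated with a small model. This is only a 
sketch, not Spark code: `EventLoopDelay`, `submissionDelays`, and the numbers 
used are hypothetical, and the single-threaded loop is reduced to one 
"loop is free at" timestamp. It assumes one generate-jobs event per batch 
interval and a fixed amount of blocking work per event.

```scala
// Simplified model of a single-threaded event loop (NOT Spark's implementation).
// All names here are hypothetical and exist only for illustration.
object EventLoopDelay {

  /** For each batch, returns how many milliseconds after its scheduled time
    * the loop finishes handling its event.
    *
    * @param numBatches   number of batches to simulate
    * @param intervalMs   batch interval; batch i is scheduled at i * intervalMs
    * @param costOnLoopMs blocking work done on the loop thread per batch,
    *                     e.g. an RDD action run inside a transform function
    */
  def submissionDelays(numBatches: Int, intervalMs: Long, costOnLoopMs: Long): Seq[Long] = {
    var loopFreeAt = 0L // time at which the single loop thread becomes idle
    (1 to numBatches).map { i =>
      val batchTime = i * intervalMs
      // The event cannot start before it is due or before the loop is free.
      val start = math.max(batchTime, loopFreeAt)
      loopFreeAt = start + costOnLoopMs
      loopFreeAt - batchTime
    }
  }

  def main(args: Array[String]): Unit = {
    // Blocking work (1500 ms) exceeds the interval (1000 ms): delay keeps growing.
    println(submissionDelays(4, 1000L, 1500L))
    // No blocking work on the loop thread: batches are handled on time.
    println(submissionDelays(4, 1000L, 0L))
  }
}
```

   In this model, whenever the blocking work per batch exceeds the batch 
interval, each batch is handled later than the one before it, which matches 
the observed symptom. When the work is taken off the loop thread, the delay 
stays at zero.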
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added two tests:
   
   1. ForEachDStreamSuite, to make sure batch execution doesn't block batch 
submission
   2. JobSchedulerSuite, to make sure all jobs in a batch are associated with 
the BatchTime and displayed in the Spark UI
   
   ### JIRAs
   https://issues.apache.org/jira/browse/SPARK-32734
   https://issues.apache.org/jira/browse/SPARK-32735
   

