HeartSaVioR opened a new pull request, #37161:
URL: https://github.com/apache/spark/pull/37161

   ### What changes were proposed in this pull request?
   
   This PR proposes to attach the origin logical plan to LogicalRDD when the 
LogicalRDD is built from a DataFrame's RDD. When the origin logical plan is 
available, LogicalRDD derives its stats from that plan rather than falling 
back to the default stats.
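
The fallback described above can be sketched with a small toy model. This is 
not Spark code: `ToyLogicalRDD`, `Plan`, `Statistics`, and 
`DEFAULT_SIZE_IN_BYTES` are hypothetical names used only for illustration, and 
the "default" stats are assumed here to be a deliberately large size estimate.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for "size unknown"; an assumption for this sketch only.
DEFAULT_SIZE_IN_BYTES = 2**63 - 1


@dataclass(frozen=True)
class Statistics:
    size_in_bytes: int


@dataclass(frozen=True)
class Plan:
    """Toy logical plan that already knows its own statistics."""
    name: str
    stats: Statistics


@dataclass(frozen=True)
class ToyLogicalRDD:
    """Toy LogicalRDD; origin_plan is set only when built from a DataFrame's RDD."""
    origin_plan: Optional[Plan] = None

    def compute_stats(self) -> Statistics:
        # Prefer the origin plan's stats when the origin plan is available.
        if self.origin_plan is not None:
            return self.origin_plan.stats
        # Otherwise fall back to the default (pessimistic) estimate.
        return Statistics(DEFAULT_SIZE_IN_BYTES)


streaming_plan = Plan("streaming query plan", Statistics(size_in_bytes=1024))
print(ToyLogicalRDD(origin_plan=streaming_plan).compute_stats())
print(ToyLogicalRDD().compute_stats())
```

In this sketch the node with an origin plan reports that plan's size, while 
the node without one reports the default.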
   
   This PR also applies the change to ForeachBatchSink, which appears to be 
the only such case in the current codebase.
   
   ### Why are the changes needed?
   
   The origin logical plan can be useful for several use cases, including:
   
   1. connecting the two split logical plans into one (consider the 
foreachBatch sink: the origin logical plan represents the plan for the 
streaming query, and the logical plan for the new Dataset represents the plan 
for the batch query in the user function)
   2. inheriting plan stats from the origin logical plan
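
The first use case can be illustrated with a toy tree walk. This is a sketch 
under stated assumptions, not the PR's implementation: `Node`, its `origin` 
field, and `full_lineage` are hypothetical names. The idea is that a leaf 
standing in for LogicalRDD carries a reference to the origin plan, so a 
traversal of the batch plan can continue into the streaming plan.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node; `origin` is set only on the leaf standing in for LogicalRDD."""
    name: str
    children: List["Node"] = field(default_factory=list)
    origin: Optional["Node"] = None


def full_lineage(node: Node) -> List[str]:
    """Pre-order walk that also follows the origin link, joining the two plans."""
    names = [node.name]
    for child in node.children:
        names.extend(full_lineage(child))
    if node.origin is not None:
        names.extend(full_lineage(node.origin))
    return names


# Plan of the streaming query (the origin logical plan).
streaming = Node("Project", [Node("StreamingSource")])
# Plan of the batch query built in the user function of foreachBatch.
batch = Node("Aggregate", [Node("LogicalRDD", origin=streaming)])

print(full_lineage(batch))
```

Without the `origin` link, the walk would stop at the `LogicalRDD` leaf and 
the streaming part of the plan would be invisible to the batch query.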
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

