szehon-ho commented on code in PR #55887:
URL: https://github.com/apache/spark/pull/55887#discussion_r3262232420


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala:
##########
@@ -187,6 +189,101 @@ object PushDownUtils extends Logging {
     }
   }
 
+  /**
+   * Pushes runtime filters into `scan` and re-plans its input partitions. For scans whose
+   * `outputPartitioning` is a [[KeyedPartitioning]] (SPJ-active), validates that the data source
+   * preserved the original partitioning and pads with `None` to preserve key alignment with the
+   * pre-filter partition set.
+   *
+   * Must be called at execute time: runtime filters carry [[DynamicPruningExpression]] and
+   * scalar-subquery references whose values are only resolved after their broadcast/subquery
+   * side completes. The mutating [[pushRuntimeFilters]] call must run at most once per scan

Review Comment:
   It's a bit odd to have this comment here: it describes a constraint on
   `pushRuntimeFilters`, which is a separate method. I also think the suggestion
   to cache the result is too specific; the doc should just say to avoid calling
   it twice on the same scan.
   
   Since this PR is already merged, I opened a small follow-up PR to fix the doc:
   https://github.com/apache/spark/pull/55958



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

