alamb commented on code in PR #15568:
URL: https://github.com/apache/datafusion/pull/15568#discussion_r2035073646
##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -283,6 +284,51 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug +
DynEq + DynHash {
/// See the [`fmt_sql`] function for an example of printing
`PhysicalExpr`s as SQL.
///
fn fmt_sql(&self, f: &mut Formatter<'_>) -> fmt::Result;
+
+ /// Take a snapshot of this `PhysicalExpr` if it is dynamic.
+ /// This is used to capture the current state of `PhysicalExpr`s that may
contain
+ /// dynamic references to other operators in order to serialize it over
the wire
+ /// or treat it via downcast matching.
+ ///
+ /// You should not call this method directly as it does not handle
recursion.
+ /// Instead use [`snapshot_physical_expr`] to handle recursion and capture
the
+ /// full state of the `PhysicalExpr`.
+ ///
+ /// This is expected to return "simple" expressions that do not have
mutable state
+ /// and are composed of DataFusion's built-in `PhysicalExpr`
implementations.
+ /// Callers however should *not* assume anything about the returned
expressions
+ /// since callers and implementers may not agree on what "simple" or
"built-in"
+ /// means.
+ /// In other words, if you need to searlize a `PhysicalExpr` across the
wire
+ /// you should call this method and then try to serialize the result,
+ /// but you should handle unknown or unexpected `PhysicalExpr`
implementations gracefully
+ /// just as if you had not called this method at all.
+ ///
+ /// In particular, consider:
+ /// * A `PhysicalExpr` that references the current state of a
`datafusion::physical_plan::TopK`
+ /// that is involved in a query with `SELECT * FROM t1 ORDER BY a LIMIT
10`.
+ /// This function may return something like `a >= 12`.
+ /// * A `PhysicalExpr` that references the current state of a
`datafusion::physical_plan::joins::HashJoinExec`
+ /// from a query such as `SELECT * FROM t1 JOIN t2 ON t1.a = t2.b`.
+ /// This function may return something like `t2.b IN (1, 5, 7)`.
+ ///
+ /// A system or function that can only deal with a hardcoded set of
`PhysicalExpr` implementations
+ /// or needs to serialize this state to bytes may not be able to handle
these dynamic references.
+ /// In such cases, we should return a simplified version of the
`PhysicalExpr` that does not
+ /// contain these dynamic references.
+ ///
+ /// Systems that implement remote execution of plans, e.g. serialize a
portion of the query plan
+ /// and send it across the wire to a remote executor may want to call this
method after
+ /// every batch on the source side and brodcast / update the current
snaphot to the remote executor.
+ ///
+ /// Note for implementers: this method should *not* handle recursion.
+ /// Recursion is handled in [`snapshot_physical_expr`].
+ fn snapshot(&self) -> Result<Option<Arc<dyn PhysicalExpr>>> {
+ // By default, we return None to indicate that this PhysicalExpr does
not
+ // have any dynamic references or state.
+ // This is a safe default behavior.
+ Ok(None)
+ }
Review Comment:
in theory the schema of the input to a PhysicaExpr shouldn't change so any
expr that needs it could hold a reference 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]