adriangb opened a new issue, #19187: URL: https://github.com/apache/datafusion/issues/19187
Adapted from https://github.com/apache/datafusion/pull/19130#issuecomment-3621848790: It seems to me that we could improve the cohesiveness of the volatility API between logical and physical expressions and make it easier to write optimizer rules, etc. that take this into account if we formally introduced the concept of "expression volatility". It's somewhat implicit as of now, and how it relates to function volatility is not clear. My goal would be to be able to formally express that an expression is "constant" in terms of volatility. For physical expressions we currently have `fn is_volatile(expr: &Arc<dyn PhysicalExpression>) -> bool`, but that doesn't tell you if it's a "stable" expression (like `column > 123`) or a "constant" expression (`456 > 123`). Looking at the logical expression const evaluator it seems like it does manual matching: https://github.com/apache/datafusion/blob/deaccb784ee1223bf9eed36562c5775aaedebc6b/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L637-L682. It looks like we have a `Volatility` targeted at functions: https://github.com/apache/datafusion/blob/c479deeb668cc444936bbfa66100aef9b2013b2e/datafusion/expr-common/src/signature.rs#L53-L85 But it does not have a "constant" option (nor would that really make sense for a function). `Constant` is not the same as immutable, as per the definition above `Immutable`: > Always returns the same output when given the same input. `Constant` would be: > Always returns the same output regardless of the input. (Generally because it has no input but I guess you could have an expression that takes an input but returns a constant / ignores the input?) Which makes me wonder if it wouldn't make sense to an enum along the lines of: ```rust pub enum ExprVolatility { Constant, Immutable, Stable, Volatile, } ``` And augment `PhysicalExpression` and `Expr` with methods to calculate the volatility: ```rust impl Expr { pub fn node_volatility(&self) -> ExprVolatility { match self { ... } } pub fn volatility(&self) -> ExprVolatility { self.apply(...) // recursive } } ``` ```rust pub trait PhysicalExpr { fn node_volatility(&self) -> ExprVolatility { ExprVolatility::Volatilve } fn volatility(&self) -> ExprVolatility { self.apply(...) // recursive } } ``` This seems like it would be required for `PhysicalExpr` to be able to participate in volatility calculations. For `Expr` it would be more of a matter of having it all in once place that can be re-used by other optimizers, user code, etc. The main alternative I see to this is to define "input" as "columns/data" (as opposed to e.g. `now()`) such that "immutable functions that do not have any columns as children are considered constant", but I'm not sure if that is correct enough. Or we can leave it implicit as it is now w/ logic for constant sub expression detection scattered. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
