pepijnve commented on PR #18055: URL: https://github.com/apache/datafusion/pull/18055#issuecomment-3406549307
> There is similar code for filtering here (namely that evaluates the filter expression first, and then only calls `filter` with columns that are needed)

This touches on one of the things I was struggling with a bit while working in the `PhysicalExpr` context rather than `ExecutionPlan`. While each `ExecutionPlan` is aware of its own input and output schema, a `PhysicalExpr` is not. Instead, the `Schema` is passed in as an argument to functions like `nullable`, and for `evaluate` specifically you get it via the `RecordBatch`.

The consequence is that I have to call `RecordBatch::project`, which ends up deriving the same schema on every invocation. I wasn't sure how we could fix this. I already need the schema anyway in order to decide whether it makes sense to project or not, so one simple solution is to just keep a reference to that one. But things get a bit weird when a `PhysicalExpr` holds a reference to a schema but also receives one externally when `nullable` and friends are called.
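To make the trade-off above concrete, here is a minimal, std-only sketch of the "keep a reference to the schema" idea. `Schema`, `Batch`, and `CachedProjection` are hypothetical stand-ins written for illustration. They are not the Arrow or DataFusion types, and this is not the code in the PR. The stand-in `Batch::project` only mimics the shape of a per-call projection that rebuilds the projected schema on every invocation.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a schema: just a list of field names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    // Derive a new schema containing only the selected fields.
    fn project(&self, indices: &[usize]) -> Schema {
        Schema {
            fields: indices.iter().map(|&i| self.fields[i].clone()).collect(),
        }
    }
}

// Hypothetical stand-in for a record batch: a schema plus shared columns.
#[derive(Debug, Clone)]
struct Batch {
    schema: Arc<Schema>,
    columns: Vec<Arc<Vec<i64>>>,
}

impl Batch {
    // Per-call projection: the projected schema is derived again each time.
    fn project(&self, indices: &[usize]) -> Batch {
        Batch {
            schema: Arc::new(self.schema.project(indices)),
            columns: indices.iter().map(|&i| Arc::clone(&self.columns[i])).collect(),
        }
    }
}

// Assumed alternative: derive the projected schema once, then reuse it.
struct CachedProjection {
    indices: Vec<usize>,
    schema: Arc<Schema>,
}

impl CachedProjection {
    fn new(input: &Schema, indices: Vec<usize>) -> Self {
        let schema = Arc::new(input.project(&indices));
        Self { indices, schema }
    }

    // Only the column pointers are copied; the cached schema is shared.
    fn apply(&self, batch: &Batch) -> Batch {
        Batch {
            schema: Arc::clone(&self.schema),
            columns: self.indices.iter().map(|&i| Arc::clone(&batch.columns[i])).collect(),
        }
    }
}
```

The awkward part mentioned above still applies to this sketch: `CachedProjection` is tied to the input schema it was built from, so anything that also accepts a schema from the caller has to decide what to do when the two disagree.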
