matthewgapp commented on code in PR #7581:
URL: https://github.com/apache/arrow-datafusion/pull/7581#discussion_r1446401353
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -112,6 +112,8 @@ pub enum LogicalPlan {
/// produces 0 or 1 row. This is used to implement SQL `SELECT`
/// that has no values in the `FROM` clause.
EmptyRelation(EmptyRelation),
+ /// A named temporary relation with a schema.
+ NamedRelation(NamedRelation),
Review Comment:
@jonahgao, could you explain the rationale for your suggested strategy? I'd
like to understand why it might be more effective than the current
implementation. Performance is critical to our use case, and the recursion
implementation is especially sensitive to it: the cost of setting up execution
and managing streams isn't amortized over all input record batches. Instead,
it's incurred on every iteration. For instance, we've observed a substantial
speedup, up to 30x, from eliminating certain intermediate nodes, such as
coalesce, from our plan (as shown in [this
PR](https://github.com/matthewgapp/arrow-datafusion/pull/2)). I've also drafted
another PR that appears to double execution speed again merely by omitting
metric collection in recursive sub-graphs.
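
To make the amortization point concrete, here is a rough cost model. It is a
hypothetical sketch, not DataFusion code: the function names, parameters, and
cost units are all assumptions for illustration, and the numbers one might plug
in are not measurements.

```rust
/// Hypothetical cost of a non-recursive plan: every node is set up once,
/// and that setup is amortized over all record batches.
fn one_shot_cost(nodes: u64, setup_per_node: u64, batches: u64, work_per_batch: u64) -> u64 {
    nodes * setup_per_node + batches * work_per_batch
}

/// Hypothetical cost of a recursive sub-graph: the setup for every node is
/// paid again on each iteration, so it scales with the iteration count.
fn recursive_cost(iterations: u64, nodes: u64, setup_per_node: u64, work_per_iteration: u64) -> u64 {
    iterations * (nodes * setup_per_node + work_per_iteration)
}
```

Under this model, removing a single node from the recursive sub-graph saves
`iterations * setup_per_node`, whereas in a one-shot plan it saves only
`setup_per_node`. That is why small per-node overheads matter so much here.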