alamb commented on code in PR #9690:
URL: https://github.com/apache/arrow-datafusion/pull/9690#discussion_r1530720578
##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -162,14 +163,37 @@ fn optimize_projections(
.map(|input|
((0..input.schema().fields().len()).collect_vec(), false))
.collect::<Vec<_>>()
}
+ LogicalPlan::Extension(extension) => {
+ let children = extension.node.inputs();
+ if children.len() != 1 {
+ // TODO: Add support for `LogicalPlan::Extension` with multi
children.
+ return Ok(None);
+ }
+ // has single child
+ let exprs = plan.expressions();
+ let child = children[0];
+ let node_schema = extension.node.schema();
+ let child_schema = child.schema();
+ if let Some(parent_required_indices_mapped) =
Review Comment:
This doesn't seem right to me -- I don't think we can assume that a column
named "foo" on the input is the same as a column named "foo" in the output
schema (this isn't true for `LogicalPlan::Projection` for example).
##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -1192,4 +1291,24 @@ mod tests {
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}
+
+ // Optimize Projections Rule, pushes down projection through users defined
logical plan node.
Review Comment:
Can we please add a few more tests:
1. The NoOpUserDefined plan itself refers to column `a` in its expressions
1. The NoOpUserDefined plan itself refers to column `b` in its expressions
1. The NoOpUserDefined plan itself refers to column `a + b` in its
expressions
##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -162,14 +163,37 @@ fn optimize_projections(
.map(|input|
((0..input.schema().fields().len()).collect_vec(), false))
.collect::<Vec<_>>()
}
+ LogicalPlan::Extension(extension) => {
+ let children = extension.node.inputs();
+ if children.len() != 1 {
+ // TODO: Add support for `LogicalPlan::Extension` with multi
children.
+ return Ok(None);
+ }
+ // has single child
+ let exprs = plan.expressions();
+ let child = children[0];
+ let node_schema = extension.node.schema();
+ let child_schema = child.schema();
+ if let Some(parent_required_indices_mapped) =
Review Comment:
I think the information that is needed is "given the parent only needs some
of the output columns of this UserDefinedPlan, can it avoid requiring
additional input columns"
Given the UserDefinedPlan is basically a black box and the optimizer doesn't
know how it works, I think the only way to get this information would be to add
some sort of API to UserDefined node
I suggest for this PR, only push down information about used expressions
(don't try and incorporate a projection from the parent) and we can handle the
parent projection information in a follow on PR
I am thinking of an API like
```rust
trait UserDefinedLogicalNode {
...
/// If only the specified indices of the output of node are needed
/// by the consumer, optionally returns a new UserDefinedNode that
/// computes only those indices, in the specified order.
///
/// Return Ok(None), the default, if no such pushdown is possible
///
/// This is used to implement projection pushdown for user defined nodes
fn try_project(&self, indices: &[usize]) -> Result<Option<Arc<dyn Self>>> {
Ok(None)
}
```
##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -897,23 +921,98 @@ fn is_projection_unnecessary(input: &LogicalPlan,
proj_exprs: &[Expr]) -> Result
&& proj_exprs.iter().all(is_expr_trivial))
}
+/// Calculates the correspondinf target indices of the expressions at the
`source_indices` of the `source_fields`.
Review Comment:
```suggestion
/// Calculates the corresponding target indices of the expressions at the
`source_indices` of the `source_fields`.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]