alamb commented on code in PR #9690:
URL: https://github.com/apache/arrow-datafusion/pull/9690#discussion_r1530720578


##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -162,14 +163,37 @@ fn optimize_projections(
                 .map(|input| 
((0..input.schema().fields().len()).collect_vec(), false))
                 .collect::<Vec<_>>()
         }
+        LogicalPlan::Extension(extension) => {
+            let children = extension.node.inputs();
+            if children.len() != 1 {
+                // TODO: Add support for `LogicalPlan::Extension` with multi 
children.
+                return Ok(None);
+            }
+            // has single child
+            let exprs = plan.expressions();
+            let child = children[0];
+            let node_schema = extension.node.schema();
+            let child_schema = child.schema();
+            if let Some(parent_required_indices_mapped) =

Review Comment:
   This doesn't seem right to me -- I don't think we can assume that a column 
named "foo" on the input is the same as a column named "foo" in the output 
schema (this isn't true for `LogicalPlan::Projection` for example).



##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -1192,4 +1291,24 @@ mod tests {
         \n    TableScan: test projection=[a]";
         assert_optimized_plan_equal(&plan, expected)
     }
+
+    // Optimize Projections Rule, pushes down projection through users defined 
logical plan node.

Review Comment:
   Can we please add a few more tests:
   
   1. The NoOpUserDefined plan itself refers to column `a` in its expressions
   1. The NoOpUserDefined plan itself refers to column `b` in its expressions
   1. The NoOpUserDefined plan itself refers to column `a + b` in its 
expressions
   



##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -162,14 +163,37 @@ fn optimize_projections(
                 .map(|input| 
((0..input.schema().fields().len()).collect_vec(), false))
                 .collect::<Vec<_>>()
         }
+        LogicalPlan::Extension(extension) => {
+            let children = extension.node.inputs();
+            if children.len() != 1 {
+                // TODO: Add support for `LogicalPlan::Extension` with multi 
children.
+                return Ok(None);
+            }
+            // has single child
+            let exprs = plan.expressions();
+            let child = children[0];
+            let node_schema = extension.node.schema();
+            let child_schema = child.schema();
+            if let Some(parent_required_indices_mapped) =

Review Comment:
   I think the information that is needed is "given the parent only needs some 
of the output columns of this UserDefinedPlan, can it avoid requiring 
additional input columns"
   
   Given the UserDefinedPlan is basically a black box and the optimizer doesn't 
know how it works, I think the only way to get this information would be to add 
some sort of API to UserDefined node
   
   I suggest for this PR, only push down information about used expressions 
(don't try and incorporate a projection from the parent) and we can handle the 
parent projection information in a follow on PR
   
   
   I am thinking of an API like
   ```rust
   trait UserDefinedLogicalNode {
   ...
   /// If only the specified indices of the output of node are needed
   /// by the consumer, optionally returns a new UserDefinedNode that 
   /// computes only those indices, in the specified order.
   ///
   /// Return Ok(None), the default, if no such pushdown is possible
   ///
   /// This is used to implement projection pushdown for user defined nodes
   fn try_project(&self, indices: &[usize]) -> Result<Option<Arc<dyn Self>>> {
     Ok(None)
   }
   ```



##########
datafusion/optimizer/src/optimize_projections.rs:
##########
@@ -897,23 +921,98 @@ fn is_projection_unnecessary(input: &LogicalPlan, 
proj_exprs: &[Expr]) -> Result
         && proj_exprs.iter().all(is_expr_trivial))
 }
 
+/// Calculates the correspondinf target indices of the expressions at the 
`source_indices` of the `source_fields`.

Review Comment:
   ```suggestion
   /// Calculates the corresponding target indices of the expressions at the 
`source_indices` of the `source_fields`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to