Jefffrey commented on issue #8379:
URL: https://github.com/apache/arrow-datafusion/issues/8379#issuecomment-1836817802

   Actually, I think I was off the mark on what `ExprId` is intended to do. It 
seems it would be more useful if there were a new logical expr enum variant 
such as `AttributeReference`, which would refer to an expr from the parent plan 
by `ExprId`.
   
   Like given a logical plan:
   
   ```
   Projection: a.int_col, b.double_col, CAST(a.date_string_col AS Utf8)
     Inner Join: a.int_col = b.int_col
       SubqueryAlias: a
         Projection: alltypes_plain.int_col, alltypes_plain.date_string_col
           Filter: alltypes_plain.id > Int32(1)
             TableScan: alltypes_plain projection=[id, int_col, date_string_col], partial_filters=[alltypes_plain.id > Int32(1)]
       SubqueryAlias: b
         Projection: alltypes_plain.int_col, alltypes_plain.double_col
           Filter: CAST(alltypes_plain.tinyint_col AS Float64) < alltypes_plain.double_col
             TableScan: alltypes_plain projection=[tinyint_col, int_col, double_col], partial_filters=[CAST(alltypes_plain.tinyint_col AS Float64) < alltypes_plain.double_col]
   ```
   
   The top-level projection has `a.int_col` as a `Column`, for example, which, 
when turned into a physical plan, has to be looked up in the parent schema by 
name:
   
   
https://github.com/apache/arrow-datafusion/blob/a6e6d3fab083839239ef81cf3a3546dd8929a541/datafusion/core/src/physical_planner.rs#L879-L891
   
   Whereas with `ExprId`s, `a.int_col` could be an `AttributeReference` that 
identifies, by id, which expr in the parent's expr list it refers to.
   
   And I think each new expr would have a new ID.
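
To make the idea concrete, here is a minimal sketch of what an id-based reference could look like. Everything in it is hypothetical: `ExprId`, `AttributeReference`, `NamedOutput` and `resolve` are illustrative names in a standalone toy model, not existing DataFusion types or APIs.

```rust
// Hypothetical sketch: these types are NOT part of DataFusion.
use std::sync::atomic::{AtomicU64, Ordering};

/// Opaque identifier handed to an expression when it is created.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ExprId(u64);

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

impl ExprId {
    /// Every call returns a fresh id, so each new expr gets a new id.
    pub fn fresh() -> Self {
        ExprId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }
}

/// Simplified stand-in for a logical expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Name-based reference such as `a.int_col`.
    Column { relation: Option<String>, name: String },
    /// Id-based reference to an expression produced by the input plan.
    AttributeReference(ExprId),
}

/// One output expression of a plan node, tagged with its id.
#[derive(Debug, Clone)]
pub struct NamedOutput {
    pub id: ExprId,
    pub name: String,
}

/// Find the position of the referenced expression in the input's output
/// list by comparing ids only; no name matching is involved.
pub fn resolve(reference: &Expr, input_outputs: &[NamedOutput]) -> Option<usize> {
    match reference {
        Expr::AttributeReference(id) => input_outputs.iter().position(|o| o.id == *id),
        // Name-based lookup is deliberately not modelled in this sketch.
        Expr::Column { .. } => None,
    }
}
```

With something like this, `resolve(&Expr::AttributeReference(id), &outputs)` finds the matching output by a single integer comparison per entry, regardless of how the column was qualified in the query.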
   
   Honestly I could be way off the mark here on the usages/benefits of exprid 😅
   
   It's just something I was thinking about, especially given how verbose it 
can be to check whether two columns are the same once the table, schema and 
catalog parts of a column identifier are taken into account.
   
   - See the trouble with the ambiguity check in https://github.com/apache/arrow-datafusion/issues/6012
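
A rough illustration of that verbosity, as a standalone sketch: `ColumnName`, `may_be_same_column` and `is_same_expr` are made-up names, and the matching rule (a missing qualifier part is compatible with anything) is an assumption for illustration, not DataFusion's actual resolution logic.

```rust
/// Hypothetical, simplified column identifier; any leading part may be absent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ColumnName {
    pub catalog: Option<String>,
    pub schema: Option<String>,
    pub table: Option<String>,
    pub name: String,
}

/// Two optional qualifier parts are compatible if either one is missing
/// or both are present and equal (an assumed rule for this sketch).
fn part_compatible(a: &Option<String>, b: &Option<String>) -> bool {
    match (a, b) {
        (Some(x), Some(y)) => x == y,
        _ => true,
    }
}

/// Name-based check: could these two identifiers refer to the same column?
pub fn may_be_same_column(a: &ColumnName, b: &ColumnName) -> bool {
    a.name == b.name
        && part_compatible(&a.table, &b.table)
        && part_compatible(&a.schema, &b.schema)
        && part_compatible(&a.catalog, &b.catalog)
}

/// Hypothetical id attached to each expression.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprId(pub u64);

/// Id-based check: a single comparison with no partial-qualification cases.
pub fn is_same_expr(a: ExprId, b: ExprId) -> bool {
    a == b
}
```

Under this rule an unqualified `int_col` is compatible with both `a.int_col` and `b.int_col`, which is exactly the kind of ambiguity a name-based check has to detect and report, while the id-based check has no such case.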
   
   So instead of having to find the original column of a projected column by 
name during logical optimization and physical planning, that lookup could be 
done once in an analyzer rule pass, with `ExprId`s used afterwards.
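
As a sketch of what such a one-off pass might do, the toy function below rewrites name-based columns into id-based references. It assumes a much simplified expression tree and a flat name-to-id map for the input; `Expr`, `ExprId` and `resolve_names` here are illustrative only and do not correspond to DataFusion's analyzer rule API.

```rust
use std::collections::HashMap;

/// Hypothetical id attached to each expression of the input plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ExprId(pub u64);

/// Simplified stand-in for a logical expression tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Name-based reference, stored as a flat string such as "a.int_col".
    Column(String),
    /// Id-based reference produced by the pass below.
    AttributeReference(ExprId),
    /// Cast of an inner expression; the target type is kept as text.
    Cast(Box<Expr>, String),
}

/// One-off pass: replace every name-based `Column` with an
/// `AttributeReference`, failing if a name is not found in the input.
pub fn resolve_names(expr: Expr, input: &HashMap<String, ExprId>) -> Result<Expr, String> {
    match expr {
        Expr::Column(name) => input
            .get(&name)
            .copied()
            .map(Expr::AttributeReference)
            .ok_or_else(|| format!("unknown column: {}", name)),
        Expr::Cast(inner, ty) => {
            // Recurse so that nested columns are rewritten as well.
            let resolved = resolve_names(*inner, input)?;
            Ok(Expr::Cast(Box::new(resolved), ty))
        }
        // Already id-based: nothing to do.
        resolved @ Expr::AttributeReference(_) => Ok(resolved),
    }
}
```

After a pass like this, later stages would only ever see `AttributeReference`s and would not need to repeat the by-name search.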


