alamb opened a new issue #98:
URL: https://github.com/apache/arrow-datafusion/issues/98


   *Note*: migrated from original JIRA: 
https://issues.apache.org/jira/browse/ARROW-9770
   
   The high-level idea is that if an expression can be partially evaluated 
at planning time, then
   # The execution time will be reduced
   # Additional optimizations may become possible (for example, removing 
entire LogicalPlan nodes)
   
   I recently saw the following selection expression created (by the [predicate 
push down|https://github.com/apache/arrow/pull/7880])
   
   {code}
   Selection: #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1) And #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)
                 TableScan: test projection=None
   {code}
   
   This could be simplified significantly:
   1. Duplicate clauses could be removed (e.g. `#a Eq Int64(1) And #a Eq Int64(1)` --> `#a Eq Int64(1)`)
   2. Algebraic simplification (e.g. `A<=5 AND A=5` is the same as `A=5`)
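
As a rough illustration of item 1, here is a minimal sketch of duplicate-clause removal. The `Expr` enum and the `split_conjuncts` / `dedup_conjuncts` helpers are hypothetical stand-ins invented for this example, not DataFusion's actual types or API. The algebraic simplification in item 2 is not attempted here.

```rust
// Hypothetical, simplified expression type; NOT DataFusion's actual `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Eq(Box<Expr>, Box<Expr>),
    LtEq(Box<Expr>, Box<Expr>),
    GtEq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Flatten a tree of `And` nodes into a list of its conjuncts.
fn split_conjuncts(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(left, right) => {
            split_conjuncts(left, out);
            split_conjuncts(right, out);
        }
        other => out.push(other.clone()),
    }
}

/// Drop repeated conjuncts (keeping the first occurrence of each) and
/// rebuild a left-associated `And` chain from what remains.
fn dedup_conjuncts(expr: &Expr) -> Expr {
    let mut all = Vec::new();
    split_conjuncts(expr, &mut all);

    let mut unique: Vec<Expr> = Vec::new();
    for conjunct in all {
        if !unique.contains(&conjunct) {
            unique.push(conjunct);
        }
    }

    let mut iter = unique.into_iter();
    // `split_conjuncts` always pushes at least one expression.
    let first = iter.next().expect("at least one conjunct");
    iter.fold(first, |acc, c| Expr::And(Box::new(acc), Box::new(c)))
}
```

Applied to the selection above, this would leave `#a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)`. The linear `contains` scan keeps the sketch short; a real implementation would likely want hashing for long predicate lists.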
   
   Inspiration can be taken from the postgres code that evaluates constant 
expressions 
https://doxygen.postgresql.org/clauses_8c.html#ac91c4055a7eb3aa6f1bc104479464b28
   
   (In this case, for example, if you have a predicate A=5 then you can 
basically substitute 5 for A in any expression higher up in the plan.)
   
   Other classic optimizations include things such as `A OR TRUE` --> `TRUE`, 
`A AND TRUE` --> `A`, etc.
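
A minimal sketch of this kind of boolean constant simplification follows, using the identities `A OR TRUE` = `TRUE`, `A AND TRUE` = `A`, and their `FALSE` counterparts. The `BoolExpr` enum and `simplify` function are hypothetical names made up for illustration; they are not part of DataFusion.

```rust
// Hypothetical, simplified boolean expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum BoolExpr {
    Column(String),
    Literal(bool),
    And(Box<BoolExpr>, Box<BoolExpr>),
    Or(Box<BoolExpr>, Box<BoolExpr>),
}

/// Simplify bottom-up: children are simplified first, then constant
/// operands of `And` / `Or` are folded away.
fn simplify(expr: BoolExpr) -> BoolExpr {
    use BoolExpr::*;
    match expr {
        And(left, right) => match (simplify(*left), simplify(*right)) {
            // A AND FALSE --> FALSE
            (Literal(false), _) | (_, Literal(false)) => Literal(false),
            // A AND TRUE --> A
            (Literal(true), other) | (other, Literal(true)) => other,
            (l, r) => And(Box::new(l), Box::new(r)),
        },
        Or(left, right) => match (simplify(*left), simplify(*right)) {
            // A OR TRUE --> TRUE
            (Literal(true), _) | (_, Literal(true)) => Literal(true),
            // A OR FALSE --> A
            (Literal(false), other) | (other, Literal(false)) => other,
            (l, r) => Or(Box::new(l), Box::new(r)),
        },
        // Columns and literals are already as simple as they get.
        other => other,
    }
}
```

Because the rewrite runs bottom-up, a constant produced deep in the tree can propagate outward, so `(A OR TRUE) AND B` reduces to `B` in a single pass.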
   

