2010YOUY01 commented on issue #18319:
URL: https://github.com/apache/datafusion/issues/18319#issuecomment-3748471631

   > that PR has a very long description. At a glance it definitely seems 
relevant but I can't tell how long it would take to accommodate dates and I'm 
not sure parquet automatically maintains statistics for dates by default.
   
   One issue with #19487 is that it may take longer to land in a release. The 
major challenge is review latency (any help moving it forward sooner would be 
greatly appreciated). Making `date_trunc` itself prunable should be fairly 
straightforward. The rewrite-based approach is also much more likely to get 
merged quickly, so it makes sense to proceed with it if you need a solution in 
the near term.
   
   The gist of #19487 is that the pruning framework handles arbitrarily 
complex predicates automatically. Otherwise, we have to assume the pruner can 
only handle simple expressions like `col < constant` and maintain dozens of 
rewrite rules like this one, and a rewrite rule is not flexible enough to 
handle slightly more complex patterns. (For example, it rewrites 
`date_trunc(part, column) <= constant_rhs`, but might fail to prune if `column` 
is wrapped in one additional date-related function.) So I believe #19487 is the 
better long-term solution.
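
   To make the contrast concrete, here is a minimal Python sketch of pruning on min/max statistics when the predicate wraps the column in a monotonic function. It is only an illustration of the general idea and is not taken from #19487 or from DataFusion; `trunc_day`, `can_skip_le`, and the container layout are hypothetical names made up for this example.

```python
from datetime import datetime, timedelta

def trunc_day(ts):
    # Hypothetical stand-in for date_trunc('day', ts); monotonic non-decreasing.
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def can_skip_le(col_min, constant, f):
    # For a monotonic non-decreasing f, every row v >= col_min has
    # f(v) >= f(col_min). So if f(col_min) > constant, no row in the
    # container can satisfy f(col) <= constant and it can be skipped.
    return f(col_min) > constant

# Each container is summarized by its (min, max) statistics.
containers = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 20, 0)),
    (datetime(2024, 1, 2, 23, 30), datetime(2024, 1, 2, 23, 50)),
    (datetime(2024, 1, 3, 0, 0), datetime(2024, 1, 3, 12, 0)),
]
constant = datetime(2024, 1, 2)

# Predicate: date_trunc('day', col) <= constant
kept_plain = [
    i for i, (lo, _hi) in enumerate(containers)
    if not can_skip_le(lo, constant, trunc_day)
]

# Same predicate with one more monotonic function wrapped around the column:
# date_trunc('day', col + 1 hour) <= constant
def shifted(ts):
    return trunc_day(ts + timedelta(hours=1))

kept_shifted = [
    i for i, (lo, _hi) in enumerate(containers)
    if not can_skip_le(lo, constant, shifted)
]

print(kept_plain)    # [0, 1]
print(kept_shifted)  # [0]
```

   The same `can_skip_le` covers both predicates because a composition of monotonic functions is still monotonic, whereas a pattern-matching rewrite rule would need a separate case for the wrapped form.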
   
   > Also, it doesn't seem efficient for this particular optimization because 
it evaluates per batch (micro-partition?) something that can be evaluated once 
up front. I think the only scenario it has equivalent performance is in the 
case of tens of batches and when the plan is not cached in any way.
   
   For the performance part, I think this rewrite does fold one constant value 
in the example (the RHS of `<=`), but the `<=` expression still has to be 
evaluated on all containers anyway. 
   ```
   column <= date_trunc(part, date_add(constant_rhs, INTERVAL 1 part))
   ```
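
   As a rough illustration of that point, the Python sketch below folds the right-hand side once and then still runs one comparison per value. It assumes `part` is a day, and it uses a strict `<` against the start of the next day, which is the form the brute-force check here confirms; the exact operator the rewrite rule emits is not taken from the PR. All function names are hypothetical.

```python
from datetime import datetime, timedelta

def trunc_day(ts):
    # Hypothetical stand-in for date_trunc('day', ts).
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def original_predicate(col, constant_rhs):
    # date_trunc('day', col) <= constant_rhs
    return trunc_day(col) <= constant_rhs

def fold_rhs(constant_rhs):
    # The part that can be computed once up front: start of the next day.
    return trunc_day(constant_rhs) + timedelta(days=1)

def rewritten_predicate(col, bound):
    # Plain column comparison; still evaluated for every value or container.
    return col < bound

constant_rhs = datetime(2024, 1, 2, 15, 30)
bound = fold_rhs(constant_rhs)

# Hourly samples over four days, covering both sides of the boundary.
start = datetime(2024, 1, 1)
samples = [start + timedelta(hours=h) for h in range(96)]

mismatches = [
    ts for ts in samples
    if original_predicate(ts, constant_rhs) != rewritten_predicate(ts, bound)
]

print(bound)       # 2024-01-03 00:00:00
print(mismatches)  # []
```

   Only `fold_rhs` runs once; `rewritten_predicate` is still called for each sample, which mirrors the comparison being evaluated on every container.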
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

