adriangb commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3893794362

   Okay, yeah, that's a valid hypothesis. I think things should be optimized enough that the overhead would not be that impactful, but maybe it is. I think it would be reasonable to add some simplifier / check to discard filters that are always true.
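
A minimal sketch of such an always-true check. It is in Rust because DataFusion is written in Rust, but the `Pred` enum and the `simplify` / `effective_filter` functions are hypothetical stand-ins, not DataFusion's `Expr` or simplifier API. Constant `true`/`false` operands are folded bottom-up, and a filter that reduces to a literal `true` is dropped so the scan never has to evaluate it:

```rust
/// Toy predicate tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Lit(bool),
    /// Stands in for any predicate that is not a constant.
    Opaque(String),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Fold constant operands of AND / OR, bottom-up.
fn simplify(p: Pred) -> Pred {
    match p {
        Pred::And(l, r) => match (simplify(*l), simplify(*r)) {
            // `true AND x` == `x`
            (Pred::Lit(true), x) | (x, Pred::Lit(true)) => x,
            // `false AND x` == `false`
            (Pred::Lit(false), _) | (_, Pred::Lit(false)) => Pred::Lit(false),
            (l, r) => Pred::And(Box::new(l), Box::new(r)),
        },
        Pred::Or(l, r) => match (simplify(*l), simplify(*r)) {
            // `true OR x` == `true`
            (Pred::Lit(true), _) | (_, Pred::Lit(true)) => Pred::Lit(true),
            // `false OR x` == `x`
            (Pred::Lit(false), x) | (x, Pred::Lit(false)) => x,
            (l, r) => Pred::Or(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// `None` means the filter is always true and can be skipped entirely.
fn effective_filter(p: Pred) -> Option<Pred> {
    match simplify(p) {
        Pred::Lit(true) => None,
        other => Some(other),
    }
}
```

For example, a placeholder like `true AND true` yields `None` (no filter to evaluate), while `true AND a > 1` is reduced to just `a > 1`.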
   
   That said, I just tried running Q6:
   
   ```sql
   set datafusion.execution.parquet.binary_as_string = true;
   create external table hits stored as parquet location 'benchmarks/data/hits_partitioned';
   explain analyze SELECT MIN("EventDate"), MAX("EventDate") FROM hits;
   ```
   
   I'm getting:
   
   ```
   ProjectionExec: expr=[15888 as min(hits.EventDate), 15917 as max(hits.EventDate)]
   ```
   
   I.e., we don't even scan the data; we resolve it from statistics 🤔. Am I doing something wrong / different from you, @notashes?
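
For reference, resolving MIN/MAX from statistics amounts to folding per-file (or per-row-group) min/max values without reading any rows. A rough Rust sketch under stated assumptions: `ColumnStats` and `min_max_from_stats` are hypothetical, and the statistics are assumed to be exact. If any file is missing them, the function gives up and the query would need a real scan:

```rust
/// Per-file min/max for one column (hypothetical struct; assumes the
/// values are exact, not estimates).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i32>,
    max: Option<i32>,
}

/// Answer MIN/MAX purely from statistics.
/// Returns `None` if any file lacks min or max (fall back to scanning),
/// or if there are no files at all.
fn min_max_from_stats(files: &[ColumnStats]) -> Option<(i32, i32)> {
    let mut acc: Option<(i32, i32)> = None;
    for f in files {
        // Bail out as soon as one file has no usable statistics.
        let (lo, hi) = (f.min?, f.max?);
        acc = Some(match acc {
            None => (lo, hi),
            Some((cur_lo, cur_hi)) => (cur_lo.min(lo), cur_hi.max(hi)),
        });
    }
    acc
}
```

For example, two files with (made-up) stats `(10, 20)` and `(5, 15)` yield `Some((5, 20))`, which is the kind of constant the `ProjectionExec` above ends up holding.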


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to