Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

via GitHub Fri, 04 Apr 2025 09:24:41 -0700


adriangb commented on code in PR #15561:
URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027673271



##########
datafusion/sqllogictest/test_files/parquet.slt:
##########
@@ -625,7 +625,7 @@ physical_plan
 01)CoalesceBatchesExec: target_batch_size=8192
 02)--FilterExec: column1@0 LIKE f%
 03)----RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
-04)------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/foo.parquet]]}, projection=[column1], file_type=parquet, predicate=column1@0 LIKE f%, pruning_predicate=column1_null_count@2 != row_count@3 AND column1_min@0 <= g AND f <= column1_max@1, required_guarantees=[]
+04)------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/foo.parquet]]}, projection=[column1], file_type=parquet, predicate=column1@0 LIKE f%

Review Comment:
   Yes, I think we can do that. My concern was that it would be more confusing,
because the pruning predicate shown in the plan is not the one that ends up
being applied to each file.
   
   Is there any way we can inject this information at runtime? Metrics already
do something similar. It would be nice to record the per-file pruning
predicates, per-file schema mappings, and per-file filters once those exist.
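
   To make the idea concrete, here is a rough sketch of what recording this at runtime could look like. It is only an illustration under my own assumptions: `FileScanInfo`, `ScanInfoRecorder`, and their methods are hypothetical names, not existing DataFusion APIs, and predicates are stored as display strings for simplicity.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-file record; these names do not exist in DataFusion.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FileScanInfo {
    /// Display form of the pruning predicate built for this file, if any.
    pub pruning_predicate: Option<String>,
    /// Display form of the filter applied to this file, if any.
    pub filter: Option<String>,
}

/// Hypothetical recorder shared between the plan node and its file openers.
/// Cloning it is cheap and every clone writes into the same map.
#[derive(Debug, Clone, Default)]
pub struct ScanInfoRecorder {
    inner: Arc<Mutex<BTreeMap<String, FileScanInfo>>>,
}

impl ScanInfoRecorder {
    /// Called by an opener once it has built the predicate for `file`.
    pub fn record_pruning_predicate(&self, file: &str, predicate: &str) {
        let mut map = self.inner.lock().unwrap();
        map.entry(file.to_string()).or_default().pruning_predicate =
            Some(predicate.to_string());
    }

    /// Returns a copy of everything recorded so far, ordered by file name.
    pub fn snapshot(&self) -> BTreeMap<String, FileScanInfo> {
        self.inner.lock().unwrap().clone()
    }
}
```

   Each opener would hold a clone of the recorder and write to it once the file schema is known, and something like an analyze-style output could then read `snapshot()` after execution, in the same way metrics are read today.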



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

