nuno-faria commented on code in PR #22031:
URL: https://github.com/apache/datafusion/pull/22031#discussion_r3191590257
##########
datafusion/sqllogictest/test_files/explain_analyze.slt:
##########
@@ -247,7 +247,7 @@ explain analyze select * from cat_tracking where species >
'M' AND s >= 50 order
----
Plan with Metrics
01)SortExec: TopK(fetch=3), expr=[species@0 ASC NULLS LAST],
preserve_partitioning=[false], filter=[species@0 < Nlpine Sheep],
metrics=[output_rows=3]
-02)--DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/explain_analyze/data.parquet]]},
projection=[species, s], file_type=parquet, predicate=species@0 > M AND s@1 >=
50 AND DynamicFilter [ species@0 < Nlpine Sheep ],
pruning_predicate=species_null_count@1 != row_count@2 AND species_max@0 > M AND
s_null_count@4 != row_count@2 AND s_max@3 >= 50 AND species_null_count@1 !=
row_count@2 AND species_min@5 < Nlpine Sheep, required_guarantees=[],
metrics=[output_rows=3, files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=3 total → 3 matched, page_index_pages_pruned=6
total → 6 matched, limit_pruned_row_groups=0 total → 0 matched,
scan_efficiency_ratio=22.13% (521/2.35 K)]
+02)--DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/explain_analyze/data.parquet]]},
projection=[species, s], file_type=parquet, predicate=species@0 > M AND s@1 >=
50 AND DynamicFilter [ species@0 < Nlpine Sheep ],
pruning_predicate=species_null_count@1 != row_count@2 AND species_max@0 > M AND
s_null_count@4 != row_count@2 AND s_max@3 >= 50 AND species_null_count@1 !=
row_count@2 AND species_min@5 < Nlpine Sheep, required_guarantees=[],
metrics=[output_rows=3, files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=3 total → 3 matched, page_index_pages_pruned=3
total → 3 matched, limit_pruned_row_groups=0 total → 0 matched,
scan_efficiency_ratio=22.13% (521/2.35 K)]
Review Comment:
The file used contains 4 row groups (1 data page each), and the query matched
only 3 of them. So the previous value of 6 was due to double counting caused by
having two predicates. The remaining changes have the same cause.
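
To make the arithmetic concrete, here is a minimal sketch of how counting pages once per predicate would give 6, while counting once per row group gives 3. This is only a toy model under stated assumptions, not DataFusion's actual implementation: the function names and the `predicate_a`/`predicate_b` placeholders are hypothetical.

```python
# Toy model of the page_index_pages_pruned "total" counter.
# Hypothetical names throughout; this is not DataFusion code.

PAGES_PER_ROW_GROUP = 1   # the test file has 1 data page per row group
MATCHED_ROW_GROUPS = 3    # 4 row groups in the file, 3 survive statistics pruning

# Placeholders for the two predicates evaluated against the page index.
PREDICATES = ["predicate_a", "predicate_b"]


def pages_counted_per_predicate(row_groups, pages_per_group, predicates):
    """Old behaviour (assumed): every predicate adds the same pages again."""
    total = 0
    for _ in predicates:
        total += row_groups * pages_per_group
    return total


def pages_counted_once(row_groups, pages_per_group):
    """New behaviour (assumed): each page is counted a single time."""
    return row_groups * pages_per_group


old_total = pages_counted_per_predicate(
    MATCHED_ROW_GROUPS, PAGES_PER_ROW_GROUP, PREDICATES
)
new_total = pages_counted_once(MATCHED_ROW_GROUPS, PAGES_PER_ROW_GROUP)

print(old_total)  # 6, matches the removed line
print(new_total)  # 3, matches the added line
```

With 3 surviving row groups of 1 page each, the second figure lines up with `page_index_pages_pruned=3 total → 3 matched` in the updated expected output.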
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]