As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>,
Jinfeng mentioned that he plans to move the directory-based pruning rule
earlier than column-based pruning. I want to expand on that a little,
provide the motivation, and gather thoughts/feedback.

Currently both directory-based pruning and column-based pruning are
fired in the same planning phase, and both operate on Drill logical rels.
This is not optimal when the data is organized so that both kinds of
pruning can be applied (a nested directory structure, plus individual
files that contain partition columns). As part of creating the Drill
logical scan we read the footers of all the files involved. If the
directory-based pruning rule is fired earlier (as a rule that operates
on Calcite logical rels), then we can prune out unnecessary directories
first and save the work of reading the footers of the files in them.
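
To make the saving concrete, here is a toy sketch in Python. It is not Drill's planner code: DataFile, FooterReader, dir0 and both prune functions are hypothetical names made up for illustration, and the footer read is modeled as a simple counter. It only shows that both orderings select the same files while the directory-first ordering reads fewer footers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str        # hypothetical layout, e.g. "2015/q1/a.parquet"
    part_value: str  # partition column value, only known after a footer read


class FooterReader:
    """Stand-in for footer reading; counts how many footers were read."""

    def __init__(self):
        self.reads = 0

    def read(self, f):
        self.reads += 1
        return f.part_value


def dir0(f):
    # First directory level of the file path.
    return f.path.split("/")[0]


def prune_same_phase(files, want_dir, want_value, reader):
    # Scan creation reads every footer before either pruning rule fires.
    footers = {f: reader.read(f) for f in files}
    return [f for f in files
            if dir0(f) == want_dir and footers[f] == want_value]


def prune_directory_first(files, want_dir, want_value, reader):
    # Directory pruning needs only paths, so it can run before any footer read.
    survivors = [f for f in files if dir0(f) == want_dir]
    # Footers are read only for files in the surviving directories.
    return [f for f in survivors if reader.read(f) == want_value]


FILES = [
    DataFile("2014/q1/a.parquet", "east"),
    DataFile("2014/q2/b.parquet", "west"),
    DataFile("2015/q1/c.parquet", "east"),
    DataFile("2015/q2/d.parquet", "west"),
]

same_phase_reader = FooterReader()
dir_first_reader = FooterReader()
same_phase = prune_same_phase(FILES, "2015", "east", same_phase_reader)
dir_first = prune_directory_first(FILES, "2015", "east", dir_first_reader)
```

In this made-up layout both calls select only 2015/q1/c.parquet, but the same-phase version reads all four footers while the directory-first version reads two.
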
Thanks
Mehant
- Moving directory based pruning to fire earlier Mehant Baid