As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>, Jinfeng mentioned that he plans to move the directory-based pruning rule earlier than column-based pruning. I want to expand on that a little, provide the motivation, and gather thoughts and feedback.

Currently, both directory-based pruning and column-based pruning are fired in the same planning phase, and both operate on Drill logical rels. This is not optimal when the data is organized so that both kinds of pruning apply, that is, when the data has a nested directory structure and the individual files also contain partition columns. As part of creating the Drill logical scan, we read the footers of all the files involved. If the directory-based pruning rule fires earlier (as a rule that operates on Calcite logical rels), we can prune unnecessary directories before the Drill logical scan is created and avoid reading the footers of the files in those directories.
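
To make the saving concrete, here is a rough toy model in Python. It is not Drill or Calcite code: DataFile, read_footer, plan_same_phase and plan_directory_first are hypothetical names invented for this illustration, and the only thing it models is how many footers get read under each ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one file in a partitioned table."""
    directory: str        # directory the file lives in, e.g. "2015"
    partition_value: str  # partition column value, known only from the footer


def read_footer(data_file, counter):
    """Pretend to read a file footer and count how often that happens."""
    counter["footer_reads"] += 1
    return data_file.partition_value


def plan_same_phase(files, dir_filter, col_filter):
    """Both pruning rules fire after the scan is built, so every footer is read."""
    counter = {"footer_reads": 0}
    footers = {f: read_footer(f, counter) for f in files}
    kept = [f for f in files
            if dir_filter(f.directory) and col_filter(footers[f])]
    return kept, counter["footer_reads"]


def plan_directory_first(files, dir_filter, col_filter):
    """Directory pruning fires first; footers are read only for surviving files."""
    counter = {"footer_reads": 0}
    survivors = [f for f in files if dir_filter(f.directory)]
    footers = {f: read_footer(f, counter) for f in survivors}
    kept = [f for f in survivors if col_filter(footers[f])]
    return kept, counter["footer_reads"]


# Assumed example layout: 4 year directories, 3 files each (one per state).
FILES = [DataFile(str(year), state)
         for year in (2013, 2014, 2015, 2016)
         for state in ("CA", "NY", "TX")]


def in_2015(directory):
    return directory == "2015"


def is_ca(value):
    return value == "CA"


same_kept, same_reads = plan_same_phase(FILES, in_2015, is_ca)
first_kept, first_reads = plan_directory_first(FILES, in_2015, is_ca)
```

In this made-up layout both orderings keep the same single file, but the first reads 12 footers and the second reads only 3. The real saving would depend on how selective the directory filter is.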

Thanks
Mehant
