The general idea of multi-phase pruning makes sense to me. I am wondering, though: are we talking about introducing a new planning phase before logical planning, or about separating out the logic so that directory pruning kicks off ahead of column-based pruning?
2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.meh...@gmail.com>:

> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>
> Jinfeng mentioned that he plans to move the directory based pruning rule
> earlier than column based pruning. I want to expand on that a little,
> provide the motivation and gather thoughts/ feedback.
>
> Currently both the directory based pruning and the column based pruning is
> fired in the same planning phase and are based on Drill logical rels. This
> is not optimal in the case where data is organized in such a way that both
> directory based pruning and column based pruning can be applied (when the
> data is organized with a nested directory structure plus the individual
> files contain partition columns). As part of creating the Drill logical
> scan we read the footers of all the files involved. If the directory based
> pruning rule is fired earlier (rule to fire based on calcite logical rels)
> then we will be able to prune out unnecessary directories and save the work
> of reading the footers of these files.
>
> Thanks
> Mehant
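
To check my understanding of the motivation quoted above, here is a rough sketch of why the ordering matters. It is only a toy model, not Drill or Calcite code: the class name, `readFooter`, `pruneByDirectory`, `buildScan`, and the file paths are all hypothetical, and the footer read is simulated with a counter. The point is just that directory pruning needs only the path, so running it before the scan is built reduces the number of footers we have to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names come from Drill or Calcite.
public class PruningOrderSketch {
    // Number of simulated footer reads performed by the last buildScan call.
    static int footerReads = 0;

    // Hypothetical stand-in for the expensive step of reading a file footer.
    static String readFooter(String path) {
        footerReads++;
        return "footer:" + path;
    }

    // Directory-based pruning: inspects only the path, so it needs no file I/O.
    static List<String> pruneByDirectory(List<String> files, String requiredDir) {
        List<String> kept = new ArrayList<>();
        for (String file : files) {
            if (file.startsWith(requiredDir + "/")) {
                kept.add(file);
            }
        }
        return kept;
    }

    // Models creating the scan: the footer of every file passed in is read.
    // Returns how many footers were read.
    static int buildScan(List<String> files) {
        footerReads = 0;
        for (String file : files) {
            readFooter(file);
        }
        return footerReads;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "/data/2014/a.parquet",
                "/data/2014/b.parquet",
                "/data/2015/c.parquet",
                "/data/2015/d.parquet");

        // Same-phase pruning: the scan is built over all files first.
        int sameBase = buildScan(files);

        // Directory pruning first: only surviving files reach the scan.
        int pruneFirst = buildScan(pruneByDirectory(files, "/data/2015"));

        System.out.println("footer reads, pruning after scan creation: " + sameBase);
        System.out.println("footer reads, directory pruning first: " + pruneFirst);
    }
}
```

In this toy setup the second path reads footers only for the files under the directory that survives pruning; column based pruning would then run on that smaller set. Whether this is done with a new phase or by reordering rules is exactly the part I am unsure about.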