The general idea of multi-phase pruning makes sense to me. I am wondering,
though: are we referring to introducing a new planning phase before the
logical planning phase, or to separating out the logic so that directory
pruning kicks off ahead of column-based pruning?

2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.meh...@gmail.com>:

> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>
> Jinfeng mentioned that he plans to move the directory based pruning rule
> earlier than column based pruning. I want to expand on that a little,
> provide the motivation, and gather thoughts/feedback.
>
> Currently both the directory based pruning and the column based pruning are
> fired in the same planning phase and are based on Drill logical rels. This
> is not optimal when data is organized so that both directory based pruning
> and column based pruning can be applied (that is, the data is laid out in a
> nested directory structure and the individual files also contain partition
> columns). As part of creating the Drill logical scan, we read the footers of
> all the files involved. If the directory based pruning rule fires earlier
> (i.e., the rule fires on Calcite logical rels), we will be able to prune
> unnecessary directories and avoid reading the footers of the files they
> contain.
>
> Thanks
> Mehant
>
>
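
To make the cost argument above concrete, here is a minimal sketch in Python of a toy model, not Drill's implementation (Drill's pruning rules are Java planner rules on Calcite/Drill rels). It assumes each file's path encodes directory partitions (dir0/dir1) and that a per-file partition column value is only known after reading the footer. The names DataFile, FooterReader, single_phase and two_phase are hypothetical; the reader just counts footer reads as a stand-in for metadata I/O.

```python
from dataclasses import dataclass


# Hypothetical toy file: the path encodes directory partitions (dir0/dir1);
# `region` plays the role of a partition column stored in the file itself.
@dataclass(frozen=True)
class DataFile:
    path: str    # e.g. "2015/11/part-0.parquet"
    region: str  # only known after reading the footer


class FooterReader:
    """Counts footer reads, standing in for the expensive metadata I/O."""

    def __init__(self):
        self.reads = 0

    def read(self, f: DataFile) -> str:
        self.reads += 1
        return f.region


def dir_values(path):
    # "2015/11/part-0.parquet" -> ["2015", "11"]  (dir0, dir1)
    return path.split("/")[:-1]


def single_phase(files, dir_filter, region, reader):
    # Scan creation reads every footer before any pruning is applied.
    footers = {f: reader.read(f) for f in files}
    return [f for f in files
            if dir_filter(dir_values(f.path)) and footers[f] == region]


def two_phase(files, dir_filter, region, reader):
    # Phase 1: prune on directory names alone, with no footer I/O.
    survivors = [f for f in files if dir_filter(dir_values(f.path))]
    # Phase 2: read footers only for files that survived directory pruning.
    return [f for f in survivors if reader.read(f) == region]


# 8 files: 2 years x 2 months x 2 files, each file holding one region.
files = [DataFile(f"{y}/{m}/part-{i}.parquet", r)
         for y in ("2014", "2015")
         for m in ("10", "11")
         for i, r in enumerate(("us", "eu"))]

if __name__ == "__main__":
    only_2015_11 = lambda d: d == ["2015", "11"]
    r1, r2 = FooterReader(), FooterReader()
    a = single_phase(files, only_2015_11, "eu", r1)
    b = two_phase(files, only_2015_11, "eu", r2)
    print([f.path for f in a], r1.reads)
    print([f.path for f in b], r2.reads)
```

Both variants select the same files; the difference is only how many footers are touched, which in this toy model is the saving the proposal is after.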
