Hi Aman, I've also created a second issue for the invalid 0 length parquet files not being pruned out:
https://issues.apache.org/jira/browse/DRILL-2517

I've done a bit of work on resolving it but need some input to see if I'm going down the right path.

On Mon, Mar 23, 2015 at 12:54 PM, Aman Sinha <[email protected]> wrote:

> Hi Adam,
> I will update DRILL-2287 with some comments because it has more context
> than this discussion thread. We can continue the discussion there. The
> issue of the invalid 0 length parquet files being read sounds like a
> different issue.
>
> Aman
>
> On Sun, Mar 22, 2015 at 6:48 PM, Adam Gilmore <[email protected]>
> wrote:
>
> > Hi guys,
> >
> > I'm trying to work on an issue I've raised with partition pruning:
> >
> > https://issues.apache.org/jira/browse/DRILL-2287
> >
> > Basically, because the partition pruning is done after the
> > DrillPushProjIntoScan, it seems like we can't detect that dir0 (for
> > example) is not actually needed to be projected if it's not in the SELECT
> > clause (or GROUP BY etc.).
> >
> > Moreover, I've come up with an issue whereby if I have, for example, 3
> > directories - 2 with valid Parquet files and 1 with an invalid 0-byte
> > Parquet file, even if we're partition pruning to only the valid
> > directories, the query will fail (because it's trying to read the footer
> > of the invalid Parquet file).
> >
> > It really feels like the partition pruning should be done before the
> > DrillPushProjIntoScan.
> >
> > I know Jacques has just done some work on moving the partition pruning,
> > so I thought I'd open the discussion here first before making too many
> > in-roads into it.
> >
> > I do feel if we're partition pruning, we shouldn't even try to read any
> > of those other directories during the planning stage. Furthermore, it
> > doesn't make sense to prune the files being scanned but still keep a
> > Filter operation in the query plan and project dir0 throughout it if
> > it's not needed. The latter is why the queries end up being a lot slower.
> >
> > Thoughts?
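
To make the ordering discussed in this thread concrete, here is a minimal, hypothetical sketch of pruning by dir0 before any footer is read. It is not Drill code: the class, the FileEntry type, and selectFilesForFooterRead are names invented for illustration, and the real planner rules are not modelled. The only format fact relied on is that a Parquet file needs at least 12 bytes (4-byte "PAR1" magic, 4-byte footer length, 4-byte "PAR1" magic). Skipping undersized files inside the selected directories, rather than reporting an error, is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names are Drill APIs.
final class PruneBeforeFooterSketch {

    // Smallest possible Parquet file: "PAR1" (4) + footer length (4) + "PAR1" (4).
    static final long MIN_PARQUET_FILE_LENGTH = 12;

    // Minimal stand-in for a file listing entry.
    static final class FileEntry {
        final String dir0;   // top-level partition directory
        final String path;   // file path, kept for reporting
        final long length;   // file size in bytes

        FileEntry(String dir0, String path, long length) {
            this.dir0 = dir0;
            this.path = path;
            this.length = length;
        }
    }

    // Returns the files whose footers would be read: only files in the wanted
    // directories, and only those large enough to contain a footer at all.
    static List<FileEntry> selectFilesForFooterRead(List<FileEntry> files,
                                                    Set<String> wantedDir0) {
        List<FileEntry> selected = new ArrayList<>();
        for (FileEntry f : files) {
            if (!wantedDir0.contains(f.dir0)) {
                continue; // pruned directory: the file is never opened
            }
            if (f.length < MIN_PARQUET_FILE_LENGTH) {
                continue; // assumption: skip files that cannot hold a footer
            }
            selected.add(f);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileEntry> files = Arrays.asList(
            new FileEntry("2015", "/data/2015/a.parquet", 1024),
            new FileEntry("2016", "/data/2016/b.parquet", 2048),
            new FileEntry("2017", "/data/2017/empty.parquet", 0));

        Set<String> wanted = new HashSet<>(Arrays.asList("2015", "2016"));
        for (FileEntry f : selectFilesForFooterRead(files, wanted)) {
            System.out.println(f.path);
        }
    }
}
```

With the directory filter applied first, the 0-byte file in the pruned directory is never a candidate for a footer read, which is the behaviour the thread argues for. The length check only matters when an invalid file sits inside a directory that survives pruning.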
