Hi Adam,
I will update DRILL-2287 with some comments because it has more context
than this discussion thread.  We can continue the discussion there.  The
problem of invalid 0-length Parquet files being read sounds like a
separate issue.

Aman

On Sun, Mar 22, 2015 at 6:48 PM, Adam Gilmore <[email protected]> wrote:

> Hi guys,
>
> I'm trying to work on an issue I've raised with partition pruning:
>
> https://issues.apache.org/jira/browse/DRILL-2287
>
> Basically, because partition pruning is done after
> DrillPushProjIntoScan, it seems like we can't detect that dir0 (for
> example) does not actually need to be projected if it's not in the SELECT
> clause (or GROUP BY etc.).
>
> Moreover, I've come up with an issue whereby if I have, for example, 3
> directories - 2 with valid Parquet files and 1 with an invalid 0-byte
> Parquet file, even if we're partition pruning to only the valid
> directories, the query will fail (because it's trying to read the footer of
> the invalid Parquet file).
>
> It really feels like the partition pruning should be done before the
> DrillPushProjIntoScan.
>
> I know Jacques has just done some work on moving the partition pruning, so
> I thought I'd open the discussion here first before making too many
> inroads into it.
>
> I do feel if we're partition pruning, we shouldn't even try to read any of
> those other directories during the planning stage.  Furthermore, it doesn't
> make sense to prune the files being scanned but still keep a Filter
> operation in the query plan and project dir0 throughout it if it's not
> needed.  The latter is why the queries end up being a lot slower.
>
> Thoughts?
>
