[ https://issues.apache.org/jira/browse/DRILL-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185043#comment-15185043 ]
Deneche A. Hakim commented on DRILL-4449:
-----------------------------------------

I was able to create a reproduction of the issue, in case it's needed later for validation.

First, create a partitioned table:
{noformat}
CREATE TABLE dfs.tmp.t PARTITION BY(l_discount) AS SELECT * FROM cp.`tpch/lineitem.parquet`;
{noformat}

The following query returns wrong results if the table has a metadata cache file:
{noformat}
SELECT COUNT(*) FROM (
  SELECT l_orderkey FROM dfs.tmp.t WHERE l_discount < 0.05
  UNION ALL
  SELECT l_orderkey FROM dfs.tmp.t WHERE l_discount > 0.02
);
{noformat}

> Wrong results when using metadata cache with specific set of queries
> --------------------------------------------------------------------
>
>                 Key: DRILL-4449
>                 URL: https://issues.apache.org/jira/browse/DRILL-4449
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>   Affects Versions: 1.5.0
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> We are still working on a reproduction, but the problem appears with a query
> similar to this one:
> {noformat}
> with q1 as (
>   select a.field
>   from `table` a
>   where <some condition that causes the table to be pruned>
>   group by a.field
>   having ...
> )
> , q2 as (
>   select a.field
>   from `table` a
>   where <some other pruning condition>
>   group by a.field
> )
> select * from (
>   select count(*) as cnt from q1
>   union all
>   select count(*) as cnt from q2
> );
> {noformat}
> The table is partitioned, and both subqueries force Parquet partition pruning
> on the table. Because we share the Parquet metadata object in ParquetGroupScan,
> the second subquery ends up being "over pruned" and we get wrong results.
> The plan doesn't show the problem.
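
The "over pruned" behavior described in the issue can be illustrated with a small, self-contained Java sketch. This is not Drill's code: `TableMetadata`, `Scan`, `pruneInPlace`, `pruneOnCopy`, and the seven sample partition values are hypothetical stand-ins, chosen only to show how two scans that share one mutable metadata object can interfere, and how pruning on a private copy would avoid it. It makes no claim about how the actual fix was implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for cached table metadata: one entry per partition file,
// holding that file's l_discount value.
class TableMetadata {
    final List<Double> partitions;

    TableMetadata(List<Double> partitions) {
        this.partitions = new ArrayList<>(partitions); // defensive copy
    }
}

// Hypothetical stand-in for a scan over the table; NOT Drill's ParquetGroupScan.
class Scan {
    private final TableMetadata metadata;

    Scan(TableMetadata metadata) {
        this.metadata = metadata;
    }

    // Problematic variant: pruning mutates the metadata object itself, so any
    // other scan holding the same object sees the already-reduced file list.
    Scan pruneInPlace(Predicate<Double> filter) {
        metadata.partitions.removeIf(filter.negate());
        return this;
    }

    // Safer variant: pruning operates on a private copy of the metadata.
    Scan pruneOnCopy(Predicate<Double> filter) {
        TableMetadata copy = new TableMetadata(metadata.partitions);
        copy.partitions.removeIf(filter.negate());
        return new Scan(copy);
    }

    int partitionCount() {
        return metadata.partitions.size();
    }
}

class SharedMetadataSketch {
    // Returns the partition counts seen by the two UNION ALL branches,
    // each read immediately after that branch is pruned.
    static int[] run(boolean shareMetadata) {
        TableMetadata cached = new TableMetadata(
                Arrays.asList(0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06));
        Scan first = new Scan(cached);
        Scan second = new Scan(cached);

        Scan a = shareMetadata
                ? first.pruneInPlace(d -> d < 0.05)
                : first.pruneOnCopy(d -> d < 0.05);
        int firstCount = a.partitionCount();

        Scan b = shareMetadata
                ? second.pruneInPlace(d -> d > 0.02)
                : second.pruneOnCopy(d -> d > 0.02);
        int secondCount = b.partitionCount();

        return new int[] { firstCount, secondCount };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(true)));  // prints [5, 2]
        System.out.println(Arrays.toString(run(false))); // prints [5, 4]
    }
}
```

With the shared object, the second branch keeps only the two partitions that satisfy both filters (0.03 and 0.04) instead of the four that satisfy its own filter, which mirrors the undercount seen in the reproduction query.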