[ https://issues.apache.org/jira/browse/DRILL-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka updated DRILL-6465: ----------------------------------- Fix Version/s: (was: Future) 1.16.0 > Transitive closure is not working in Drill for Join with multiple local > conditions > ---------------------------------------------------------------------------------- > > Key: DRILL-6465 > URL: https://issues.apache.org/jira/browse/DRILL-6465 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.14.0 > Reporter: Denys Ordynskiy > Assignee: Vitalii Diravka > Priority: Minor > Fix For: 1.16.0 > > Attachments: drill.zip > > > For several SQL operators Transitive closure is not working during Partition > Pruning and Filter Pushdown for the left table in Join. > If I use several local conditions, then Drill scans full left table in Join. > But if we move additional conditions to the WHERE statement, then Transitive > closure works fine for all joined tables > *Query BETWEEN:* > {code:java} > EXPLAIN PLAN FOR > SELECT * FROM hive.`h_tab1` t1 > JOIN hive.`h_tab2` t2 > ON t1.y=t2.y > AND t2.y BETWEEN 1987 AND 1988; > {code} > *Expected result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=8, partitions= [Partition(values:[1987, 5, 1]), > Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), > Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), > Partition(values:[1988, 12, 2])]{code} > *Actual result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=16, partitions= [Partition(values:[1987, 5, > 1]), Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2]), Partition(values:[1988, 11, 1]), > Partition(values:[1988, 11, 2]), Partition(values:[1988, 12, 1]), > Partition(values:[1988, 12, 2]), Partition(values:[1990, 4, 1]), > Partition(values:[1990, 4, 2]), Partition(values:[1990, 5, 1]), > Partition(values:[1990, 5, 2]), Partition(values:[1991, 3, 1]), > Partition(values:[1991, 3, 2]), Partition(values:[1991, 3, 3]), > Partition(values:[1991, 3, 4]) > ] > {code} > *There is the same Transitive closure behavior for this logical operators:* > * NOT IN > * LIKE > * NOT LIKE > Also Transitive closure is not working during Partition Pruning and Filter > Pushdown for this comparison operators: > *Query <* > {code:java} > EXPLAIN PLAN FOR > SELECT * FROM hive.`h_tab1` t1 > JOIN hive.`h_tab2` t2 > ON t1.y=t2.y > AND t2.y < 1988; > {code} > *Expected result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), > Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2])]{code} > *Actual result:* > {code:java} > 00-00 Screen > 00-01 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], > category0=[$5], y0=[$6], m0=[$7]) > 00-02 Project(itm=[$0], y=[$1], m=[$2], category=[$3], itm0=[$4], > category0=[$5], y0=[$6], m0=[$7]) > 00-03 HashJoin(condition=[=($1, $6)], joinType=[inner]) > 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, > tableName:h_tab1), columns=[`**`], numPartitions=16, partitions= > [Partition(values:[1987, 5, 1]), Partition(values:[1987, 5, 2]), > Partition(values:[1987, 7, 1]), Partition(values:[1987, 7, 2]), > Partition(values:[1988, 11, 1]), Partition(values:[1988, 11, 2]), > Partition(values:[1988, 12, 1]), Partition(values:[1988, 12, 2]), > Partition(values:[1990, 4, 1]), Partition(values:[1990, 4, 2]), > Partition(values:[1990, 5, 1]), Partition(values:[1990, 5, 2]), > Partition(values:[1991, 3, 1]), Partition(values:[1991, 3, 2]), > Partition(values:[1991, 3, 3]), Partition(values:[1991, 3, 4])], > inputDirectories=[maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/1, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/2, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/3, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/4, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/5, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/6, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/7, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/8, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/9, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/10, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/11, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/12, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/13, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/14, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/15, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab1/16]]]) > 00-04 Project(itm0=[$0], category0=[$1], y0=[$2], m0=[$3], $f4=[$4]) > 00-06 Project(itm=[$0], category=[$1], y=[$2], m=[$3], $f4=[<($2, 1988)]) > 00-07 Scan(groupscan=[HiveScan [table=Table(dbName:default, > tableName:h_tab2), columns=[`itm`, `category`, `y`, `m`], numPartitions=2, > partitions= [Partition(values:[1987, 5]), Partition(values:[1987, 7])], > inputDirectories=[maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab2/1, > maprfs:/drill/testdata/ctas/parquet/DRILL_6173/tab2/2]]]) > {code} > *There is the same Transitive closure behavior for this comparison > operators:*** > >, <=, >=, <> > But if I use the WHERE clause instead of local Join condition, Transitive > closure works correct for all joined tables. > *Query:* > {code:java} > EXPLAIN PLAN FOR > SELECT * FROM hive.`h_tab1` t1 > JOIN hive.`h_tab2` t2 > ON t1.y=t2.y > WHERE t2.y < 1988; > {code} > *Result:* > {code:java} > Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h_tab1), > columns=[`**`], numPartitions=4, partitions= [Partition(values:[1987, 5, 1]), > Partition(values:[1987, 5, 2]), Partition(values:[1987, 7, 1]), > Partition(values:[1987, 7, 2])] > {code} > > It was examples for Drill Partition Pruning queries with hive. The same > issues are in the Drill Filter Pushdown queries. > *Filter Pushdown 'BETWEEN' query example:* > {code:java} > EXPLAIN PLAN FOR > SELECT * FROM dfs.ctas_parquet.`/DRILL_6173_pushdown/tab1` t1 > JOIN dfs.ctas_parquet.`/DRILL_6173_pushdown/tab2` t2 > ON t1.y=t2.y > AND t2.y BETWEEN 1987 AND 1988; > {code} > *Expected result:* > {code:java} > Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_7.parquet], > ReadEntryWithPath > ... > ReadEntryWithPath > [path=/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1/0_0_6.parquet]], > selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, > numFiles=8, numRowGroups=8 > {code} > *Actual result:* > {code:java} > Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1]], > selectionRoot=maprfs:/drill/testdata/ctas/parquet/DRILL_6173_pushdown/tab1, > numFiles=1, numRowGroups=16 > {code} > > Issues can be reproduced using parquet files in the attachment "drill.zip" > -- This message was sent by Atlassian JIRA (v7.6.3#76005)