[ https://issues.apache.org/jira/browse/DRILL-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni resolved DRILL-4250.
-------------------------------
    Resolution: Fixed
 Fix Version/s: 1.5.0

Fixed in commit: b9bc35a89208d2dd03f1ed751f71a0cd23651c9a

> File system directory-based partition pruning does not work when a directory contains both subdirectories and files.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4250
>                 URL: https://issues.apache.org/jira/browse/DRILL-4250
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
>
> When a directory contains both subdirectories and files, directory-based partition pruning does not work.
> For example, I have the following directory structure with nation.parquet (copied from the TPC-H sample dataset):
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> For this layout, directory-based partition pruning works correctly for the following query:
>
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 = 'Q1';
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(*=[$0])
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet]], selectionRoot=file:/tmp/fileAndDir, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
> {code}
> However, if I add a nation.parquet file to the 2001 directory, as follows:
> .//2001/nation.parquet
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> then partition pruning is no longer applied to the same query:
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 = 'Q1';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T0¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[AND(=($1, 2001), =($2, 'Q1'))])
> 00-05              Project(T0¦¦*=[$0], dir0=[$1], dir1=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/nation.parquet], ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet], ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q2/nation.parquet]], selectionRoot=file:/tmp/fileAndDir, numFiles=3, usedMetadataFile=false, columns=[`*`]]])
> {code}
> Note that in the second case, where partition pruning did not work, the query still returned the correct result: all three files are scanned and the Filter operator removes the non-matching rows. This issue therefore affects only query performance, not query results.
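To make the expected behavior concrete, here is a minimal Python sketch of directory-based pruning over the three-file layout above. It is not Drill's planner code. The helper names `dir_values`, `prune` and `dir0_2001_and_dir1_q1` are hypothetical. The sketch assumes that a file's `dirN` value is the N-th directory below the selection root and that a level the file does not have behaves like NULL. Under those assumptions, `2001/nation.parquet` has no `dir1` value, so it fails `dir1 = 'Q1'` and can be pruned exactly like `2001/Q2/nation.parquet`. A file sitting next to subdirectories therefore does not need to block pruning, and only one file remains to scan, as in the first plan.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List


# Hypothetical helper: map the directories between the selection root and a
# file to dir0, dir1, ... (the file name itself is not a directory level).
def dir_values(path: str, selection_root: str) -> Dict[str, str]:
    rel = PurePosixPath(path).relative_to(selection_root)
    return {f"dir{i}": name for i, name in enumerate(rel.parts[:-1])}


# Hypothetical pruning step: keep only files whose directory values satisfy
# the filter. A missing level (e.g. dir1 for 2001/nation.parquet) is read with
# .get() and comes back as None, standing in for SQL NULL. None == "Q1" is
# False, so such a file is simply pruned instead of disabling pruning.
def prune(
    files: List[str],
    selection_root: str,
    predicate: Callable[[Dict[str, str]], bool],
) -> List[str]:
    return [f for f in files if predicate(dir_values(f, selection_root))]


def dir0_2001_and_dir1_q1(v: Dict[str, str]) -> bool:
    # Directory names are compared as strings in this sketch, so 2001 is "2001".
    return v.get("dir0") == "2001" and v.get("dir1") == "Q1"


if __name__ == "__main__":
    root = "/tmp/fileAndDir"
    files = [
        f"{root}/2001/nation.parquet",  # file next to the Q1/Q2 subdirectories
        f"{root}/2001/Q1/nation.parquet",
        f"{root}/2001/Q2/nation.parquet",
    ]
    print(prune(files, root, dir0_2001_and_dir1_q1))
    # → ['/tmp/fileAndDir/2001/Q1/nation.parquet']
```

Because the residual Filter still runs after the scan, a planner that cannot prune only loses performance. That matches the observation that the unpruned query returned correct results.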