[ https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394131#comment-15394131 ]
ASF GitHub Bot commented on DRILL-4786:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/553#discussion_r72296459

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
    @@ -387,16 +378,35 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
         condition = condition.accept(reverseVisitor);
         pruneCondition = pruneCondition.accept(reverseVisitor);
    -    if (checkForSingle && isSinglePartition && !wasAllPartitionsPruned) {
    +    if (descriptor.supportsMetadataCachePruning() && !wasAllPartitionsPruned) {
           // if metadata cache file could potentially be used, then assign a proper cacheFileRoot
    -      String path = "";
    -      for (int j = 0; j <= maxIndex; j++) {
    -        path += "/" + spInfo[j];
    +      int index = -1;
    +      if (!matchBitSet.isEmpty()) {
    +        String path = "";
    +        index = matchBitSet.length() - 1;
    +
    +        for (int j = 0; j < matchBitSet.length(); j++) {
    +          if (!matchBitSet.get(j)) {
    +            // stop at the first index with no match and use the immediate
    +            // previous index
    +            index = j-1;
    +            break;
    +          }
    +        }
    +        for (int j=0; j <= index; j++) {
    +          path += "/" + spInfo[j];
    +        }
    +        cacheFileRoot = descriptor.getBaseTableLocation() + path;
    --- End diff --

    cacheFileRoot is set within the 'if' branch. Are we going to get a null for cacheFileRoot if matchBitSet has no bit set? Will cacheFileRoot=null cause issues in downstream logic?
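The following sketch makes the review question concrete. It is a hypothetical, standalone restatement of the cacheFileRoot computation from the diff. `CacheFileRootSketch`, `computeCacheFileRoot` and the `/data/A` base location are illustrative names, not Drill API. The sketch also assumes cacheFileRoot is null before this block runs, which the diff does not show.

Under those assumptions, only an empty matchBitSet leaves the root null. A non-empty set whose bit 0 is clear yields index = -1 and an empty path, so the root falls back to the base table location.

```java
import java.util.BitSet;

// Hypothetical standalone restatement of the cacheFileRoot logic in the diff;
// not Drill API. Assumes cacheFileRoot is null before the block runs.
class CacheFileRootSketch {

    static String computeCacheFileRoot(String baseTableLocation,
                                       String[] spInfo,
                                       BitSet matchBitSet) {
        String cacheFileRoot = null;
        if (!matchBitSet.isEmpty()) {
            // Default: every level up to the highest set bit matched.
            int index = matchBitSet.length() - 1;
            for (int j = 0; j < matchBitSet.length(); j++) {
                if (!matchBitSet.get(j)) {
                    // Stop at the first unmatched level; keep the previous one.
                    index = j - 1;
                    break;
                }
            }
            // Append the single matched value of each level dir0..dir<index>.
            StringBuilder path = new StringBuilder();
            for (int j = 0; j <= index; j++) {
                path.append('/').append(spInfo[j]);
            }
            cacheFileRoot = baseTableLocation + path;
        }
        return cacheFileRoot;
    }

    public static void main(String[] args) {
        String base = "/data/A";            // hypothetical table location
        String[] spInfo = {"B", "1"};       // single value per matched level

        BitSet dir0Only = new BitSet();
        dir0Only.set(0);
        System.out.println(computeCacheFileRoot(base, spInfo, dir0Only)); // /data/A/B

        BitSet dir1Only = new BitSet();
        dir1Only.set(1);
        System.out.println(computeCacheFileRoot(base, spInfo, dir1Only)); // /data/A

        System.out.println(computeCacheFileRoot(base, spInfo, new BitSet())); // null
    }
}
```

Once the empty case is excluded, the scan loop is equivalent to `matchBitSet.nextClearBit(0) - 1`. That form may be a more compact way to express it.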
> Improve metadata cache performance for queries with multiple partitions
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4786
>                 URL: https://issues.apache.org/jira/browse/DRILL-4786
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Metadata, Query Planning & Optimization
>    Affects Versions: 1.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> Consider queries of the following type run against Parquet data with
> metadata caching:
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill will read the metadata cache file from the top-level
> directory 'A', which is not very efficient since we are only interested in
> the files from some subdirectories of 'B'. DRILL-4530 improves the
> performance of such queries when the leaf-level directory is a single
> partition. Here, there are 3 subpartitions due to the IN list. We can
> build upon the DRILL-4530 enhancement by at least reading the cache file from
> the immediate parent level `/A/B` instead of the top level.
> The goal of this JIRA is to improve performance for such types of queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
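The issue proposes reading the cache file from the deepest directory that all surviving partitions share: `/A/B` rather than `A`. Here is a minimal sketch of that idea. It assumes the pruned partitions are available as absolute, '/'-separated directory paths.

The helper `commonParent` below is hypothetical and not part of Drill. It computes the longest common directory prefix of those paths. This only illustrates the concept. The patch under review tracks matched partition levels with a BitSet rather than comparing path strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Drill API: deepest directory shared by every
// partition directory that survives pruning. Assumes absolute '/'-separated paths.
class CommonParentSketch {

    static String commonParent(List<String> partitionDirs) {
        if (partitionDirs.isEmpty()) {
            return null; // nothing survived pruning
        }
        // "/A/B/1" splits into ["", "A", "B", "1"].
        String[] first = partitionDirs.get(0).split("/");
        int common = first.length;
        for (String dir : partitionDirs.subList(1, partitionDirs.size())) {
            String[] parts = dir.split("/");
            int k = 0;
            while (k < common && k < parts.length && first[k].equals(parts[k])) {
                k++;
            }
            common = k; // shrink to the prefix shared so far
        }
        String joined = String.join("/", Arrays.copyOf(first, common));
        return joined.isEmpty() ? "/" : joined;
    }

    public static void main(String[] args) {
        // dir0 = 'B' AND dir1 IN ('1', '2', '3')
        System.out.println(commonParent(List.of("/A/B/1", "/A/B/2", "/A/B/3"))); // /A/B
        // A single surviving leaf partition (the DRILL-4530 case)
        System.out.println(commonParent(List.of("/A/B/1"))); // /A/B/1
    }
}
```

With a single surviving leaf partition, the helper returns the leaf directory itself, which corresponds to the case DRILL-4530 already handles. With the three-value IN list, it stops one level up, at `/A/B`.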