[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336826#comment-15336826 ]

ASF GitHub Bot commented on DRILL-4530:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/519#discussion_r67567993

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
    @@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
               int recordCount = 0;
               int qualifiedCount = 0;

    -          // Inner loop: within each batch iterate over the PartitionLocations
    -          for(PartitionLocation part: partitions){
    -            if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
    -              newPartitions.add(part);
    -              qualifiedCount++;
    +          if (checkForSingle &&
    +              partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
    +            // Inner loop: within each batch iterate over the PartitionLocations
    +            for (PartitionLocation part : partitions) {
    +              assert part.isCompositePartition();
    +              if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) {
    +                newPartitions.add(part);
    +                if (isSinglePartition) { // only need to do this if we are already single partition
    +                  // compose the array of partition values for the directories that are referenced by filter:
    +                  // e.g suppose the dir hierarchy is year/quarter/month and the query is:
    +                  //     SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
    +                  // then for 2015/Q1/Feb, this will have ['2015', 'Q1', null]
    +                  // Note that we are not using the PartitionLocation here but composing a different list because
    +                  // we are only interested in the directory columns that are referenced in the filter condition. not
    +                  // the SELECT list or other parts of the query.
    +                  Pair<String[], Integer> p = composePartition(referencedDirsBitSet, partitionMap, vectors, recordCount);
    +                  String[] parts = p.getLeft();
    +                  int tmpIndex = p.getRight();
    +                  if (spInfo == null) {
    +                    spInfo = parts;
    +                    maxIndex = tmpIndex;
    +                  } else if (maxIndex != tmpIndex) {
    +                    isSinglePartition = false;
    +                    break;
    +                  } else {
    +                    // we only want to compare until the maxIndex inclusive since subsequent values would be null
    +                    for (int j = 0; j <= maxIndex; j++) {
    +                      if (spInfo[j] == null // prefixes should be non-null
    --- End diff --

    From lines 305-306, spInfo and maxIndex are in sync. Why would we have spInfo[j] == null when j <= maxIndex? I thought maxIndex was obtained such that the elements in spInfo are not null.


> Improve metadata cache performance for queries with single partition
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4530
>                 URL: https://issues.apache.org/jira/browse/DRILL-4530
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching:
> {noformat}
> query 1:
> SELECT col FROM `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2
> elapsed time is 9 sec even though both are accessing the same amount of data.
> The user expectation is that they should perform roughly the same. The main
> difference comes from reading the bigger metadata cache file at the root
> level 'A' for query2 and then applying the partitioning filter. query1 reads
> a much smaller metadata cache file at the subdirectory level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
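On the review question about spInfo[j] == null for j <= maxIndex, a small standalone sketch may help frame the discussion. It is not the code from the pull request. The helper names (compose, maxIndex, samePrefix) are hypothetical, and it assumes that composePartition fills only the directory levels referenced by the filter and that maxIndex is the highest referenced level. The second half of the comparison (the equals check) is also an assumption, since the quoted hunk is cut off after the null test. Under those assumptions, a filter that references dir1 but not dir0 leaves a null at index 0 even though 0 <= maxIndex, which is one case where the null guard would matter.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not the PruneScanRule implementation.
class SinglePartitionSketch {

    // Assumed behaviour of composePartition: copy only the directory levels
    // referenced by the filter and leave every other slot null.
    static String[] compose(BitSet referencedDirs, String[] dirValues) {
        String[] parts = new String[dirValues.length];
        for (int i = referencedDirs.nextSetBit(0);
             i >= 0 && i < dirValues.length;
             i = referencedDirs.nextSetBit(i + 1)) {
            parts[i] = dirValues[i];
        }
        return parts;
    }

    // Assumed meaning of maxIndex: the highest referenced directory level
    // (-1 when the filter references no directory column).
    static int maxIndex(BitSet referencedDirs) {
        return referencedDirs.length() - 1;
    }

    // Prefix comparison up to maxIndex inclusive. A null inside the prefix
    // means the filter skipped a level, so a single partition cannot be claimed.
    static boolean samePrefix(String[] spInfo, String[] parts, int maxIndex) {
        for (int j = 0; j <= maxIndex; j++) {
            if (spInfo[j] == null || !spInfo[j].equals(parts[j])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet dir0AndDir1 = new BitSet();
        dir0AndDir1.set(0);
        dir0AndDir1.set(1);
        String[] feb = compose(dir0AndDir1, new String[] {"2015", "Q1", "Feb"});
        String[] mar = compose(dir0AndDir1, new String[] {"2015", "Q1", "Mar"});
        System.out.println(Arrays.toString(feb));
        System.out.println(samePrefix(feb, mar, maxIndex(dir0AndDir1)));

        BitSet dir1Only = new BitSet();
        dir1Only.set(1);
        String[] gap = compose(dir1Only, new String[] {"2015", "Q1", "Feb"});
        System.out.println(Arrays.toString(gap));
        System.out.println(samePrefix(gap, gap, maxIndex(dir1Only)));
    }
}
```

If composePartition instead guarantees a contiguous non-null prefix up to maxIndex, the null test would be purely defensive, which is what the review comment appears to be asking about.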