[ https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229785#comment-15229785 ]
ASF GitHub Bot commented on DRILL-4589:
---------------------------------------

Github user hsuanyi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/468#discussion_r58824977

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSDirPartitionLocation.java ---
    @@ -0,0 +1,70 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +/**
    + * Class defines a single partition corresponding to a directory in a DFS table.
    + */
    +package org.apache.drill.exec.planner;
    +
    +
    +import com.google.common.collect.Lists;
    +
    +import java.util.Collection;
    +import java.util.List;
    +
    +public class DFSDirPartitionLocation implements PartitionLocation {
    +  private final Collection<PartitionLocation> subPartitions;
    +  private final String[] dirs;
    +
    +  public DFSDirPartitionLocation(String[] dirs, Collection<PartitionLocation> subPartitions) {
    +    this.subPartitions = subPartitions;
    +    this.dirs = dirs;
    +  }
    +
    +  @Override
    +  public String getPartitionValue(int index) {
    +    assert index < dirs.length;
    --- End diff --

    I think the next line will throw an IndexOutOfBoundsException (IOOB) anyway if this assertion is not satisfied. (But this is a minor thing.)

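To illustrate the review comment, here is a minimal, self-contained sketch. It is not the code from the pull request: the class name PartitionValueSketch, both method names, and the `return dirs[index]` line are assumptions made for illustration, since the diff is cut off right after the assert. The sketch shows that a Java assert is a no-op unless the JVM is started with -ea, so an out-of-range index reaches the array access and fails there, and that an explicit check is one alternative when a clearer message is wanted.

```java
import java.util.Objects;

// Hypothetical stand-in for the reviewed class; only the dirs field mirrors the diff.
final class PartitionValueSketch {
    private final String[] dirs;

    PartitionValueSketch(String[] dirs) {
        this.dirs = Objects.requireNonNull(dirs, "dirs");
    }

    // Same shape as the reviewed method. The assert only runs when assertions
    // are enabled (-ea); otherwise the array access below is what fails.
    String getWithAssert(int index) {
        assert index < dirs.length;
        return dirs[index]; // assumed "next line"; throws ArrayIndexOutOfBoundsException when out of range
    }

    // Alternative: an explicit check that behaves the same with or without -ea.
    String getChecked(int index) {
        if (index < 0 || index >= dirs.length) {
            throw new IndexOutOfBoundsException(
                "partition index " + index + " out of range for " + dirs.length + " directory levels");
        }
        return dirs[index];
    }

    public static void main(String[] args) {
        PartitionValueSketch p = new PartitionValueSketch(new String[] {"2016", "04"});
        System.out.println(p.getChecked(1));
        try {
            p.getWithAssert(2);
        } catch (IndexOutOfBoundsException | AssertionError e) {
            // Which of the two is thrown depends on whether the JVM runs with -ea.
            System.out.println(e.getClass().getSimpleName());
        }
    }
}
```

Either way the caller gets an exception for a bad index, which is why the reviewer calls the point minor; the assert mainly documents the precondition.
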
> Reduce planning time for file system partition pruning by reducing filter evaluation overhead
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4589
>                 URL: https://issues.apache.org/jira/browse/DRILL-4589
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files
> organized into multi-level directories, users typically provide a
> partition filter such as: dir0 = something and dir1 = something2 and .. .
> For such queries, we saw that the query planning time could be unacceptably long,
> due to three main overheads: 1) expanding and getting the list of files, 2)
> evaluating the partition filter, and 3) getting the metadata, in the case of parquet
> files for which a metadata cache file is not available.
> DRILL-2517 targets the third overhead. As follow-up work after
> DRILL-2517, we plan to reduce the filter evaluation overhead. Currently, the
> partition filter is evaluated at the file level. In many cases, we saw
> that the number of leaf subdirectories is significantly lower than the number of
> files. Since all the files under the same leaf subdirectory share the same
> directory metadata, we should evaluate the filter at the leaf-subdirectory
> level. By doing that, we could reduce both the CPU overhead of evaluating the
> filter and the memory overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
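
The idea in the issue description, evaluating the partition filter once per leaf subdirectory instead of once per file, can be sketched roughly as follows. This is an illustration under stated assumptions and not Drill's planner code: the class LeafDirPruningSketch, the prune method, and the representation of files as '/'-separated paths relative to the table root are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: prune files by evaluating a directory filter per leaf directory.
final class LeafDirPruningSketch {

    // files: paths relative to the table root, e.g. "2016/04/a.parquet" (assumed layout).
    // dirFilter: receives the directory components (dir0, dir1, ...) of one leaf directory.
    static List<String> prune(List<String> files, Predicate<String[]> dirFilter) {
        // Group files by their leaf directory, i.e. everything before the last '/'.
        Map<String, List<String>> byLeafDir = new LinkedHashMap<>();
        for (String file : files) {
            int slash = file.lastIndexOf('/');
            String dir = slash < 0 ? "" : file.substring(0, slash);
            byLeafDir.computeIfAbsent(dir, k -> new ArrayList<>()).add(file);
        }

        // Evaluate the filter once per leaf directory and keep or drop all of its files together.
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byLeafDir.entrySet()) {
            String[] dirs = entry.getKey().isEmpty() ? new String[0] : entry.getKey().split("/");
            if (dirFilter.test(dirs)) {
                kept.addAll(entry.getValue());
            }
        }
        return kept;
    }
}
```

With this shape, the number of filter evaluations is bounded by the number of distinct leaf directories rather than the number of files, which is the CPU saving the issue describes. For example, a filter equivalent to dir0 = '2016' and dir1 = '04' could be passed as `d -> d.length >= 2 && d[0].equals("2016") && d[1].equals("04")`.
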