[ 
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230667#comment-15230667
 ] 

ASF GitHub Bot commented on DRILL-4589:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/468#discussion_r58912269
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSDirPartitionLocation.java ---
    @@ -0,0 +1,70 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +/**
    + * Class defines a single partition corresponding to a directory in a DFS table.
    + */
    +package org.apache.drill.exec.planner;
    +
    +
    +import com.google.common.collect.Lists;
    +
    +import java.util.Collection;
    +import java.util.List;
    +
    +public class DFSDirPartitionLocation implements PartitionLocation {
    +  private final Collection<PartitionLocation> subPartitions;
    +  private final String[] dirs;
    +
    +  public DFSDirPartitionLocation(String[] dirs, Collection<PartitionLocation> subPartitions) {
    +    this.subPartitions = subPartitions;
    +    this.dirs = dirs;
    +  }
    +
    +  @Override
    +  public String getPartitionValue(int index) {
    +    assert index < dirs.length;
    --- End diff --
    
    This one is actually copied from [1]. I think it makes sense to change both 
to throw an exception instead of relying on an assertion check. Will update the 
patch. 
    
    
    
    [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java#L58
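
For illustration, the change proposed in the comment could look roughly like the sketch below. It is not the actual patch: the class name `DirPartitionValueSketch`, the choice of `IndexOutOfBoundsException`, the message text, and the `return dirs[index]` body are all assumptions. The point is only that an explicit check still fires when the JVM runs without `-ea`, whereas `assert` is skipped.

```java
// Hypothetical stand-in for the partition location class in the diff;
// only the bounds-check pattern is the point here.
final class DirPartitionValueSketch {
  private final String[] dirs;

  DirPartitionValueSketch(String[] dirs) {
    this.dirs = dirs;
  }

  // Explicit check instead of `assert index < dirs.length`.
  // Exception type and message are assumptions, not taken from the patch.
  String getPartitionValue(int index) {
    if (index < 0 || index >= dirs.length) {
      throw new IndexOutOfBoundsException(
          "Partition index " + index + " out of range [0, " + dirs.length + ")");
    }
    return dirs[index];
  }
}
```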


> Reduce planning time for file system partition pruning by reducing filter 
> evaluation overhead
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4589
>                 URL: https://issues.apache.org/jira/browse/DRILL-4589
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files 
> organized into multi-level directories, users typically provide a 
> partition filter such as: dir0 = something and dir1 = something2 and ...
> For such queries, we saw that query planning time could be unacceptably long, 
> due to three main overheads: 1) expanding and getting the list of files, 2) 
> evaluating the partition filter, and 3) getting the metadata, in the case of 
> parquet files for which a metadata cache file is not available. 
> DRILL-2517 targets the third overhead. As follow-up work after 
> DRILL-2517, we plan to reduce the filter evaluation overhead. For now, 
> partition filter evaluation is applied at the file level. In many cases, we saw 
> that the number of leaf subdirectories is significantly lower than the number 
> of files. Since all the files under the same leaf subdirectory share the same 
> directory metadata, we should apply the filter evaluation at the leaf 
> subdirectory level. By doing that, we could reduce the CPU overhead of 
> evaluating the filter, as well as the memory overhead.
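
The idea in the description (evaluate the partition filter once per leaf directory rather than once per file) can be sketched as follows. This is a simplified illustration, not Drill's planner code: `LeafDirPruningSketch`, `parentDir`, and `prune` are hypothetical names, paths are plain strings, and the filter is modeled as a `Predicate<String>` over the directory path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: group files by leaf directory, then test each directory once.
final class LeafDirPruningSketch {

  // Returns everything before the last '/', or "" if the path has no '/'.
  static String parentDir(String filePath) {
    int slash = filePath.lastIndexOf('/');
    return slash < 0 ? "" : filePath.substring(0, slash);
  }

  // Keeps the files whose leaf directory passes dirFilter.
  // The filter runs once per distinct directory, not once per file.
  static List<String> prune(List<String> files, Predicate<String> dirFilter) {
    Map<String, List<String>> byDir = new LinkedHashMap<>();
    for (String f : files) {
      byDir.computeIfAbsent(parentDir(f), k -> new ArrayList<>()).add(f);
    }
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : byDir.entrySet()) {
      if (dirFilter.test(e.getKey())) {
        kept.addAll(e.getValue());
      }
    }
    return kept;
  }
}
```

With N files spread over D leaf directories, the filter is evaluated D times instead of N, which is where the CPU saving described above would come from when D is much smaller than N.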



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
