Jason Altekruse created DRILL-2173:
--------------------------------------

             Summary: Enable querying partition information without reading all data
                 Key: DRILL-2173
                 URL: https://issues.apache.org/jira/browse/DRILL-2173
             Project: Apache Drill
          Issue Type: New Feature
          Components: Query Planning & Optimization
    Affects Versions: 0.7.0
            Reporter: Jason Altekruse
            Assignee: Jason Altekruse


When reading a series of files in nested directories, Drill currently adds 
columns representing the directory structure that was traversed to reach the 
file being read. These columns are stored as varchar under the names 
dir0, dir1, ... Because these are regular columns, Drill allows arbitrary 
queries against them, including aggregates, filters, and sorts. To optimize 
reads, basic partition pruning has already been added for expressions such as 
dir0 = "2015" or a simple IN list, which is converted during planning to a 
series of equality expressions joined by OR. If users want to query the 
directory information dynamically, without including specific directory names 
in the query, the query currently triggers a full table scan followed by a 
filter on the dir columns. This enhancement would allow more complex queries 
to run against directory metadata while scanning only the matching 
directories.
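
As a rough illustration of the idea, and not Drill's actual planner code, the 
sketch below derives dir0, dir1, ... values from file paths alone and applies 
a dynamically computed predicate to them before any file would be opened. The 
function names (dir_columns, prune), the /logs root, and the file list are 
all hypothetical and chosen only for this example.

```python
from pathlib import PurePosixPath


def dir_columns(root, file_path):
    """Map the directories between root and the file to dir0, dir1, ..."""
    # Drop the final path component (the file name) and keep the directories.
    parts = PurePosixPath(file_path).relative_to(root).parts[:-1]
    return {f"dir{i}": name for i, name in enumerate(parts)}


def prune(root, file_paths, predicate):
    """Keep files whose directory columns satisfy predicate.

    Only path strings are inspected; no file contents are read.
    """
    return [p for p in file_paths if predicate(dir_columns(root, p))]


# Hypothetical table layout: /logs/<year>/<quarter>/<file>
files = [
    "/logs/2014/q4/a.json",
    "/logs/2015/q1/b.json",
    "/logs/2015/q2/c.json",
]

# A "dynamic" predicate: the target directory name is computed from the
# directory metadata itself instead of being written into the query.
latest = max(dir_columns("/logs", p)["dir0"] for p in files)
selected = prune("/logs", files, lambda d: d["dir0"] == latest)
```

Here the value compared against dir0 comes from an aggregate over the 
directory names, which is the kind of query that a literal comparison such as 
dir0 = "2015" cannot express. Only the files under the matching directories 
remain in the scan list.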



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
