Ben Kietzman created ARROW-8658:
-----------------------------------

             Summary: [C++][Dataset] Implement subtree pruning for FileSystemDataset::GetFragments
                 Key: ARROW-8658
                 URL: https://issues.apache.org/jira/browse/ARROW-8658
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 0.17.0
            Reporter: Ben Kietzman
            Assignee: Ben Kietzman
             Fix For: 1.0.0


This is a very useful optimization for large datasets with multiple partition
fields. For example, given a hive-style directory {{$base_dir/a=3/}} and a
filter {{"a"_ == 2}}, none of the files or subdirectories under that directory
need to be examined.
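To make the example concrete, here is a minimal sketch of the check that lets a directory be skipped. It assumes a deliberately simplified model: each hive-style segment such as {{a=3}} becomes a single field == value constraint, and the filter is a conjunction of such constraints. {{Equality}}, {{Contradicts}} and {{CanPruneDirectory}} are hypothetical names for illustration, not Arrow's Expression API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a partition or filter expression:
// a single "field == value" constraint (not Arrow's Expression class).
struct Equality {
  std::string field;
  int value;
};

// True when both constraints cannot hold at once, i.e. the same field is
// pinned to two different values.
bool Contradicts(const Equality& partition, const Equality& filter) {
  return partition.field == filter.field && partition.value != filter.value;
}

// A directory (and everything below it) can be skipped if its partition
// constraint contradicts any conjunct of the filter.
bool CanPruneDirectory(const Equality& partition,
                       const std::vector<Equality>& filter_conjuncts) {
  assert(!partition.field.empty());  // a hive segment always names a field
  for (const Equality& conjunct : filter_conjuncts) {
    if (Contradicts(partition, conjunct)) return true;
  }
  return false;
}
```

For the directory {{a=3}} and the filter {{"a"_ == 2}} the check returns true, so nothing under that directory is listed. A directory for an unrelated field such as {{b=7}} cannot be pruned by that filter alone.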

After ARROW-8318, FileSystemDataset stores only files, so subtree pruning
(whose implementation depended on the presence of directories to represent
subtrees) was disabled. It should be possible to reintroduce it without
reference to directories by examining the partition expressions directly and
extracting a tree structure from their subexpressions.
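The idea of recovering a tree from the partition expressions themselves can be sketched as follows. This is a hedged illustration, not the eventual Arrow implementation. It assumes each file's partition expression is a conjunction of field == value terms ordered from the outermost partition field inward, and all names ({{FileEntry}}, {{Node}}, {{BuildTree}}, {{CollectFiles}}) are hypothetical. Files whose expressions share a prefix of subexpressions share a path in the tree, so one subexpression that contradicts the filter eliminates every file beneath it in a single step.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical "field == value" term; ordered so it can key a std::map.
struct Equality {
  std::string field;
  int value;
  bool operator<(const Equality& other) const {
    return std::tie(field, value) < std::tie(other.field, other.value);
  }
};

// A file plus its partition expression, as a conjunction of terms.
struct FileEntry {
  std::string path;
  std::vector<Equality> partition;
};

// Tree node: children keyed by the next subexpression, plus the files whose
// partition expression ends exactly at this node.
struct Node {
  std::map<Equality, std::unique_ptr<Node>> children;
  std::vector<std::string> files;
};

// Build the tree by sharing common prefixes of the partition expressions;
// no directories are consulted, only the expressions.
Node BuildTree(const std::vector<FileEntry>& entries) {
  Node root;
  for (const FileEntry& entry : entries) {
    assert(!entry.path.empty());
    Node* node = &root;
    for (const Equality& term : entry.partition) {
      std::unique_ptr<Node>& child = node->children[term];
      if (!child) child = std::make_unique<Node>();
      node = child.get();
    }
    node->files.push_back(entry.path);
  }
  return root;
}

// True if the term pins a field to a value the filter rules out.
bool Contradicts(const Equality& term, const std::vector<Equality>& filter) {
  for (const Equality& f : filter) {
    if (f.field == term.field && f.value != term.value) return true;
  }
  return false;
}

// Depth-first walk that skips a whole subtree as soon as its
// subexpression contradicts the filter.
void CollectFiles(const Node& node, const std::vector<Equality>& filter,
                  std::vector<std::string>* out) {
  out->insert(out->end(), node.files.begin(), node.files.end());
  for (const auto& kv : node.children) {
    if (Contradicts(kv.first, filter)) continue;  // prune: nothing below matches
    CollectFiles(*kv.second, filter, out);
  }
}
```

Keying the tree on subexpressions means the contradiction test runs once per distinct subexpression at each level rather than once per file. Real partition expressions can be richer than simple equalities, which this sketch does not attempt to handle.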



--
This message was sent by Atlassian Jira
(v8.3.4#803005)