Makes sense. Is there a way we can do this with lazy materialization rather than writing complex expression tree logic? I hate having all this custom expression tree manipulation logic.
Also, it seems like this should be N-phase rather than two-phase, where N is the number of directory levels below the base path. Thoughts?

On Sep 9, 2015 10:54 AM, "Aman Sinha" <amansi...@apache.org> wrote:

> Currently, partition pruning gets all file names in the table and applies
> the pruning. Suppose the files are spread out over several directories and
> there is a filter on dirN, this is not efficient - both in terms of
> elapsed time and memory usage. This has been seen in a few use cases
> recently.
>
> We should ideally perform the pruning in 2 steps: first get the top-level
> directory names only and apply the directory filter, then get the filenames
> within that directory and apply remaining filters.
>
> I will create a JIRA for this enhancement but let me know your thoughts...
>
> Aman
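
To make the level-by-level idea concrete, here is a rough Python sketch, not Drill code. It assumes a hypothetical in-memory listing (path -> subdirectories and files) in place of a real file system call, assumes files live only in leaf directories, and assumes the directory filters have already been split out per level (dir0, dir1, ...). The names `prune_by_level`, `listing` and `dir_filters` are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a file system listing call:
# path -> (subdirectory names, file names). Not a Drill API.
Listing = Dict[str, Tuple[List[str], List[str]]]


def prune_by_level(
    listing: Listing,
    base: str,
    dir_filters: Dict[int, Callable[[str], bool]],
) -> Tuple[List[str], int]:
    """Walk the tree one directory level at a time.

    dir_filters maps a level (0 = dir0) to a predicate on the directory
    name at that level; levels without a filter keep every directory.
    Assumes files appear only in leaf directories.
    Returns the surviving file paths and the number of directories listed.
    """
    frontier = [base]
    files: List[str] = []
    listed = 0
    level = 0
    while frontier:
        keep = dir_filters.get(level, lambda name: True)
        next_frontier: List[str] = []
        for path in frontier:
            subdirs, names = listing[path]
            listed += 1
            files.extend(f"{path}/{n}" for n in names)
            # Apply this level's filter before descending, so pruned
            # subtrees are never listed at all.
            next_frontier.extend(f"{path}/{d}" for d in subdirs if keep(d))
        frontier = next_frontier
        level += 1
    return files, listed


# Toy table laid out as /t/<year>/<quarter>/<file>.
listing: Listing = {
    "/t": (["2014", "2015"], []),
    "/t/2014": (["Q1", "Q2"], []),
    "/t/2015": (["Q1", "Q2"], []),
    "/t/2014/Q1": ([], ["a.parquet"]),
    "/t/2014/Q2": ([], ["b.parquet"]),
    "/t/2015/Q1": ([], ["c.parquet"]),
    "/t/2015/Q2": ([], ["d.parquet"]),
}

# Filters on dir0 and dir1, one predicate per level.
files, listed = prune_by_level(
    listing,
    "/t",
    {0: lambda d: d == "2015", 1: lambda d: d == "Q1"},
)
```

In this toy layout only 3 of the 7 directories are listed, and the file names under the pruned directories are never fetched. Remaining (non-directory) filters would still be applied to the surviving files afterwards.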