[ https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392328#comment-15392328 ]
ASF GitHub Bot commented on DRILL-4786:
---------------------------------------

GitHub user amansinha100 commented on the issue:

    https://github.com/apache/drill/pull/553

    @jinfengni could you please review the PR, since you reviewed the related PR earlier? Thanks.

> Improve metadata cache performance for queries with multiple partitions
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4786
>                 URL: https://issues.apache.org/jira/browse/DRILL-4786
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Metadata, Query Planning & Optimization
>    Affects Versions: 1.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> Consider queries of the following type run against Parquet data with
> metadata caching:
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill will read the metadata cache file from the top level
> directory 'A', which is not very efficient since we are only interested in
> the files from some subdirectories of 'B'. DRILL-4530 improves the
> performance of such queries when the leaf level directory is a single
> partition. Here, there are 3 subpartitions due to the IN list. We can
> build upon the DRILL-4530 enhancement by at least reading the cache file from
> the immediate parent level `/A/B` instead of the top level.
> The goal of this JIRA is to improve performance for such types of queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
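
To illustrate the idea in the issue description (reading the cache file from `/A/B` rather than from `/A`), here is a minimal sketch of how the deepest directory shared by all selected partitions could be computed. This is not Drill's actual implementation: the class `CacheRootSketch` and the method `commonParent` are hypothetical names, and the paths are taken from the example query.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Drill: finds the deepest directory that
// contains every partition directory selected by the filter.
class CacheRootSketch {

  // Returns the longest common directory prefix of the given absolute paths.
  // Returns an empty string if the paths share nothing below the root.
  static String commonParent(List<String> paths) {
    String[] prefix = paths.get(0).split("/");
    int len = prefix.length;
    for (String p : paths) {
      String[] parts = p.split("/");
      len = Math.min(len, parts.length);
      for (int i = 0; i < len; i++) {
        if (!prefix[i].equals(parts[i])) {
          len = i;  // first differing component ends the shared prefix
          break;
        }
      }
    }
    return String.join("/", Arrays.copyOfRange(prefix, 0, len));
  }

  public static void main(String[] args) {
    // dir0 = 'B' AND dir1 IN ('1', '2', '3') selects these three directories.
    List<String> selected = Arrays.asList("/A/B/1", "/A/B/2", "/A/B/3");
    System.out.println(commonParent(selected));  // prints /A/B
  }
}
```

With a single selected partition (the DRILL-4530 case), the same helper returns the leaf directory itself, so a cache file at that level could be read directly.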