[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334300#comment-15334300 ]

ASF GitHub Bot commented on DRILL-4530:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/519#discussion_r67392462

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
    @@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
             FileSystemPlugin fsPlugin, String storageEngineName, String userName)
             throws IOException {
           // TODO: we only check the first file for directory reading.
    -      if(selection.containsDirectories(fs)){
    -        if(isDirReadable(fs, selection.getFirstPath(fs))){
    +      if(selection.containsDirectories(fs)) {
    +        Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
    +        if (fs.exists(dirMetaPath)) {
    +          ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
    +          if (mDirs.getDirectories().size() > 0) {
    +            FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
    +            dirSelection.setExpandedPartial();
    +            return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
    --- End diff --

    make sense.. I will add a comment. thanks for reviewing.

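The control flow in the hunk above can be pictured with a small standalone sketch. It is not Drill code: the class name, the in-memory map standing in for the file system, and the file name constant are all assumptions made for illustration. It only mirrors the two checks visible in the diff (the directories metadata file exists, and it lists at least one directory) and signals a fallback otherwise.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the check in the diff; none of these names exist in Drill.
class DirMetadataSketch {
  // Assumed file name, for illustration only. The patch itself refers to
  // Metadata.METADATA_DIRECTORIES_FILENAME.
  static final String DIRS_FILE = "directories.meta";

  // "fs" maps a metadata file path to the directories that file lists.
  // It is an in-memory substitute for a real file system.
  static Optional<List<String>> cachedDirectories(Map<String, List<String>> fs,
                                                  String selectionRoot) {
    String dirMetaPath = selectionRoot + "/" + DIRS_FILE;
    // Mirrors fs.exists(dirMetaPath) in the diff.
    if (!fs.containsKey(dirMetaPath)) {
      return Optional.empty();
    }
    List<String> dirs = fs.get(dirMetaPath);
    // Mirrors mDirs.getDirectories().size() > 0 in the diff.
    if (dirs == null || dirs.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(dirs);
  }
}
```

An empty result here corresponds to the case where the caller would fall through to whatever check follows in the real method, which the truncated hunk does not show.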
> Improve metadata cache performance for queries with single partition
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4530
>                 URL: https://issues.apache.org/jira/browse/DRILL-4530
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching:
> {noformat}
> query 1:
> SELECT col FROM `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2
> elapsed time is 9 sec even though both are accessing the same amount of data.
> The user expectation is that they should perform roughly the same. The main
> difference comes from reading the bigger metadata cache file at the root
> level 'A' for query2 and then applying the partitioning filter. query1 reads
> a much smaller metadata cache file at the subdirectory level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
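
The equivalence between the two queries in the issue description can be illustrated with a short sketch. It assumes that equality filters on consecutive dirN columns, starting at dir0, identify a single subdirectory under the table root. The class and method names are hypothetical and this is not the planner logic from the patch.

```java
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Drill.
class PartitionPathSketch {
  // Builds the subdirectory implied by consecutive dirN equality filters.
  // Stops at the first missing level, so a filter on dir1 alone leaves the root unchanged.
  static String prunedRoot(String root, Map<String, String> dirFilters) {
    StringBuilder path = new StringBuilder(root);
    for (int level = 0; dirFilters.containsKey("dir" + level); level++) {
      path.append('/').append(dirFilters.get("dir" + level));
    }
    return path.toString();
  }
}
```

Under that assumption, root `A` with dir0 = 'B' and dir1 = 'C' resolves to `A/B/C`, the same directory query 1 names directly, so the smaller cache file at that level would in principle be enough to answer query 2.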