Hello Bharath Vissapragada, Vihang Karajgaonkar, Sudhanshu Arora, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12991

to look at the new patch set (#2).

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................

IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

This adds support to FileMetadataLoader to recursively list a directory
and create file descriptors. The changes are as follows:

* FileMetadataLoader can now take a 'recursive' argument to trigger the
  new behavior. All the non-test code paths still use non-recursive
  (i.e. this new feature isn't exposed for real tables as of yet).

* FileSystemUtil has some functionality for recursive directory listing.
  There are a few notes there around unexpected optimizations for S3 vs
  HDFS.

* Renamed the 'file_name' field to 'relative_path' for FileDescriptor
  and HDFS splits, since now the file descriptors may be more than a
  single path component.

The new functionality is just unit tested at the moment. Later, this
functionality will be used in a couple cases, including:

- ability to access "bucketed" tables written by Hive or Spark in a
  read-only manner. Today we ignore the bucketing and they end up being
  read as empty tables.

- ability to list files inside the hierarchical layout for ACID tables.

Fully supporting those use cases will require some other changes (eg to
the REFRESH code path which currently assumes that a top-level partition
modification timestamp is sufficient to determine if files changed).
I'll handle those separately to keep the patches small.

We may want to expose recursive listing support for user tables as well
(as suggested in IMPALA-4596). However, the global configuration flag
suggested in that JIRA doesn't seem so great, so I'm leaving that out
for now as well until we can find a more reasonable table-level way to
specify it (eg a table property)

Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
16 files changed, 266 insertions(+), 58 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12991/2

--
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <sudhan...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
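
For readers unfamiliar with why the 'file_name' to 'relative_path' rename follows from recursive listing, here is a rough, self-contained sketch. It is not the code in this patch: it uses java.nio.file on a local directory instead of the Hadoop FileSystem API, and the class name RecursiveListingSketch and the method listRelativePaths are hypothetical names chosen for illustration. It only shows that once subdirectories are descended into, a file under a partition is identified by a multi-component path relative to the partition directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Illustrative only: a local-filesystem stand-in, not Impala's FileMetadataLoader.
public class RecursiveListingSketch {

  // Returns the regular files under partitionDir as paths relative to it.
  // With recursive=false only direct children are considered, so every result
  // is a single path component. With recursive=true results may contain '/'.
  public static List<String> listRelativePaths(Path partitionDir, boolean recursive)
      throws IOException {
    // Depth 1 means the start directory plus its immediate children.
    int maxDepth = recursive ? Integer.MAX_VALUE : 1;
    List<String> result = new ArrayList<>();
    try (Stream<Path> stream = Files.walk(partitionDir, maxDepth)) {
      stream.filter(Files::isRegularFile)
          .forEach(p -> result.add(
              // Normalize separators so the output looks the same on every OS.
              partitionDir.relativize(p).toString().replace('\\', '/')));
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical layout: one file at the top level, one inside a subdirectory.
    Path dir = Files.createTempDirectory("partition");
    Files.createFile(dir.resolve("f0.parq"));
    Files.createDirectories(dir.resolve("bucket_1"));
    Files.createFile(dir.resolve("bucket_1").resolve("f1.parq"));

    System.out.println(listRelativePaths(dir, false));
    System.out.println(listRelativePaths(dir, true));
  }
}
```

In this sketch the non-recursive call sees only the top-level file, while the recursive call also returns the nested file with its subdirectory prefix, which is the case a bare file name could not represent.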