Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11182

to look at the new patch set (#5).

Change subject: IMPALA-7436: initial fetch-from-catalogd implementation
......................................................................

IMPALA-7436: initial fetch-from-catalogd implementation

This patch adds a new RPC to the catalogd which allows a client to fetch
a partial view of table or database metadata. Various subsets of
information can be specified and are sent back in fairly "raw" format.

A new MetaProvider implementation is added which uses this API to
support granular fetching of metadata into the impalad. The interface
had to be reworked in a few ways to support this:

- This API uses partition IDs instead of names to specify them. So, the
  listPartitions API now returns opaque PartitionRefs which are passed
  back to the MetaProvider when loading more partition details. The new
  implementation stores the IDs in these refs while the direct-to-HMS
  implementation just uses names.

- The fetching of file descriptors was merged into the loading of other
  partition metadata. I couldn't think of any cases where we needed to
  list partition details without also fetching the file descriptors, so
  it simplified things a bit to merge the two. This was a lot easier to
  implement for CatalogdMetaProvider since the file metadata is stored
  by partition rather than looked up by a directory as in the previous
  API.

  This necessitated moving some of the logic out of LocalFsTable into
  DirectMetaProvider, so LocalFsTable no longer deals directly with HDFS
  APIs like FileStatus.

- The handling of "default partition" for an unpartitioned table moved
  into the MetaProvider implementations themselves instead of
  LocalFsTable. This is because the CatalogdMetaProvider sees the
  "default partition" as a partition that actually has an identifier on
  the catalogd, whereas the DirectMetaProvider does not. So, now both
  providers export the "default partition" as a partition like all the
  others.

This patch also starts to address one of the potential semantic risks of
partial caching on the impalad. If one query fetches some subset of
partitions, then a DDL occurs to change the table metadata, and another
query is submitted, we want to ensure that the metadata for the latter
query still reads a consistent snapshot. In other words, we need to
ensure that the metadata like partition list and table schema come from
the same snapshot as the finer-grained metadata like partition contents.

In order to implement this, the MetaProvider API now requires that
callers use a 'TableRef' object to specify the table to be read, instead
of the dbName/tableName. In the DirectMetaProvider we don't have any
convenient version numbers for a table, so the TableRef just
encapsulates the naming. In the CatalogdMetaProvider, we additionally
store the version number of the table, and then all subsequent requests
verify that the version number has not changed. If it detects a
concurrent modification, an exception is thrown. The frontend catches
this exception and triggers a "re-plan".

I tested this "re-plan" manually for now by running queries against a
table in a loop from my shell while issuing concurrent 'refresh' queries
to that same table. I verified that the warning log indicated the query
had been replanned but the query itself was not disrupted. I filed
IMPALA-7438 to add an automated version of this test.
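
To illustrate the version check and re-plan described above, here is a
minimal sketch. It is not code from the patch. FakeCatalogd,
planWithRetry, the fields of TableRef and the retry limit are all
hypothetical; only the names TableRef and
InconsistentMetadataFetchException appear in the change itself, and
their real definitions may well differ.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch of version-pinned metadata fetches; not taken from the patch.
class ReplanSketch {
  /** Thrown when a table's version changed between two fetches. */
  static class InconsistentMetadataFetchException extends RuntimeException {
    InconsistentMetadataFetchException(String msg) { super(msg); }
  }

  /** Names the table and pins the catalog version seen when it was loaded. */
  static final class TableRef {
    final String dbName;
    final String tableName;
    final long version;

    TableRef(String dbName, String tableName, long version) {
      this.dbName = dbName;
      this.tableName = tableName;
      this.version = version;
    }
  }

  /** Stand-in for the catalogd: a single table whose version is bumped by DDL. */
  static final class FakeCatalogd {
    private final AtomicLong version = new AtomicLong(1);

    /** Coarse-grained load; records the version the caller planned against. */
    TableRef loadTable(String dbName, String tableName) {
      return new TableRef(dbName, tableName, version.get());
    }

    /** Simulates a 'refresh' or other DDL on the table. */
    void refresh() { version.incrementAndGet(); }

    /** Finer-grained fetch; rejects refs that come from an older snapshot. */
    String loadPartition(TableRef ref, String partition) {
      long current = version.get();
      if (current != ref.version) {
        throw new InconsistentMetadataFetchException(
            ref.dbName + "." + ref.tableName + " changed: planned against version "
                + ref.version + ", catalog is at " + current);
      }
      return partition + "@v" + current;
    }
  }

  /** Runs one planning attempt and starts over if the metadata was inconsistent. */
  static <T> T planWithRetry(Supplier<T> planOnce, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    InconsistentMetadataFetchException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return planOnce.get();
      } catch (InconsistentMetadataFetchException e) {
        last = e;  // Metadata moved underneath us: discard the attempt and re-plan.
      }
    }
    throw last;
  }
}
```

Each attempt reloads the TableRef, so a retry always plans against a
fresh snapshot rather than reusing the stale version.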

Change-Id: If49207fc592b1cc552fbcc7199568b6833f86901
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-service-client-wrapper.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/fe-support.cc
M common/fbs/CMakeLists.txt
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
A fe/src/main/java/org/apache/impala/catalog/local/InconsistentMetadataFetchException.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalHbaseTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalPartitionSpec.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalView.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
36 files changed, 1,578 insertions(+), 285 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/11182/5
--
To view, visit http://gerrit.cloudera.org:8080/11182
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If49207fc592b1cc552fbcc7199568b6833f86901
Gerrit-Change-Number: 11182
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>