[jira] [Commented] (IMPALA-3127) Decouple partitions from tables
[ https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844279#comment-17844279 ]

ASF subversion and git services commented on IMPALA-3127:

Commit 5d32919f46117213249c60574f77e3f9bb66ed90 in impala's branch refs/heads/branch-4.4.0 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d32919f4 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Partitions dropped since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, it collects these dropped partitions, and HdfsTable then clears the set. See details in CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta().

If an HdfsTable is invalidated, it's replaced with an IncompleteTable, which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name. So if the partition is added back to the table, the topic key is reused, which resolves the leak.

The leak is observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator finds some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). Then a Precondition check fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates.

The patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them.

It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics.

Tests
- Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Reviewed-on: http://gerrit.cloudera.org:8080/21326
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
(cherry picked from commit ee21427d26620b40d38c706b4944d2831f84f6f5)

> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Quanlong Huang
> Priority: Major
> Labels: catalog-server, performance
> Fix For: Impala 4.0.0
>
> Currently, partitions are tightly integrated into the HdfsTable objects,
> making incremental metadata updates difficult to perform. Furthermore, the
> catalog transmits entire table metadata even when only few partitions change,
> introducing significant latencies, wasting network bandwidth and CPU cycles
> while updating table metadata at the receiving impalads. As a first step, we
> should decouple partitions from tables and add them as a separate level in
> the hierarchy of catalog entities (server-db-table-partition). Subsequently,
> the catalog should transmit only entities that have changed after DDL/DML
> statements.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
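The leak described in the commit message above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Impala's Java implementation: `HdfsTableSketch`, `topic_key`, `deletions_on_invalidate` and the `include_dropped` flag are hypothetical names, and the topic key format is simplified to "table:partition".

```python
class HdfsTableSketch:
    """Hypothetical stand-in for HdfsTable: live and recently dropped partition names."""

    def __init__(self, name, partitions):
        self.name = name
        self.partitions = set(partitions)
        # Plays the role of droppedPartitions_: drops since the last catalog update.
        self.dropped_partitions = set()

    def drop_partition(self, part_name):
        self.partitions.discard(part_name)
        self.dropped_partitions.add(part_name)


def topic_key(table_name, part_name):
    # Assumed key format; the source only says the key consists of
    # the table name and the partition name.
    return f"{table_name}:{part_name}"


def deletions_on_invalidate(table, include_dropped):
    """Topic keys deleted when the table is added to the deleteLog."""
    names = set(table.partitions)
    if include_dropped:
        # The fix: also collect partitions dropped since the last update.
        names |= table.dropped_partitions
    return {topic_key(table.name, n) for n in names}


# The topic still holds all three partitions from the last catalog update.
topic = {topic_key("db.t", p) for p in ("p=1", "p=2", "p=3")}
table = HdfsTableSketch("db.t", ["p=1", "p=2", "p=3"])
table.drop_partition("p=3")  # dropped, then the table is invalidated before the next update

leaked_before = topic - deletions_on_invalidate(table, include_dropped=False)
leaked_after = topic - deletions_on_invalidate(table, include_dropped=True)
print(leaked_before, leaked_after)
```

Without the dropped set, the key for p=3 stays in the topic until a partition with the same name is added again; with it, nothing is left behind.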
[jira] [Commented] (IMPALA-3127) Decouple partitions from tables
[ https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844100#comment-17844100 ]

ASF subversion and git services commented on IMPALA-3127:

Commit ee21427d26620b40d38c706b4944d2831f84f6f5 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ee21427d2 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Partitions dropped since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, it collects these dropped partitions, and HdfsTable then clears the set. See details in CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta().

If an HdfsTable is invalidated, it's replaced with an IncompleteTable, which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name. So if the partition is added back to the table, the topic key is reused, which resolves the leak.

The leak is observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator finds some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). Then a Precondition check fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates.

The patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them.

It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics.

Tests
- Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Reviewed-on: http://gerrit.cloudera.org:8080/21326
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Quanlong Huang
> Priority: Major
> Labels: catalog-server, performance
> Fix For: Impala 4.0.0
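On the coordinator side, the commit message above says the Precondition check is replaced by reporting stale partitions. A minimal Python sketch of that idea follows. The function `apply_initial_update`, its arguments and the log text are hypothetical and do not mirror the real coordinator code, which is not shown in this thread.

```python
import logging


def apply_initial_update(table_part_names, partition_updates):
    """Split partition updates into those the table references and stale leftovers.

    table_part_names: partition names listed by the table object.
    partition_updates: mapping of partition name -> metadata found in the topic.
    """
    referenced = set(table_part_names)
    loaded = {n: m for n, m in partition_updates.items() if n in referenced}
    stale = sorted(n for n in partition_updates if n not in referenced)
    if stale:
        # Report instead of failing, so the table can still be added.
        logging.warning("Ignoring %d stale partition update(s): %s", len(stale), stale)
    return loaded, stale


loaded, stale = apply_initial_update(
    ["p=1", "p=2"],
    {"p=1": "meta1", "p=2": "meta2", "p=3": "meta3"},  # p=3 leaked into the topic
)
print(loaded, stale)
```

Here the leaked entry for p=3 is logged and skipped, and the table loads with the two partitions it actually references.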
[jira] [Commented] (IMPALA-3127) Decouple partitions from tables
[ https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166239#comment-17166239 ]

ASF subversion and git services commented on IMPALA-3127:

Commit 074731e2bcf37643710f2fdf236829991a462fc3 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=074731e ]

IMPALA-3127: Support incremental metadata updates in partition level

Currently, partitions are tightly integrated into the HdfsTable objects. Catalogd has to transmit the entire table metadata even when only a few partitions change. This is a waste of resources and can lead to OOM when transmitting large tables due to the 2GB JVM array limit.

This patch makes HdfsPartition extend CatalogObject so that catalogd can send partitions as individual catalog objects. Consequently, table objects in the catalog topic update can have minimal partition maps that only contain the partition ids, which reduces the thrift object size for large tables. The catalog object key of HdfsPartition consists of db name, table name and partition name.

In "full" topic mode (catalog_topic_mode=full), catalogd only sends changed partitions with their latest table states. The latest table states are table objects with the minimal partition map. Legacy coordinators use the partition list to pick up existing (unchanged) partitions from the existing table object and new partitions in the catalog update.

Currently, partition instances are immutable: all partition modifications are implemented by deleting the old instance and adding a new one with a new partition id. Since partition ids are generated by a global counter, newer partition instances have larger partition ids. So catalogd maintains a watermark for each table as the max sent partition id. Partition instances with ids larger than this are new partitions that should be sent in the next catalog update.
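The watermark scheme described above can be sketched in a few lines of Python. This is an illustration under assumptions, not catalogd's code: `Partition`, `WatermarkTable`, `upsert` and `collect_update` are hypothetical names, and `max_sent_partition_id` merely echoes the role of the watermark.

```python
import itertools
from dataclasses import dataclass

# Stand-in for the global partition id counter.
_partition_ids = itertools.count()


@dataclass(frozen=True)
class Partition:
    id: int
    name: str
    num_files: int


class WatermarkTable:
    def __init__(self):
        self.partitions = {}  # partition name -> latest immutable instance
        self.max_sent_partition_id = -1  # watermark: largest id already sent

    def upsert(self, name, num_files):
        # A modification never mutates: it swaps in a new instance with a new id.
        self.partitions[name] = Partition(next(_partition_ids), name, num_files)

    def collect_update(self):
        """Return instances newer than the watermark, then advance the watermark."""
        new = sorted(
            (p for p in self.partitions.values() if p.id > self.max_sent_partition_id),
            key=lambda p: p.id,
        )
        if new:
            self.max_sent_partition_id = new[-1].id
        return new


table = WatermarkTable()
table.upsert("year=2009", 3)
table.upsert("year=2010", 5)
first = [p.name for p in table.collect_update()]   # both partitions are new
table.upsert("year=2010", 6)                       # replaces the year=2010 instance
second = [p.name for p in table.collect_update()]  # only the replaced one
third = table.collect_update()                     # nothing changed
print(first, second, third)
```

Because ids only grow, a single integer per table is enough to tell which instances have not been sent yet; no per-partition version is needed.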
Deleted partition instances are kept in a per-table set until the next catalog update. If there are no updates on the same partition name, catalogd sends a deletion for the partition. For dropped or invalidated tables, catalogd still sends deletions for their partitions. Although these deletions are not used by coordinators (coordinators delete the partitions when they delete the table instances), they help avoid topic entry leaks in the statestore catalog topic.

In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only sends invalidations on tables and stale partition instances. Each partition instance is identified by its partition id. LocalCatalog coordinators use the partition invalidations to evict stale partitions in time. For instance, let's say partition(year=2010) is updated in catalogd. This is done by deleting the old partition instance partition(id=0, year=2010) and adding a new partition instance partition(id=1, year=2010). Catalogd will send invalidations on the table and the partition instance with id=0, but not the one with id=1. A LocalCatalog coordinator will invalidate the partition instance(id=0) if it's in the cache. If the partition instance(id=1) is cached, it's already the latest version since partition instances are immutable, so there is no need to invalidate it.

Tests
- Run exhaustive tests.
- Run exhaustive test_ddl.py in LocalCatalog mode.
- Add test in test_local_catalog.py to verify stale partitions are invalidated in LocalCatalog when partitions are updated.

Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Reviewed-on: http://gerrit.cloudera.org:8080/16159
Tested-by: Impala Public Jenkins
Reviewed-by: Vihang Karajgaonkar

> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Quanlong Huang
> Priority: Major
> Labels: catalog-server, performance
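The "minimal" topic mode example from the commit message above (instances id=0 and id=1 of year=2010) can be modelled as a cache keyed by partition id. This is a hedged sketch: `LocalCatalogCacheSketch` and its methods are hypothetical and are not the LocalCatalog API.

```python
class LocalCatalogCacheSketch:
    """Hypothetical coordinator-side cache keyed by partition id."""

    def __init__(self):
        self._by_id = {}  # partition id -> cached metadata

    def put(self, part_id, meta):
        self._by_id[part_id] = meta

    def invalidate(self, stale_ids):
        """Evict the listed instances; return the ids that were actually cached."""
        return [i for i in stale_ids if self._by_id.pop(i, None) is not None]

    def cached_ids(self):
        return sorted(self._by_id)


cache = LocalCatalogCacheSketch()
cache.put(0, {"name": "year=2010", "files": 5})  # old instance
cache.put(1, {"name": "year=2010", "files": 6})  # new instance, already cached

# Only the stale instance id is invalidated; id=1 is immutable and stays valid.
evicted = cache.invalidate([0])
print(evicted, cache.cached_ids())
```

Keying by partition id rather than by partition name is what makes this safe: an invalidation for id=0 can never evict the newer instance of the same partition.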
[jira] [Commented] (IMPALA-3127) Decouple partitions from tables
[ https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136742#comment-17136742 ]

ASF subversion and git services commented on IMPALA-3127:

Commit 419aa2e30db326f02e9b4ec563ef7864e82df86e in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=419aa2e ]

IMPALA-9778: Refactor partition modifications in DDL/DMLs

After this patch, in DDL/DMLs that update metadata of partitions, instead of updating partitions in place, we always create new ones and use them to replace the existing instances. This is guarded by making HdfsPartition immutable. There are several benefits to this:
- HdfsPartition can be shared across table versions. In full catalog update mode, a catalog update can ignore unchanged partitions (IMPALA-3234) and send the update in partition granularity.
- Aborted DDL/DMLs won't leave partition metadata in a bad shape (e.g. IMPALA-8406), which usually requires invalidation to recover.
- Fetch-on-demand coordinators can cache partition meta using the partition id as the key. When the table version changes, only the metadata of changed partitions needs to be reloaded (IMPALA-7533).
- In the work of decoupling partitions from tables (IMPALA-3127), we don't need to assign a catalog version to partitions since the partition ids already identify the partitions.

However, HdfsPartition is not strictly immutable. Although all its fields are final, some fields still reference mutable objects. More refactoring is needed to achieve strict immutability. This patch focuses on refactoring the DDL/DML code paths.

Changes:
- Make all fields of HdfsPartition final. Move the HdfsPartition constructor logic and all its update methods into HdfsPartition.Builder.
- Refactor in-place updates on HdfsPartition into creating a new instance and dropping the old one. HdfsPartition.Builder represents the in-progress modifications. Once all modifications are done, call its build() method to create the new HdfsPartition instance. The old HdfsPartition instance is only replaced at the end of the modifications.
- Move the "dirty" marker of HdfsPartition into a map of HdfsTable. It maps from the old partition id to the in-progress partition builder. For "dirty" partitions, we reload their HMS meta and file meta.

Tests:
- No new tests are added since the existing tests already provide sufficient coverage
- Run CORE tests

Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c
Reviewed-on: http://gerrit.cloudera.org:8080/15985
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Labels: catalog-server, performance
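The build-then-replace pattern described in the commit message above can be sketched in Python with a frozen dataclass and a separate builder. The names `HdfsPartitionSketch`, `PartitionBuilder`, `PartitionedTable` and the path `/warehouse/t/year=2010` are hypothetical; the real HdfsPartition.Builder is Java and has a much larger surface.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count()  # stand-in for the global partition id counter


@dataclass(frozen=True)
class HdfsPartitionSketch:
    """Immutable partition instance; any change requires building a new one."""
    id: int
    name: str
    location: str
    num_rows: int


class PartitionBuilder:
    """Holds in-progress modifications, seeded from an existing instance."""

    def __init__(self, old):
        self.name = old.name
        self.location = old.location
        self.num_rows = old.num_rows

    def set_num_rows(self, num_rows):
        self.num_rows = num_rows
        return self

    def build(self):
        # Every built instance gets a fresh, larger id.
        return HdfsPartitionSketch(next(_ids), self.name, self.location, self.num_rows)


class PartitionedTable:
    def __init__(self):
        self.partitions = {}  # partition name -> current instance

    def add(self, name, location, num_rows):
        part = HdfsPartitionSketch(next(_ids), name, location, num_rows)
        self.partitions[name] = part
        return part

    def replace(self, old, new):
        # The swap happens once, at the end of the modifications.
        assert self.partitions[old.name] is old
        self.partitions[new.name] = new
        return new


table = PartitionedTable()
old = table.add("year=2010", "/warehouse/t/year=2010", 10)
builder = PartitionBuilder(old).set_num_rows(20)
still_old = table.partitions["year=2010"]  # an abort here leaves the table untouched
new = table.replace(old, builder.build())
print(old, new)
```

If the operation is aborted before `replace` is called, the table still points at the untouched old instance, which is the property the commit message cites for aborted DDL/DMLs.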
[jira] [Commented] (IMPALA-3127) Decouple partitions from tables
[ https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009969#comment-17009969 ]

Vihang Karajgaonkar commented on IMPALA-3127:

I would like to take this up.

> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Priority: Major
> Labels: catalog-server, performance