[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

2024-05-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844279#comment-17844279
 ] 

ASF subversion and git services commented on IMPALA-3127:
-

Commit 5d32919f46117213249c60574f77e3f9bb66ed90 in impala's branch 
refs/heads/branch-4.4.0 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d32919f4 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions

*Background*

Since IMPALA-3127, catalogd sends incremental partition updates based on
the last sent table snapshot ('maxSentPartitionId_' to be specific).
Partitions dropped since the last catalog update are tracked in
'droppedPartitions_' of HdfsTable. When catalogd collects the next
catalog update, it collects these dropped partitions and HdfsTable then
clears the set. See details in
CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta().

If an HdfsTable is invalidated, it's replaced with an IncompleteTable
which doesn't track any partitions. The HdfsTable object is then added
to the deleteLog so catalogd can send deletion updates for all its
partitions. The same happens if the HdfsTable is dropped. However, the
previously dropped partitions are not collected in this case, which
results in a leak in the catalog topic if the partition name is never
reused. Note that in the catalog topic, the key of a partition update
consists of the table name and the partition name. So if the partition
is added back to the table, the topic key is reused, which resolves the
leak.

The leak is observed when a coordinator restarts. In the initial
catalog update sent from the statestore, the coordinator finds some
partition updates that are not referenced by the HdfsTable (assuming the
table is used again after the INVALIDATE). A Precondition check then
fails and the table is not added to the coordinator.

*Overview of the patch*

This patch fixes the leak by also collecting the dropped partitions when
adding the HdfsTable to the deleteLog. A new field, dropped_partitions,
is added in THdfsTable to collect them. It's only used when catalogd
collects catalog updates.
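
The leak and the fix described above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not Impala's
code: ToyTable, topicKey(), deletionKeys() and leakedKeys() are
hypothetical names, and the "table:partition" key layout is made up
(the commit message only says the key consists of the table name and
the partition name).

```java
import java.util.Set;
import java.util.TreeSet;

public class TopicLeakSketch {
  // Hypothetical key layout. The commit message only states that the key
  // consists of the table name and the partition name.
  static String topicKey(String table, String partition) {
    return table + ":" + partition;
  }

  /** Toy stand-in for HdfsTable: live partitions plus not-yet-sent drops. */
  static final class ToyTable {
    final String name;
    final Set<String> live = new TreeSet<>();
    final Set<String> dropped = new TreeSet<>();

    ToyTable(String name, String... partitions) {
      this.name = name;
      for (String p : partitions) live.add(p);
    }

    void dropPartition(String partition) {
      if (live.remove(partition)) dropped.add(partition);
    }
  }

  /** Topic keys deleted when the whole table is invalidated or dropped. */
  static Set<String> deletionKeys(ToyTable t, boolean includeDropped) {
    Set<String> keys = new TreeSet<>();
    for (String p : t.live) keys.add(topicKey(t.name, p));
    if (includeDropped) {
      for (String p : t.dropped) keys.add(topicKey(t.name, p));
    }
    return keys;
  }

  /** Returns the topic keys left behind after drop-then-invalidate. */
  static Set<String> leakedKeys(boolean includeDropped) {
    ToyTable t = new ToyTable("db.tbl", "p=1", "p=2");
    // The topic initially holds an entry for every live partition.
    Set<String> topic = new TreeSet<>(deletionKeys(t, false));
    // Drop a partition; its deletion has not reached the topic yet.
    t.dropPartition("p=1");
    // Invalidate the table before the next catalog update is collected.
    topic.removeAll(deletionKeys(t, includeDropped));
    return topic;
  }

  public static void main(String[] args) {
    System.out.println("without dropped partitions: " + leakedKeys(false));
    System.out.println("with dropped partitions:    " + leakedKeys(true));
  }
}
```

In this model, leaving out the dropped set keeps the key of the dropped
partition in the topic, while including it empties the topic.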

The patch also removes the Precondition check in the coordinator and
just reports the stale partitions, since IMPALA-12831 could also
introduce them.
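
A minimal sketch of "report instead of fail", assuming partitions are
identified by name. findStale() and selectUpdates() are hypothetical
helpers for illustration and do not correspond to actual coordinator
methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StalePartitionSketch {
  /** Returns the partition updates that the table does not reference. */
  static List<String> findStale(Set<String> referenced, List<String> updates) {
    List<String> stale = new ArrayList<>();
    for (String name : updates) {
      if (!referenced.contains(name)) stale.add(name);
    }
    return stale;
  }

  /** Returns the updates to apply. Stale ones are reported, not fatal. */
  static List<String> selectUpdates(
      String table, Set<String> referenced, List<String> updates) {
    List<String> stale = findStale(referenced, updates);
    if (!stale.isEmpty()) {
      // Previously a failed check here would have rejected the whole table.
      System.err.println("Ignoring stale partitions of " + table + ": " + stale);
    }
    List<String> usable = new ArrayList<>(updates);
    usable.removeAll(stale);
    return usable;
  }

  public static void main(String[] args) {
    System.out.println(
        selectUpdates("db.tbl", Set.of("p=2"), List.of("p=1", "p=2")));
  }
}
```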

Also adds a log line in CatalogOpExecutor.alterTableDropPartition() to
show the dropped partition names for better diagnostics.

*Tests*
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Reviewed-on: http://gerrit.cloudera.org:8080/21326
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit ee21427d26620b40d38c706b4944d2831f84f6f5)


> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Quanlong Huang
> Priority: Major
> Labels: catalog-server, performance
> Fix For: Impala 4.0.0
>
>
> Currently, partitions are tightly integrated into the HdfsTable objects, 
> making incremental metadata updates difficult to perform. Furthermore, the 
> catalog transmits the entire table metadata even when only a few partitions change,
> introducing significant latencies, wasting network bandwidth and CPU cycles 
> while updating table metadata at the receiving impalads. As a first step, we 
> should decouple partitions from tables and add them as a separate level in 
> the hierarchy of catalog entities (server-db-table-partition). Subsequently, 
> the catalog should transmit only entities that have changed after DDL/DML 
> statements.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

2024-05-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844100#comment-17844100
 ] 

ASF subversion and git services commented on IMPALA-3127:
-

Commit ee21427d26620b40d38c706b4944d2831f84f6f5 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ee21427d2 ]

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped 
partitions






[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

2020-07-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166239#comment-17166239
 ] 

ASF subversion and git services commented on IMPALA-3127:
-

Commit 074731e2bcf37643710f2fdf236829991a462fc3 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=074731e ]

IMPALA-3127: Support incremental metadata updates in partition level

Currently, partitions are tightly integrated into the HdfsTable objects.
Catalogd has to transmit the entire table metadata even when only a few
partitions change. This wastes resources and can lead to OOM when
transmitting large tables due to the 2GB JVM array limit.

This patch makes HdfsPartition extend CatalogObject so the catalogd can
send partitions as individual catalog objects. Consequently, table
objects in the catalog topic update can have minimal partition maps that
only contain the partition ids, which reduces the thrift object size for
large tables. The catalog object key of HdfsPartition consists of db
name, table name and partition name.

In "full" topic mode (catalog_topic_mode=full), catalogd only sends
changed partitions with their latest table states. The latest table
states are table objects with the minimal partition map. Legacy
coordinators use the partition list to pick up existing (unchanged)
partitions from the existing table object and new partitions in the
catalog update.

Currently, partition instances are immutable - all partition
modifications are implemented by deleting the old instance and adding a
new one with a new partition id. Since partition ids are generated by a
global counter, newer partition instances have larger partition ids. So
catalogd maintains a watermark for each table: the max sent partition
id. Partition instances with ids larger than this are new partitions
that should be sent in the next catalog update. Deleted partition
instances are kept in a set for each table until the next catalog
update. If there are no updates on the same partition name, catalogd
sends a deletion for the partition.
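
The watermark scheme in the paragraph above can be sketched as follows.
This is an illustrative model under assumptions, not the catalogd
implementation: the Table, Partition and Delta classes, and names such
as collectDelta() and addOrReplace(), are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

public class PartitionWatermarkSketch {
  // Global counter: newer partition instances always get larger ids.
  static final AtomicLong NEXT_ID = new AtomicLong(0);

  static final class Partition {
    final long id = NEXT_ID.getAndIncrement();
    final String name;
    Partition(String name) { this.name = name; }
  }

  static final class Delta {
    final List<String> updates = new ArrayList<>();
    final List<String> deletions = new ArrayList<>();
  }

  static final class Table {
    final Map<String, Partition> partitions = new TreeMap<>();
    final Set<String> droppedNames = new TreeSet<>();
    long maxSentPartitionId = -1;

    // A modification deletes the old instance and adds one with a new id.
    void addOrReplace(String name) {
      if (partitions.containsKey(name)) droppedNames.add(name);
      partitions.put(name, new Partition(name));
    }

    void drop(String name) {
      if (partitions.remove(name) != null) droppedNames.add(name);
    }

    Delta collectDelta() {
      Delta delta = new Delta();
      long newMax = maxSentPartitionId;
      for (Partition p : partitions.values()) {
        if (p.id > maxSentPartitionId) {  // not sent yet
          delta.updates.add(p.name);
          newMax = Math.max(newMax, p.id);
        }
      }
      for (String name : droppedNames) {
        // Send a deletion only if no newer instance reuses the name.
        if (!partitions.containsKey(name)) delta.deletions.add(name);
      }
      maxSentPartitionId = newMax;
      droppedNames.clear();
      return delta;
    }
  }

  public static void main(String[] args) {
    Table t = new Table();
    t.addOrReplace("year=2009");
    t.addOrReplace("year=2010");
    t.collectDelta();             // first update sends both partitions
    t.addOrReplace("year=2010");  // modified: new instance, new id
    t.drop("year=2009");
    Delta d = t.collectDelta();
    System.out.println("updates=" + d.updates + " deletions=" + d.deletions);
  }
}
```

In this model a replaced partition shows up only as an update, because
the new instance reuses the partition name, while a dropped partition
shows up only as a deletion.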

For dropped or invalidated tables, catalogd will still send deletions on
their partitions. Although coordinators don't use them (coordinators
delete the partitions when they delete the table instances), they help
avoid topic entry leaks in the statestore catalog topic.

In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only
sends invalidations on tables and stale partition instances. Each
partition instance is identified by its partition id. LocalCatalog
coordinators use the partition invalidations to evict stale partitions
in time. For instance, let's say partition(year=2010) is updated in
catalogd. This is done by deleting the old partition instance
partition(id=0, year=2010) and adding a new partition instance
partition(id=1, year=2010). Catalogd will send invalidations on the
table and partition instance with id=0, but not the one with id=1. A
LocalCatalog coordinator will invalidate the partition instance(id=0) if
it's in the cache. If the partition instance(id=1) is cached, it's
already the latest version since partition instances are immutable. So
we don't need to invalidate it.
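
The year=2010 example above can be modelled with a cache keyed by
partition id. This is a hedged sketch only: the class and its put(),
isCached() and invalidate() methods are hypothetical and do not reflect
the real LocalCatalog cache API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionIdCacheSketch {
  // Cached partition metadata keyed by partition id. Instances are
  // immutable, so an entry never needs to be refreshed in place.
  private final Map<Long, String> cache = new HashMap<>();

  void put(long partitionId, String metadata) {
    cache.put(partitionId, metadata);
  }

  boolean isCached(long partitionId) {
    return cache.containsKey(partitionId);
  }

  /** Evicts the given stale instances and returns how many were cached. */
  int invalidate(Collection<Long> stalePartitionIds) {
    int evicted = 0;
    for (Long id : stalePartitionIds) {
      if (cache.remove(id) != null) evicted++;
    }
    return evicted;
  }

  public static void main(String[] args) {
    PartitionIdCacheSketch c = new PartitionIdCacheSketch();
    c.put(0L, "partition(id=0, year=2010)");
    c.put(1L, "partition(id=1, year=2010)");
    // The invalidation names only the old instance.
    int evicted = c.invalidate(List.of(0L));
    System.out.println(evicted + " evicted, id=1 cached: " + c.isCached(1L));
  }
}
```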

Tests
 - Run exhaustive tests.
 - Run exhaustive test_ddl.py in LocalCatalog mode.
 - Add test in test_local_catalog.py to verify stale partitions are
   invalidated in LocalCatalog when partitions are updated.

Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Reviewed-on: http://gerrit.cloudera.org:8080/16159
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 


> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Quanlong Huang
> Priority: Major
> Labels: catalog-server, performance

[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

2020-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136742#comment-17136742
 ] 

ASF subversion and git services commented on IMPALA-3127:
-

Commit 419aa2e30db326f02e9b4ec563ef7864e82df86e in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=419aa2e ]

IMPALA-9778: Refactor partition modifications in DDL/DMLs

After this patch, in DDL/DMLs that update metadata of partitions,
instead of updating partitions in place, we always create new ones and
use them to replace the existing instances. This is guarded by making
HdfsPartition immutable. There are several benefits to this:
 - HdfsPartition can be shared across table versions. In full catalog
   update mode, catalog update can ignore unchanged partitions
   (IMPALA-3234) and send the update in partition granularity.
 - Aborted DDL/DMLs won't leave partition metadata in a bad shape (e.g.
   IMPALA-8406), which usually requires invalidation to recover.
 - Fetch-on-demand coordinators can cache partition metadata using the
   partition id as the key. When the table version is updated, only the
   metadata of changed partitions needs to be reloaded (IMPALA-7533).
 - In the work of decoupling partitions from tables (IMPALA-3127), we
   don't need to assign a catalog version to partitions since the
   partition ids already identify the partitions.

However, HdfsPartition is not strictly immutable. Although all its
fields are final, some fields still reference mutable objects. More
refactoring is needed to achieve strict immutability. This patch focuses
on refactoring the DDL/DML code paths.

Changes:
 - Make all fields of HdfsPartition final. Move the HdfsPartition
   constructor logic and all its update methods into
   HdfsPartition.Builder.
 - Refactor in-place updates on HdfsPartition to create a new instance
   and drop the old one. HdfsPartition.Builder represents the
   in-progress modifications. Once all modifications are done, call its
   build() method to create the new HdfsPartition instance. The old
   HdfsPartition instance is only replaced at the end of the
   modifications.
 - Move the "dirty" marker of HdfsPartition into a map of HdfsTable. It
   maps from the old partition id to the in-progress partition builder.
   For "dirty" partitions, we'll reload their HMS metadata and file
   metadata.
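
The builder-based replacement listed above follows a common Java
pattern, sketched below. This is a simplified illustration, not the
real HdfsPartition: the fields (name, location, numRows) and the
setters are assumptions chosen to keep the example small.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ImmutablePartitionSketch {
  // Ids come from a global counter, so every build() yields a new id.
  private static final AtomicLong NEXT_ID = new AtomicLong(0);

  /** Immutable partition: all fields are final and set once. */
  static final class Partition {
    final long id;
    final String name;
    final String location;
    final long numRows;

    private Partition(Builder b) {
      this.id = NEXT_ID.getAndIncrement();
      this.name = b.name;
      this.location = b.location;
      this.numRows = b.numRows;
    }
  }

  /** Holds the in-progress modifications until build() is called. */
  static final class Builder {
    private final String name;
    private String location = "";
    private long numRows = -1;

    Builder(String name) { this.name = name; }

    /** Starts from an existing instance, which is left untouched. */
    Builder(Partition old) {
      this.name = old.name;
      this.location = old.location;
      this.numRows = old.numRows;
    }

    Builder setLocation(String location) {
      this.location = location;
      return this;
    }

    Builder setNumRows(long numRows) {
      this.numRows = numRows;
      return this;
    }

    Partition build() { return new Partition(this); }
  }

  public static void main(String[] args) {
    Partition oldPart = new Builder("year=2010").setLocation("/a").build();
    // "Update" the location by building a replacement instance.
    Partition newPart = new Builder(oldPart).setLocation("/b").build();
    System.out.println(oldPart.location + " -> " + newPart.location);
  }
}
```

Because the old instance is never modified, an aborted operation can
simply discard the builder and leave the table's partition untouched.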

Tests:
 - No new tests are added since the existing tests already provide
   sufficient coverage
 - Run CORE tests

Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c
Reviewed-on: http://gerrit.cloudera.org:8080/15985
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Labels: catalog-server, performance



[jira] [Commented] (IMPALA-3127) Decouple partitions from tables

2020-01-07 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009969#comment-17009969
 ] 

Vihang Karajgaonkar commented on IMPALA-3127:
-

I would like to take this up.

> Decouple partitions from tables
> ---
>
> Key: IMPALA-3127
> URL: https://issues.apache.org/jira/browse/IMPALA-3127
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Priority: Major
> Labels: catalog-server, performance