Quanlong Huang has posted comments on this change. (http://gerrit.cloudera.org:8080/21326)

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for 
some dropped partitions
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1125
PS5, Line 1125: collected from a new version
> If the partition was readded, shouldn't that operation also remove it from 
> dropped_partitions?

I think you mean 'droppedPartitions' of HdfsTable rather than
'dropped_partitions' of THdfsTable, which never changes once the THdfsTable is
added to the deleteLog. For 'droppedPartitions' of HdfsTable, we don't do that
yet. Currently, items are only ever added to it, in HdfsTable#dropPartition():
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1146
We could remove the entry in HdfsTable#addPartitionNoThrow() when a partition
is re-added. But that only helps when the partition is dropped and re-added on
the same HdfsTable object, which brings us to your other question.
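A minimal sketch of the idea above, assuming a heavily simplified table that tracks partitions by name and instance id. ToyHdfsTable and ReAddSketch are hypothetical names, not Impala classes, and the clearing of the pending entry in addPartitionNoThrow() is the proposed change, not existing behavior. It only affects the object on which the partition is re-added; a THdfsTable snapshot already placed in the deleteLog is a separate object and is untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for HdfsTable; not Impala's real class or API.
public class ReAddSketch {
  static final class ToyHdfsTable {
    // Live partitions: partition name -> instance id.
    final Map<String, Long> partitions = new HashMap<>();
    // Partitions dropped on this object, not yet sent as deletions: name -> instance id.
    final Map<String, Long> droppedPartitions = new HashMap<>();

    // Mirrors the current behavior: dropping only ever adds to droppedPartitions.
    void dropPartition(String name) {
      Long id = partitions.remove(name);
      if (id != null) droppedPartitions.put(name, id);
    }

    // Assumed extension: re-adding a partition clears its pending entry in droppedPartitions.
    void addPartitionNoThrow(String name, long id) {
      partitions.put(name, id);
      droppedPartitions.remove(name);
    }
  }

  public static void main(String[] args) {
    ToyHdfsTable table = new ToyHdfsTable();
    table.addPartitionNoThrow("p=1", 1L);
    table.dropPartition("p=1");
    System.out.println(table.droppedPartitions); // {p=1=1}
    table.addPartitionNoThrow("p=1", 2L);
    System.out.println(table.droppedPartitions); // {}
  }
}
```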

> How could the catalog collect the new version of the partition before 
> collecting the deletion of the partition?

An example is the following sequence:
#1 DropPartition adds the partition to 'droppedPartitions' of HdfsTable.
#2 InvalidateTable replaces the HdfsTable with an IncompleteTable and adds the
THdfsTable object to the deleteLog. The 'dropped_partitions' of this
THdfsTable object will have a THdfsPartition object representing this partition.
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2363
#3 The table is loaded again so the IncompleteTable is replaced with a new 
HdfsTable object.
#4 AddPartition adds a new HdfsPartition instance (with the same partition
name) to the new HdfsTable object.

If all of these happen within one catalog update cycle, i.e. catalogd collected
the last round of catalog updates before #1, catalogd will first collect both
the table and partition updates at L1013, then collect deletions based on the
deleteLog at L1039 and reach this code.
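A toy Java model of steps #1 to #4 above, assuming one update cycle in which partition updates are collected from the current table object before the deleteLog is scanned. Partition, Table, deletionsWithNewerVersion and runSequence are hypothetical names, not Impala code. The sketch only shows why a dropped partition read from the deleteLog can share its name with a partition already collected from the new HdfsTable; it does not model how the patch handles that case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of steps #1-#4; all types and methods here are hypothetical, not Impala classes.
public class DeleteLogSketch {
  // One partition instance: re-adding a partition creates a new id under the same name.
  static final class Partition {
    final String name;
    final long id;
    Partition(String name, long id) { this.name = name; this.id = id; }
  }

  // Simplified table object: live partitions plus partitions dropped on this object.
  static final class Table {
    final Map<String, Partition> partitions = new HashMap<>();
    final List<Partition> droppedPartitions = new ArrayList<>();
  }

  // Collects partition updates first, then scans the deleteLog snapshots, and returns the
  // names of dropped partitions whose newer version was already collected in this cycle.
  static Set<String> deletionsWithNewerVersion(Table current, List<List<Partition>> deleteLog) {
    Set<String> collected = new HashSet<>(current.partitions.keySet());
    Set<String> result = new HashSet<>();
    for (List<Partition> dropped : deleteLog) {
      for (Partition p : dropped) {
        if (collected.contains(p.name)) result.add(p.name);
      }
    }
    return result;
  }

  // Replays #1-#4 within a single update cycle; reAdd controls whether #4 happens.
  static Set<String> runSequence(boolean reAdd) {
    Table oldTable = new Table();
    oldTable.partitions.put("p=1", new Partition("p=1", 1));
    // #1 DropPartition records the partition in droppedPartitions.
    oldTable.droppedPartitions.add(oldTable.partitions.remove("p=1"));
    // #2 InvalidateTable puts a snapshot of the old table's dropped partitions into the deleteLog.
    List<List<Partition>> deleteLog = new ArrayList<>();
    deleteLog.add(new ArrayList<>(oldTable.droppedPartitions));
    // #3 The table is reloaded as a new object.
    Table newTable = new Table();
    // #4 AddPartition adds a new instance with the same partition name.
    if (reAdd) newTable.partitions.put("p=1", new Partition("p=1", 2));
    return deletionsWithNewerVersion(newTable, deleteLog);
  }

  public static void main(String[] args) {
    System.out.println(runSequence(true));  // [p=1]
    System.out.println(runSequence(false)); // []
  }
}
```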

PS5 adds a test case (Test 2) for this: 
https://gerrit.cloudera.org/c/21326/4..5/tests/custom_cluster/test_partition.py



--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Apr 2024 07:42:00 +0000
Gerrit-HasComments: Yes
