Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1125
PS5, Line 1125: collected from a new version
> If the partition was readded, shouldn't that operation also remove it from
> dropped_partitions?
I think you mean 'droppedPartitions' of HdfsTable rather than 'dropped_partitions' of THdfsTable, which never changes once the THdfsTable has been added to the deleteLog.

For 'droppedPartitions' of HdfsTable, we haven't done that yet. Currently, items are only ever added to it, in HdfsTable#dropPartition():
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1146

We can update it in HdfsTable#addPartitionNoThrow() when a partition is re-added. But that only helps when the partition is dropped and re-added on the same HdfsTable object. That brings us to the other question.

> How could the catalog collect the new version of the partition before
> collecting the deletion of the partition?
An example is the following sequence:

#1 DropPartition adds the partition to 'droppedPartitions' of HdfsTable.
#2 InvalidateTable replaces the HdfsTable with an IncompleteTable and adds the THdfsTable object to the deleteLog. The 'dropped_partitions' of this THdfsTable object will contain a THdfsPartition object representing this partition.
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2363
#3 The table is loaded again, so the IncompleteTable is replaced with a new HdfsTable object.
#4 AddPartition adds a new HdfsPartition instance (with the same partition name) to the new HdfsTable object.

If all of these happen within one catalog update cycle, i.e. catalogd collected the last round of catalog updates before #1, catalogd will first collect both the table and partition updates at L1013, then collect deletions based on the deleteLog at L1039 and reach this code.

PS5 adds a test case (Test 2) for this:
https://gerrit.cloudera.org/c/21326/4..5/tests/custom_cluster/test_partition.py


--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Apr 2024 07:42:00 +0000
Gerrit-HasComments: Yes
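
For illustration only, the sequence #1 to #4 in the comment above can be replayed with a toy model. This is a rough sketch, not Impala code: ToyTable, Round, replay() and deletionsToSend() are hypothetical names, partitions are reduced to their names, and the filter at the end is just one conceivable way to avoid sending a deletion for a partition that was already collected from a new version. It is not claimed to be what the patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DropReaddSketch {
  /** Toy stand-in for HdfsTable (hypothetical): only partition names are tracked. */
  static final class ToyTable {
    final Set<String> partitions = new LinkedHashSet<>();
    final Set<String> droppedPartitions = new LinkedHashSet<>();
  }

  /** What one simulated collection round gathers. */
  static final class Round {
    final List<String> updates = new ArrayList<>();
    final List<String> deletions = new ArrayList<>();
  }

  /** Replays steps #1 to #4 between two collection rounds, then collects once. */
  static Round replay() {
    List<String> deleteLog = new ArrayList<>();
    ToyTable table = new ToyTable();
    table.partitions.add("p=1");

    // #1 DropPartition: the dropped partition is remembered on the table object.
    table.partitions.remove("p=1");
    table.droppedPartitions.add("p=1");

    // #2 InvalidateTable: the old table object goes away and its dropped
    // partitions travel into the delete log with it.
    deleteLog.addAll(table.droppedPartitions);

    // #3 Reload: a brand-new table object whose droppedPartitions is empty.
    table = new ToyTable();

    // #4 AddPartition: a new partition instance with the same name.
    table.partitions.add("p=1");

    // Collection: updates first, then deletions taken from the delete log.
    Round round = new Round();
    round.updates.addAll(table.partitions);
    round.deletions.addAll(deleteLog);
    return round;
  }

  /** Hypothetical filter: skip deletions whose name was already collected as an update. */
  static List<String> deletionsToSend(Round round) {
    List<String> result = new ArrayList<>(round.deletions);
    result.removeAll(round.updates);
    return result;
  }

  public static void main(String[] args) {
    Round round = replay();
    System.out.println("updates=" + round.updates + " deletions=" + round.deletions);
    System.out.println("deletions to send=" + deletionsToSend(round));
  }
}
```

In this toy run the same partition name shows up both as an update (from the new table object) and as a deletion (from the delete log), which is the situation the reviewed line has to deal with. Clearing 'droppedPartitions' on re-add would not help here, because the re-add happens on a different table object than the drop.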