[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 07 May 2024 01:56:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

*Background*
Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Dropped partitions since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, they will be collected. HdfsTable then clears the set. See details in CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta().

If an HdfsTable is invalidated, it's replaced with an IncompleteTable which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name, so if the partition is added back to the table, the topic key is reused and the leak is resolved.

The leak is observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator finds some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). A Precondition check then fails and the table is not added to the coordinator.

*Overview of the patch*
This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates.

This patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them.

It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Reviewed-on: http://gerrit.cloudera.org:8080/21326
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 262 insertions(+), 54 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Fang-Yu Rao
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Sai Hemanth Gantasala
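The leak described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Impala's code: ToyTable, leakedKeys() and the "table:partition" key format are hypothetical stand-ins for HdfsTable, the deleteLog handling and the real topic key layout. It shows a partition that was dropped shortly before an invalidation staying in the topic unless the dropped set is collected too.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the topic leak. All names are hypothetical, not Impala classes. */
public class TopicLeakSketch {
  /** Minimal stand-in for a table tracking live and recently dropped partitions. */
  static final class ToyTable {
    final String name;
    final Set<String> livePartitions = new LinkedHashSet<>();
    // Partitions dropped since the last catalog update was collected.
    final Set<String> droppedPartitions = new LinkedHashSet<>();

    ToyTable(String name) { this.name = name; }

    void addPartition(String part) { livePartitions.add(part); }

    void dropPartition(String part) {
      if (livePartitions.remove(part)) droppedPartitions.add(part);
    }

    // Assumed key format: table name plus partition name.
    String topicKey(String part) { return name + ":" + part; }
  }

  /** Returns the keys left in the topic after the table is invalidated. */
  static Set<String> leakedKeys(boolean collectDropped) {
    Set<String> topic = new TreeSet<>();
    ToyTable t = new ToyTable("db.tbl");
    t.addPartition("p=1");
    t.addPartition("p=2");
    // First catalog update: both partitions are published to the topic.
    for (String p : t.livePartitions) topic.add(t.topicKey(p));

    // p=1 is dropped, then the table is invalidated before the next
    // catalog update is collected, so the drop was never sent.
    t.dropPartition("p=1");
    Set<String> deletions = new TreeSet<>();
    for (String p : t.livePartitions) deletions.add(t.topicKey(p));
    if (collectDropped) {
      // The idea of the fix: also delete the previously dropped partitions.
      for (String p : t.droppedPartitions) deletions.add(t.topicKey(p));
    }
    topic.removeAll(deletions);
    return topic;
  }

  public static void main(String[] args) {
    System.out.println("without fix: " + leakedKeys(false)); // [db.tbl:p=1]
    System.out.println("with fix:    " + leakedKeys(true));  // []
  }
}
```

In this model the leaked key only disappears if a partition with the same name is published again, which matches the note about topic key reuse.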
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 7: > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10606/ The failure is due to IMPALA-9441. Retrigger the job. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 20:43:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10610/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 20:44:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 20:44:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10606/ -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 15:50:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 10:41:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10606/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 10:43:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 06 May 2024 10:43:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 6: Code-Review+1 (1 comment) LGTM http://gerrit.cloudera.org:8080/#/c/21326/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21326/5//COMMIT_MSG@14 PS5, Line 14: When catalogd collects the next : catalog update, they will be collected. HdfsTable then clears the set. > These are about CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta(). Ad Ack -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 29 Apr 2024 22:39:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16045/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Sun, 28 Apr 2024 07:47:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21326/5//COMMIT_MSG@14 PS5, Line 14: When catalogd collects the next : catalog update, they will be collected. HdfsTable then clears the set. > nit: can you make it more clear? These are about CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta(). Added in the commit message. Is it better now? http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@30 PS5, Line 30: > nit: unused import Done -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Sun, 28 Apr 2024 07:24:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, k.venureddy2...@gmail.com, Sai Hemanth Gantasala, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21326

to look at the new patch set (#6).

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
..

IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions

*Background*
Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Dropped partitions since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, they will be collected. HdfsTable then clears the set. See details in CatalogServiceCatalog#addHdfsPartitionsToCatalogDelta().

If an HdfsTable is invalidated, it's replaced with an IncompleteTable which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name, so if the partition is added back to the table, the topic key is reused and the leak is resolved.

The leak is observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator finds some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). A Precondition check then fails and the table is not added to the coordinator.

*Overview of the patch*
This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates.

This patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them.

It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics.

Tests
 - Added e2e tests

Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_partition.py
M tests/metadata/test_recover_partitions.py
8 files changed, 262 insertions(+), 54 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/6

--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Fang-Yu Rao
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Sai Hemanth Gantasala
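The coordinator-side change in this commit message (report stale partitions instead of failing a Precondition check) might look roughly like the following. This is a hedged sketch only: findStale(), addTable() and the log text are hypothetical and are not taken from ImpaladCatalog.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy coordinator-side handling of stale partition updates. Names are hypothetical. */
public class StalePartitionReportSketch {
  /** Returns received partition updates that the table does not reference. */
  static Set<String> findStale(Set<String> referencedByTable, Set<String> receivedUpdates) {
    Set<String> stale = new TreeSet<>(receivedUpdates);
    stale.removeAll(referencedByTable);
    return stale;
  }

  /**
   * Adds the table even when stale partition updates are present.
   * A hard check at this point would reject the whole table instead.
   */
  static boolean addTable(String tableName, Set<String> referenced, Set<String> received) {
    Set<String> stale = findStale(referenced, received);
    if (!stale.isEmpty()) {
      // Report only; do not throw.
      System.err.println("Ignoring stale partition updates of " + tableName + ": " + stale);
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> referenced = Set.of("p=2");
    Set<String> received = Set.of("p=1", "p=2");
    System.out.println("stale: " + findStale(referenced, received));
    System.out.println("table added: " + addTable("db.tbl", referenced, received));
  }
}
```

The design point is that one leftover topic entry should not keep an otherwise valid table out of the coordinator's catalog.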
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 5: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21326/5//COMMIT_MSG@14 PS5, Line 14: When catalogd collects the next : catalog update, they will be collected. HdfsTable then clears the set. nit: can you make it more clear? http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@30 PS5, Line 30: import java.util.stream.Collectors; nit: unused import -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 26 Apr 2024 21:26:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1125 PS5, Line 1125: collected from a new version > > If the partition was readded, shouldn't that operation also remove it fro Thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 26 Apr 2024 05:57:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 )

Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
..

Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1125
PS5, Line 1125: collected from a new version
> If the partition was readded, shouldn't that operation also remove it from
> dropped_partitions?

I think you mean 'droppedPartitions' of HdfsTable instead of 'dropped_partitions' of THdfsTable, which never changes once it's added to the deleteLog. For 'droppedPartitions' of HdfsTable, we haven't done that yet. Currently, items are only added to it in HdfsTable#dropPartition():
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1146

We can update it in HdfsTable#addPartitionNoThrow() when a partition is re-added. But that only helps when dropping and re-adding a partition on the same HdfsTable object. That brings us to the other question.

> How could the catalog collect the new version of the partition before
> collecting the deletion of the partition?

An example is the following sequence:

#1 DropPartition adds the partition to 'droppedPartitions' of HdfsTable.
#2 InvalidateTable replaces the HdfsTable with an IncompleteTable and adds the THdfsTable object into the deleteLog. The 'dropped_partitions' of this THdfsTable object will have a THdfsPartition object representing this partition.
   https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2363
#3 The table is loaded again, so the IncompleteTable is replaced with a new HdfsTable object.
#4 AddPartition adds a new HdfsPartition instance (but with the same partition name) to the new HdfsTable object.

If all of these happen within one catalog update cycle, i.e. catalogd collected the last round of catalog updates before #1, catalogd will first collect both the table and partition updates at L1013, then collect deletions based on the deleteLog at L1039 and reach this line.

PS5 adds a test case (Test 2) for this:
https://gerrit.cloudera.org/c/21326/4..5/tests/custom_cluster/test_partition.py

--
To view, visit http://gerrit.cloudera.org:8080/21326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21
Gerrit-Change-Number: 21326
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Fang-Yu Rao
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Tue, 23 Apr 2024 07:42:00 +
Gerrit-HasComments: Yes
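The sequence #1 to #4 above can be modelled in a few lines. This is a sketch under assumptions, not the code at the reviewed line: it presumes that a deletion is skipped when the same topic key was already collected as an update in the current round, which is how the "collected from a new version" comment reads. The deletionsToSend() helper and the "table:partition" key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of one catalog update cycle. Names are hypothetical, not Impala's. */
public class SameCycleReAddSketch {
  /**
   * Returns the deletion keys to send, skipping keys that were already
   * collected as updates from a newer table version in this cycle.
   */
  static List<String> deletionsToSend(Set<String> updatedKeys, List<String> deleteLogKeys) {
    List<String> result = new ArrayList<>();
    for (String key : deleteLogKeys) {
      if (updatedKeys.contains(key)) continue; // re-added in this cycle, keep it
      result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    // Old table had p=1, p=2, p=3. Steps #1 and #2: p=1 and p=3 are dropped,
    // then the table is invalidated, so the deleteLog entry covers the live
    // partition (p=2) plus the dropped ones (p=1, p=3).
    List<String> deleteLogKeys = List.of("db.tbl:p=2", "db.tbl:p=1", "db.tbl:p=3");

    // Steps #3 and #4: the table is reloaded (p=2) and p=1 is added again.
    // Updates are collected first, before the deleteLog is processed.
    Set<String> updatedKeys = new LinkedHashSet<>(List.of("db.tbl:p=2", "db.tbl:p=1"));

    // Only p=3 was never re-added, so only its deletion is sent.
    System.out.println(deletionsToSend(updatedKeys, deleteLogKeys)); // [db.tbl:p=3]
  }
}
```

Without the skip, the deletion of p=1 would be sent in the same round as its new version, even though the partition exists again.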
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1125 PS5, Line 1125: collected from a new version How could the catalog collect the new version of the partition before collecting the deletion of the partition? If the partition was readded, shouldn't that operation also remove it from dropped_partitions? -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 23 Apr 2024 06:14:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 22 Apr 2024 06:51:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10571/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 22 Apr 2024 01:50:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15969/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 20 Apr 2024 08:23:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21326 to look at the new patch set (#5). Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions *Background* Since IMPALA-3127, catalogd sends incremental partition updates based on the last sent table snapshot ('maxSentPartitionId_' to be specific). Partitions dropped since the last catalog update are tracked in 'droppedPartitions_' of HdfsTable. When catalogd collects the next catalog update, it collects them, and HdfsTable then clears the set. If an HdfsTable is invalidated, it's replaced with an IncompleteTable, which doesn't track any partitions. The HdfsTable object is then added to the deleteLog so catalogd can send deletion updates for all its partitions. The same happens if the HdfsTable is dropped. However, the previously dropped partitions are not collected in this case, which results in a leak in the catalog topic if the partition name is never reused. Note that in the catalog topic, the key of a partition update consists of the table name and the partition name, so if the partition is added back to the table, the topic key is reused, which resolves the leak. The leak is observed when a coordinator restarts. In the initial catalog update sent from statestore, the coordinator finds some partition updates that are not referenced by the HdfsTable (assuming the table is used again after the INVALIDATE). A Precondition check then fails and the table is not added to the coordinator. *Overview of the patch* This patch fixes the leak by also collecting the dropped partitions when adding the HdfsTable to the deleteLog. A new field, dropped_partitions, is added in THdfsTable to collect them. It's only used when catalogd collects catalog updates. 
The patch also removes the Precondition check in the coordinator and just reports the stale partitions, since IMPALA-12831 could also introduce them. It also adds a log line in CatalogOpExecutor.alterTableDropPartition() to show the dropped partition names for better diagnostics. Tests - Added e2e tests Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_partition.py M tests/metadata/test_recover_partitions.py 8 files changed, 263 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/5 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
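The leak described in the Background section can be illustrated with a toy model. This is only a sketch under stated assumptions: the class TopicLeakSketch, its methods, and the "table:partition" key format are hypothetical and are not Impala's actual code. It shows that when a table is invalidated, topic entries for previously dropped partitions stay behind unless deletions are also issued for them.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the catalog topic; none of these names are Impala's actual classes. */
class TopicLeakSketch {
  /** Hypothetical key format: the table name plus the partition name. */
  static String topicKey(String table, String partName) {
    return table + ":" + partName;
  }

  /**
   * Returns the topic keys that remain after the table is invalidated.
   * sentPartNames: partitions already published to the topic.
   * livePartNames: partitions still in the table at invalidation time.
   * droppedPartNames: partitions dropped since the last catalog update.
   * collectDropped: whether deletions are also issued for the dropped ones.
   */
  static Set<String> keysLeftAfterInvalidate(String table, Set<String> sentPartNames,
      Set<String> livePartNames, Set<String> droppedPartNames, boolean collectDropped) {
    Set<String> topic = new TreeSet<>();
    for (String p : sentPartNames) topic.add(topicKey(table, p));
    // Deletions for the partitions the table still tracks.
    for (String p : livePartNames) topic.remove(topicKey(table, p));
    // The behavior the patch describes: also delete the dropped partitions.
    if (collectDropped) {
      for (String p : droppedPartNames) topic.remove(topicKey(table, p));
    }
    return topic;
  }

  public static void main(String[] args) {
    Set<String> sent = new TreeSet<>(Arrays.asList("p=1", "p=2"));
    Set<String> live = new TreeSet<>(Arrays.asList("p=1"));
    Set<String> dropped = new TreeSet<>(Arrays.asList("p=2"));
    System.out.println(keysLeftAfterInvalidate("db.t", sent, live, dropped, false));
    System.out.println(keysLeftAfterInvalidate("db.t", sent, live, dropped, true));
  }
}
```

With collectDropped set to false, the entry for the dropped partition survives in the toy topic; with it set to true, nothing is left behind.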
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 20 Apr 2024 04:00:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10564/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 19 Apr 2024 22:57:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15958/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 19 Apr 2024 12:17:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1048 PS3, Line 1048: TCatalogObject catalog = Extracted this part into a method. http://gerrit.cloudera.org:8080/#/c/21326/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1082 PS3, Line 1082: return; We should only add the deletion if the partition is not in the updates (same as L1065), i.e. if a new version of the partition has been collected, don't add the deletion. When a partition is updated, the old version is added to the droppedPartitions set, while the new version is in partitionsMap. When the table is invalidated and then reloaded, the old HdfsTable instance is added to the deleteLog. If all of this happens within one catalog update cycle, catalogd collects updates of the partition, so we should ignore its old version, which is inside the droppedPartitions of the old HdfsTable instance. Not doing so causes tests like metadata/test_recover_partitions.py::TestRecoverPartitions::test_post_invalidate to hang forever. 
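The rule in the second comment can be sketched as follows. This is a minimal illustration, not the code in CatalogServiceCatalog.java: the class DeletionCollectorSketch and its method are hypothetical, and partitions are reduced to their names. A dropped partition is turned into a deletion only if no update with the same name was collected in the same cycle.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; partitions are represented by name only. */
class DeletionCollectorSketch {
  /**
   * Returns the partition names that should be sent as deletions.
   * droppedPartNames: old partition versions recorded by the old table instance.
   * updatedPartNames: names of partitions already collected as updates in this cycle.
   */
  static List<String> collectDeletions(List<String> droppedPartNames,
      Set<String> updatedPartNames) {
    List<String> deletions = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String name : droppedPartNames) {
      // A newer version of this partition is being sent; a deletion would
      // conflict with it because both share the same topic key.
      if (updatedPartNames.contains(name)) continue;
      // Send at most one deletion per name.
      if (seen.add(name)) deletions.add(name);
    }
    return deletions;
  }
}
```

For example, with dropped names "p=1" and "p=2" and a collected update for "p=1", only "p=2" is returned as a deletion.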
-- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 19 Apr 2024 11:53:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21326 to look at the new patch set (#4). Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_partition.py M tests/metadata/test_recover_partitions.py 8 files changed, 219 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/4 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10557/ -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 23:27:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10557/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 13:26:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15939/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 11:13:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21326 to look at the new patch set (#3). Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_partition.py M tests/metadata/test_recover_partitions.py 8 files changed, 151 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/3 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: (1 comment) Thanks for the quick review! http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1075 PS2, Line 1075: if (addedPartNames.contains(part.partition_name)) continue; > What does this case mean? The partition was dropped, but was re-added later? Yeah, if a partition is dropped and then re-added, the droppedPartitions will have the old instance and the partitionMap will have the new instance. When the table is dropped/invalidated, partitions from the partitionMap are collected in the for-loop at L1057. Some of them could have the same partition name as those in the dropped_partitions. Renamed 'addedPartNames' to 'collectedPartNames' to avoid confusion. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 10:49:47 + Gerrit-HasComments: Yes
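The drop-then-re-add case from the reply above can be modelled with a small table-side sketch. All names here (TableSketch, addPartition, dropPartition, namesToDeleteOnInvalidate) are hypothetical stand-ins, and partitions are reduced to an id and a name; the real HdfsTable is far more involved. The sketch shows how the same partition name can appear both among the live partitions and among the dropped ones, and why a set such as 'collectedPartNames' is needed to avoid a duplicate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy stand-in for a table that tracks live and dropped partitions. */
class TableSketch {
  private long nextId = 0;
  // Live partitions: name -> id of the current instance.
  private final Map<String, Long> partitionMap = new TreeMap<>();
  // Old instances dropped since the last catalog update: id -> name.
  private final Map<Long, String> droppedPartitions = new LinkedHashMap<>();

  void addPartition(String name) {
    partitionMap.put(name, nextId++);
  }

  void dropPartition(String name) {
    Long id = partitionMap.remove(name);
    if (id != null) droppedPartitions.put(id, name);
  }

  /** Names to send deletions for when the whole table is dropped or invalidated. */
  List<String> namesToDeleteOnInvalidate() {
    List<String> result = new ArrayList<>();
    Set<String> collectedPartNames = new HashSet<>();
    // First the partitions the table still tracks.
    for (String name : partitionMap.keySet()) {
      collectedPartNames.add(name);
      result.add(name);
    }
    // Then the dropped ones, skipping names that were re-added and already collected.
    for (String name : droppedPartitions.values()) {
      if (collectedPartNames.add(name)) result.add(name);
    }
    return result;
  }
}
```

Dropping "p=1" and adding it back leaves the old instance in the dropped map and the new one in the live map, yet the name is returned only once.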
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21326/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1075 PS2, Line 1075: if (addedPartNames.contains(part.partition_name)) continue; What does this case mean? The partition was dropped, but was re-added later? -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 10:17:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15938/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 09:52:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py File tests/custom_cluster/test_partition.py: http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@93 PS1, Line 93: T > flake8: F821 undefined name 'TestPartitionMetadata' Done http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@98 PS1, Line 98: > flake8: W504 line break after binary operator Done -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 09:48:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Hello Fang-Yu Rao, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21326 to look at the new patch set (#2). Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_partition.py M tests/metadata/test_recover_partitions.py 8 files changed, 148 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/2 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/15937/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 18 Apr 2024 09:25:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_partition.py M tests/metadata/test_recover_partitions.py 8 files changed, 148 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/21326/1 -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21326 ) Change subject: IMPALA-13009: Fix catalogd not sending deletion updates for some dropped partitions .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py File tests/custom_cluster/test_partition.py: http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@93 PS1, Line 93: T flake8: F821 undefined name 'TestPartitionMetadata' http://gerrit.cloudera.org:8080/#/c/21326/1/tests/custom_cluster/test_partition.py@98 PS1, Line 98: a flake8: W504 line break after binary operator -- To view, visit http://gerrit.cloudera.org:8080/21326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12a68158dca18ee48c9564ea16b7484c9f5b5d21 Gerrit-Change-Number: 21326 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 18 Apr 2024 09:01:29 + Gerrit-HasComments: Yes