[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

IMPALA-13102: Normalize invalid column stats from HMS

Column stats like numDVs, numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative or -1 for unknown. So loading
tables with invalid stats values (<-1) will fail.

This patch adds logic to normalize the stats values. If the value < -1,
use -1 for it and add corresponding warning logs. Also refactor some
redundant codes in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Reviewed-on: http://gerrit.cloudera.org:8080/21445
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M tests/metadata/test_compute_stats.py
5 files changed, 147 insertions(+), 73 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
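The rule stated in the commit message (values below -1 become -1, with a warning) can be sketched roughly as below. This is only an illustration in Python, not the code from ColumnStats.java, which is Java and is not shown in this thread. The names `normalize_stat`, `UNKNOWN` and the logger name are hypothetical, and the warning text simply imitates the wording of the log lines quoted later in the review.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
LOG = logging.getLogger("column_stats_sketch")

# Per the commit message, -1 means "unknown" for a column stat.
UNKNOWN = -1


def normalize_stat(stat_name, col_name, value):
    """Return value unchanged if it is >= -1; otherwise warn and return -1.

    Hypothetical helper that mirrors the rule in the commit message.
    It is not the method used in ColumnStats.java.
    """
    if value < UNKNOWN:
        LOG.warning("Invalid %s of column %s: %s. Normalized to %s.",
                    stat_name, col_name, value, UNKNOWN)
        return UNKNOWN
    return value
```

Under this sketch, a valid value such as 0 or 12 passes through untouched, -1 stays -1, and anything smaller (integer or floating point) collapses to -1. How the real patch treats values strictly between -1 and 0 is not stated in the thread, so the sketch leaves them unchanged.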
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 4: Verified+1

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 18:18:15 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 4:

Thanks for the review! Merging this.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 13:16:41 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10663/
DRY_RUN=false

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 13:16:27 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 4: Code-Review+2

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 13:16:26 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@474
PS1, Line 474: client.update_table_column_sta
> I can also see warning logs like this:
thanks for looking into it!

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 12:56:05 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@474
PS1, Line 474: client.update_table_column_sta
> Added an assertion to check the result (bool).
I can also see warning logs like this:

W0523 08:23:22.166049 24028 ColumnStats.java:549] c543d8cfe8844a49:bad5133f] Invalid numDVs of column name: -400. Normalized to -1.
W0523 08:23:22.166126 24028 ColumnStats.java:549] c543d8cfe8844a49:bad5133f] Invalid numNulls of column name: -300. Normalized to -1.
W0523 08:23:22.166178 24028 ColumnStats.java:549] c543d8cfe8844a49:bad5133f] Invalid maxSize of column name: -100. Normalized to -1.
W0523 08:23:22.166267 24028 ColumnStats.java:557] c543d8cfe8844a49:bad5133f] Invalid avgSize of column name: -200.0. Normalized to -1.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 01:42:39 +
Gerrit-HasComments: Yes
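If one wanted to check for these warnings programmatically, for example when scanning a log file after the e2e test, the message portion could be matched with a regular expression. The following is a minimal sketch that assumes the message layout stays exactly as in the four warning lines quoted in this comment; the pattern and the `parse_warning` helper are hypothetical and are not part of the patch.

```python
import re

# Matches only the message portion of the warning lines quoted in the review.
# The glog-style prefix (timestamp, thread id, file:line, query id) is ignored.
PATTERN = re.compile(
    r"Invalid (?P<stat>\w+) of column (?P<column>\S+): "
    r"(?P<value>-?\d+(?:\.\d+)?)\. Normalized to -1\.")


def parse_warning(line):
    """Return (stat, column, original_value) or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("stat"), m.group("column"), float(m.group("value"))
```

The original value is returned as a float so that integer stats (numDVs, numNulls, maxSize) and the floating-point avgSize can be handled uniformly.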
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@453
PS1, Line 453:
> flake8: W504 line break after binary operator
Done

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@454
PS1, Line 454: v
> flake8: E131 continuation line unaligned for hanging indent
Done

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@474
PS1, Line 474: client.update_table_column_sta
> hmm, HMS could also reject these stats
Added an assertion to check the result (bool).

http://gerrit.cloudera.org:8080/#/c/21445/2/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/2/tests/metadata/test_compute_stats.py@454
PS2, Line 454: v
> flake8: E131 continuation line unaligned for hanging indent
Done

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 23 May 2024 00:59:34 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16207/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 23 May 2024 00:59:06 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16206/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 23 May 2024 00:48:32 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21445

to look at the new patch set (#3).

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

IMPALA-13102: Normalize invalid column stats from HMS

Column stats like numDVs, numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative or -1 for unknown. So loading
tables with invalid stats values (<-1) will fail.

This patch adds logic to normalize the stats values. If the value < -1,
use -1 for it and add corresponding warning logs. Also refactor some
redundant codes in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M tests/metadata/test_compute_stats.py
5 files changed, 147 insertions(+), 73 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/21445/3

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21445/2/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/2/tests/metadata/test_compute_stats.py@454
PS2, Line 454: a
flake8: E131 continuation line unaligned for hanging indent

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 23 May 2024 00:25:03 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats from HMS
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21445

to look at the new patch set (#2).

Change subject: IMPALA-13102: Normalize invalid column stats from HMS

IMPALA-13102: Normalize invalid column stats from HMS

Column stats like numDVs, numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative or -1 for unknown. So loading
tables with invalid stats values (<-1) will fail.

This patch adds logic to normalize the stats values. If the value < -1,
use -1 for it and add corresponding warning logs. Also refactor some
redundant codes in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M tests/metadata/test_compute_stats.py
5 files changed, 147 insertions(+), 73 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/21445/2

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats in HMS

Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@474
PS1, Line 474: update_table_column_statistics
hmm, HMS could also reject these stats

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 22 May 2024 16:24:43 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats in HMS

Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16195/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 21 May 2024 11:34:24 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats in HMS

IMPALA-13102: Normalize invalid column stats in HMS

Column stats like numDVs, numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative or -1 for unknown. So loading
tables with invalid stats values (<-1) will fail.

This patch adds logic to normalize the stats values. If the value < -1,
use -1 for it and add corresponding warning logs. Also refactor some
redundant codes in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M tests/metadata/test_compute_stats.py
5 files changed, 147 insertions(+), 73 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/21445/1

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-13102: Normalize invalid column stats in HMS
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21445 )

Change subject: IMPALA-13102: Normalize invalid column stats in HMS

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py
File tests/metadata/test_compute_stats.py:

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@453
PS1, Line 453: a
flake8: W504 line break after binary operator

http://gerrit.cloudera.org:8080/#/c/21445/1/tests/metadata/test_compute_stats.py@454
PS1, Line 454: v
flake8: E131 continuation line unaligned for hanging indent

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Gerrit-Change-Number: 21445
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Tue, 21 May 2024 11:11:02 +
Gerrit-HasComments: Yes