[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 13: (7 comments) Thanks for refactoring the long methods! Just took a first pass. I'll look deeper into this. http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG@7 PS14, Line 7: when : EP is turned off nit: let's make this accurate, e.g. "when partition list is stale" http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1292 PS13, Line 1292: Preconditions.checkState( nit: as we touch this code, can we add an error message for Preconditions.checkState() BTW? E.g. Preconditions.checkState( partitionsToUpdate == null || loadPartitionFileMetadata, "Conflicts in 'partitionsToUpdate' and 'loadPartitionFileMetadata'"); http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1238 PS14, Line 1238: metadata should be updated. O nit: we usually put the comment before the value, e.g. L1225-L1227 http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7253 PS13, Line 7253: false nit: could you add a comment to highlight this is 'loadPartitionFileMetadata'? http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7279 PS13, Line 7279: updatePartitionsToCreateAndUnsetStats nit: let's avoid using the variable name 'partitionsToCreate' in the method name. Maybe 'pickupExistingPartitions' is enough. Other details like unsetting COLUMN_STATS_ACCURATE and collecting cacheDirIds can be mentioned in the method comments. http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7405 PS13, Line 7405: nit: remove one space here http://gerrit.cloudera.org:8080/#/c/21437/13/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/13/tests/custom_cluster/test_events_custom_configs.py@1275 PS13, Line 1275: self.client.execute("insert into {}.{} partition(year=2024) values (0)" Can we also add year=2022 as an existing partition? Now we have stale partition (year=2024) dropped externally and missing partition (year=2023) added externally. Just miss the case of inserting into existing partitions. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 14 Jun 2024 08:59:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 14: (4 comments) Will take another look at this, but here is my preliminary comment. http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17 PS12, Line 17: r lagged behind. This patch : address this issue by always loading the verifying the partitions from : HMS without loading the file metadata from HMS regardless > Will change this. Done http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG@18 PS14, Line 18: always loading the verifying the partitions nit: "always verify the partitions" ? http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1265 PS14, Line 1265: public void load(boolean reuseMetadata, IMetaStoreClient client, : org.apache.hadoop.hive.metastore.api.Table msTbl, boolean loadPartitionFileMetadata, : boolean loadTableSchema, boolean refreshUpdatedPartitions, : @Nullable Set partitionsToUpdate, @Nullable String debugAction, : @Nullable Map partitionToEventId, String reason, : EventSequence catalogTimeline, boolean isPreLoadForInsert) This method has 12 parameter now. That is quite long. Is it possible to wrap them into one Class? And maybe fluent style Builder for that class. LoadParam param = LoadParamBuilder.reuseMetadata(true).metastoreClient(client). ... .build(); See example at ResourceProfile.java and ResourceProfileBuilder.java. http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1270 PS14, Line 1270: isPreLoadForInsert Add documentation for isPreLoadForInsert parameter. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 23:42:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16341/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 23:11:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#14). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always load the partitions from cached table. This can lead to data inconsistency issue especially in the case of event processor being turned off or lagged behind. This patch address this issue by always loading the verifying the partitions from HMS without loading the file metadata from HMS regardless of state of event processor. This approach will ensure that partition metadata is always consistent with metastore. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 241 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/14 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 14 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16328/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 05:22:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17 PS12, Line 17: r lagged behind. This patch : address this issue by always loading the verifying the partitions from : HMS without loading the file metadata from HMS regardless > Is this still true after patch set 9? Will change this. http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7205 PS12, Line 7205: public TUpdateCatalogResponse updateCatalogImpl(TUpdateCatalogRequest update) > This method is super long now. It'd be nice to refactor it by extracting so Ack http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7262 PS12, Line 7262: // Names of the partitions that are added with add_partitions() RPC. > Instead of using the partition list in the cache, I think a cleaner approac Ack http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7497 PS12, Line 7497: String msg = String.format( > It's not clear to me whether all the stuffs in updateCatalogImpl() are retr That's the downside of this approach. This is now irrelevant given that I have implemented your suggestion in the current patchset. http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py@1279 PS12, Line 1279: self.run_stmt_in_hive("alter table {}.{} add partition(year=2023)" > Can we add more partitions to cover existing partitions? E.g. The table can Ack -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 13 Jun 2024 04:57:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#13). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always load the partitions from cached table. This can lead to data inconsistency issue especially in the case of event processor being turned off or lagged behind. This patch address this issue by always loading the verifying the partitions from HMS without loading the file metadata from HMS regardless of state of event processor. This approach will ensure that partition metadata is always consistent with metastore. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 241 insertions(+), 159 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/13 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 13 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 12: (4 comments) http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7205 PS12, Line 7205: public TUpdateCatalogResponse updateCatalogImpl(TUpdateCatalogRequest update, This method is super long now. It'd be nice to refactor it by extracting some codes into methods. http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7262 PS12, Line 7262: FeCatalogUtils.loadAllPartitions((FeFsTable)table); Instead of using the partition list in the cache, I think a cleaner approach is always fetching the updated partition list from HMS (without loading the file metadata). So we can also take care of new partitions added externally. The reason is that we will always fetch the updated partition list later in loadTableMetadata(). We can do it here to get rid of stale metadata cache. http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7497 PS12, Line 7497: return updateCatalogImpl(update, false); // we should only retry once. It's not clear to me whether all the stuffs in updateCatalogImpl() are retryable, i.e. idempotent. E.g. do we fire duplicate HMS events? http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py@1279 PS12, Line 1279: self.client.execute("insert into {}.{} partition(year=2024) values (0)" Can we add more partitions to cover existing partitions? E.g. The table can originally have several partitions. One is dropped externally. Another new one (with a new name) is added externally. Then an INSERT in Impala inserts to all these partitions. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 11 Jun 2024 03:04:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17 PS12, Line 17: This patch address this issue by : reusing metadata only when event processor state is active. If it is : not, we should always fetch the latest metadata from HMS. Is this still true after patch set 9? http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7480 PS10, Line 7480: etry insert again if there is table loading exception, since cache is updated : // using metastore api call > Yeah, I intend to do it but somehow forgot it. Thanks for pointing out!. Done http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7482 PS10, Line 7482: TableLoadFail > This is not required anymore. I planned to do an RPC call to HMS previously Done -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 07 Jun 2024 22:04:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16302/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 07 Jun 2024 20:41:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#12). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 2 files changed, 52 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/12 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 12: (5 comments) http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7473 PS10, Line 7473: > nit: loading Ack http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7480 PS10, Line 7480: etry insert again if there is table loading exception, since cache is updated : // using metastore api call > Should this call abortTransaction first for the failed update? Yeah, I intend to do it but somehow forgot it. Thanks for pointing out!. http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7482 PS10, Line 7482: TableLoadFail > Which operation can throw this exception? This is not required anymore. I planned to do an RPC call to HMS previously but we can get around with it. http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7485 PS10, Line 7485: UnlockWriteLockIfErronouslyLocked(); > LOG the retry attempt before recursing? Ack. I have also logged the error at the beginning for table loading exception. http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py@1266 PS10, Line 1266: def test_ep_delay_metadata_reload_for_insert(self, unique_databas > Add comment about what this test do. Ack -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 12 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Fri, 07 Jun 2024 20:17:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16286/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 11 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 06 Jun 2024 22:18:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7473 PS10, Line 7473: lodaing nit: loading http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7480 PS10, Line 7480: update.setTransaction_id(MetastoreShim.openTransaction(msClient : .getHiveClient())); Should this call abortTransaction first for the failed update? http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7482 PS10, Line 7482: TException e) Which operation can throw this exception? http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7485 PS10, Line 7485: return updateCatalogImpl(update, false); // we should only retry once. LOG the retry attempt before recursing? http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py@1266 PS10, Line 1266: def test_no_ep_metadata_reload_for_insert(self, unique_database): Add comment about what this test do. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 10 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 06 Jun 2024 22:01:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#11). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 2 files changed, 43 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/11 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 11 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 10: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/16284/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 10 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 06 Jun 2024 21:25:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#10). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 2 files changed, 43 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/10 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 10 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508 PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus() > IEventProcessorStatus.DISABLED is configured when NoOpEventProcessor is req Agree. But what I meant is, MetastoreEventProcessor itself can never transition to EventProcessorStatus.DISABLED. If that is the case, a Precondition.checkState(eventProcessorStatus_ != EventProcessorStatus.DISABLED) can be added inside MetastoreEventProcessor.updateStatus() or MetastoreEventProcessor.getStatus(). $ git grep -n -e "updateStatus" -e "eventProcessorStatus_ =" fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:610: private EventProcessorStatus eventProcessorStatus_ = EventProcessorStatus.STOPPED; fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:770: updateStatus(EventProcessorStatus.ACTIVE); fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:837: if (eventProcessorStatus_ == EventProcessorStatus.PAUSED) { fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:840: updateStatus(EventProcessorStatus.PAUSED); fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:929: updateStatus(EventProcessorStatus.ACTIVE); fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:948: updateStatus(EventProcessorStatus.STOPPED); fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1085: updateStatus(EventProcessorStatus.NEEDS_INVALIDATE); fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1095: updateStatus(EventProcessorStatus.ERROR); fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1414: public synchronized void updateStatus(EventProcessorStatus toStatus) { fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1415: eventProcessorStatus_ = toStatus; fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:3897: eventsProcessor_.updateStatus(EventProcessorStatus.NEEDS_INVALIDATE); fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:3902: eventsProcessor_.updateStatus(EventProcessorStatus.ERROR); -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 20:31:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465 PS8, Line 2465: boolean isEpActiveOrDisabled = isEventProcessin > I'm still not sure about this. What if I have a cluster with only Impala as Ack. Unfortunately, that is the downside to ensuring data consistency but we can get around it by turning on the event processor even though it is a single Impala cluster. http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469 PS8, Line 2469: metastoreEventProcessor_ instanceo > I think it is better to be verbose here by printing the actual EventProcess Done http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508 PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus() > Just question because I'm a little bit confused. IEventProcessorStatus.DISABLED is configured when NoOpEventProcessor is required (For eg: MetastoreEventsProcessorTest like here: https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L1825) https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java#L52 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 20:08:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 00:56:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465 PS8, Line 2465: boolean isEpActiveOrDisabled = isEventProcessin > I'm still not sure about this. What if I have a cluster with only Impala as Thanks. I just realize isEventProcessingActive already check for that (metastoreEventProcessor_ instanceof MetastoreEventsProcessor) Done. http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469 PS8, Line 2469: metastoreEventProcessor_ instanceo > I think it is better to be verbose here by printing the actual EventProcess Done http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508 PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus() Just question because I'm a little bit confused. Can this ever return EventProcessorStatus.DISABLED? I think it is impossible, but please correct me if I'm wrong. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 00:56:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16199/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Wed, 22 May 2024 00:07:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#9). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 34 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/9 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 9 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 8: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471 PS5, Line 2471: f no validWriteIdList is > L2471 and L2483 are two if conditions and I don't see event processor switc Done http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465 PS8, Line 2465: boolean isEpActive = isEventProcessingActive(); I'm still not sure about this. What if I have a cluster with only Impala as query engine (no Hive, no Spark), and I turn off Event Processor entirely because I don't need it (metastoreEventProcessor_ is a NoOpEventProcessor)? Will Catalogd forced to reload table all the time? I think it should be: boolean isEpActiveOrDisabled = ... http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469 PS8, Line 2469: isEpActive ? "ACTIVE" : "INACTIVE" I think it is better to be verbose here by printing the actual EventProcessorStatus enum or "NONE" if metastoreEventProcessor_ is not a MetastoreEventsProcessor. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 18:10:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16198/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:48:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16197/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:47:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16196/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:46:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#8). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 32 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/8 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 8 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1267 PS5, Line 1267: > Turn this into a parameter of function verify_partition below. Ack -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:24:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465 PS5, Line 2465: boolean isEpActive = isEventProcessingActive(); : if (LOG.isTraceEnabled()) { > Add trace LOG about EP status here, whether tbl is loaded, and whether tbl Ack http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471 PS5, Line 2471: f no validWriteIdList is > What happen if isEventProcessingActive() changed between L2471 and L2483? L2471 and L2483 are two if conditions and I don't see event processor switching state between them. But from a code readability perspective, it makes sense to store it in a boolean variable as we need in log.trace() statement above. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:24:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#6). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 33 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/6 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#7). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 32 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/7 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py@1268 PS7, Line 1268: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py@1268 PS7, Line 1268: flake8: E251 unexpected spaces around keyword / parameter equals -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 7 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:25:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py@1269 PS6, Line 1269: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py@1269 PS6, Line 1269: flake8: E251 unexpected spaces around keyword / parameter equals -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 6 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 17:24:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11 PS1, Line 11: NullPointerException > Ack. The issue happens with transactional tables as well. Done http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465 PS5, Line 2465: LOG.trace("table {} exits in cache, last synced id {}", tbl.getFullName(), : tbl.getLastSyncedEventId()); Add trace LOG about EP status here, whether tbl is loaded, and whether tbl is transactional or not. Also wrap it with if (LOG.isTraceEnabled()) { ... } http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471 PS5, Line 2471: isEventProcessingActive() What happen if isEventProcessingActive() changed between L2471 and L2483? Should it be called once and stored to boolean variable instead? http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266 PS4, Line 1266: test_no_ep_metadata_reload_for_insert > Good catch!! The issue happens with transactional tables as well. Done http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1267 PS5, Line 1267: test_table Turn this into a parameter of function verify_partition below. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 16:07:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16194/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 00:35:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1268 PS5, Line 1268: d flake8: E306 expected 1 blank line before a nested definition, found 0 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 00:11:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#5). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. The issue can be seen with the following steps: - Turn off the event processor - create a partitioned table and add a partition from impala - drop the same partition from hive - from impala, insert values into the partition (expectation is that if the partition didn't exist, it will create a new one). Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario for external and manged tables. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 26 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/5 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11 PS1, Line 11: NullPointerException > Thanks for your explanation. Ack. The issue happens with transactional tables as well. http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2472 PS4, Line 2472: !AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters( && : isEventProcessingActive() > nit: I think checking isEventProcessingActive() first is better here. Ack http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266 PS4, Line 1266: test_no_ep_metadata_reload_for_insert > Can you test the same scenario over transactional table? Or is there an exi Good catch!! The issue happens with transactional tables as well. Addressed it and modified the test to cover the transactional table scenario as well. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 21 May 2024 00:10:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11 PS1, Line 11: NullPointerException > Jira has more details about this. But to explain this in simple terms, with Thanks for your explanation. Please describe those scenario in this commit message as short bullet points. I think the transactional property should also be highlighted? http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2472 PS4, Line 2472: !AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters( && : isEventProcessingActive() nit: I think checking isEventProcessingActive() first is better here. http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1748 PS1, Line 1748: > Discard this change. This is not completely addressing the concern. Done http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1751 PS1, Line 1751:debugAction, partitionToEventId, reason, catalogTimeline); > Same as above Done http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/util/AcidUtils.java@679 PS4, Line 679: If the tbl metadata is a :* superset of the metadata view represented by the given validWriteIdList this :* method returns a value greater than 0. If they are an exact match of each other, :* it returns 0 and if the table ValidWriteIdList is behind the provided :* validWriteIdList this return -1. nit: update this comment with the new transactional property check. http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266 PS4, Line 1266: test_no_ep_metadata_reload_for_insert Can you test the same scenario over transactional table? Or is there an existing test that already does that? -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 20 May 2024 18:38:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16191/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 20 May 2024 18:19:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16190/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 20 May 2024 18:18:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16189/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 20 May 2024 18:00:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11 PS1, Line 11: NullPointerException > What is the source of NPE? Where does it hit / manifest problem? Jira has more details about this. But to explain this in simple terms, with the event processor turned off, create a partitioned table from Impala and add partition (p=1) by inserting some values, at this point drop the partition (p=1) from the hive and insert some values into a partition(p=1), then we would hit NULL pointer exception because, we are reusing metadata while reloading the table, catalogd first removes it since it doesn't exist in HMS (fetch partitions from HMS) and then it validates each partition in the cache against the partitions from HMS, we would then hit precondition check for non-null partition names here: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1777 http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1748 PS1, Line 1748: > Do you mean "not active or not healthy." ? Discard this change. This is not completely addressing the concern. I have uploaded a new patchset. http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1751 PS1, Line 1751:debugAction, partitionToEventId, reason, catalogTimeline); > This line remains unchanged for a long time since Fri Dec 18 14:31:24 (IMPA Same as above -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 20 May 2024 17:53:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#4). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 3 files changed, 21 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/4 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#3). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 4 files changed, 23 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/3 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 3 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Hello Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21437 to look at the new patch set (#2). Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/AcidUtils.java M tests/custom_cluster/test_events_custom_configs.py 4 files changed, 26 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/2 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 2 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/16184/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 May 2024 20:01:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21437 ) Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11 PS1, Line 11: NullPointerException What is the source of NPE? Where does it hit / manifest problem? http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1748 PS1, Line 1748: not active Do you mean "not active or not healthy." ? // possible status of event processor public enum EventProcessorStatus { PAUSED, // event processor is paused because catalog is being reset concurrently ACTIVE, // event processor is scheduled at a given frequency ERROR, // event processor is in error state and event processing has stopped NEEDS_INVALIDATE, // event processor could not resolve certain events and needs a // manual invalidate command to reset the state (See AlterEvent for a example) STOPPED, // event processor is shutdown. No events will be processed DISABLED // event processor is not configured to run } http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1751 PS1, Line 1751: (HdfsTable) tbl).load(reuseMetadata, msClient.getHiveClient(), msTbl, This line remains unchanged for a long time since Fri Dec 18 14:31:24 (IMPALA-1480: Slow DDL statements with large number of partitions), presumably even before Impala have HMS Event Processor. What is recently change that makes this cause a problem? Is metadata reuse also not allowed if EventProcessorStatus.DISABLED? -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 17 May 2024 19:24:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off
Sai Hemanth Gantasala has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21437 Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off .. IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off When event processor is turned off, inserting values into partitioned table can lead to NullPointerException if the partition is deleted outside impala (eg: HMS). Since event processor is turned off, impala is unaware of the metadata changes to the table. Currently in impala, we always reuse the metadata when reloading a table. This can lead to data inconsistency issue especially in the case of event processor being turned off. This patch address this issue by reusing metadata only when event processor state is active. If it is not, we should always fetch the latest metadata from HMS. Testing: - Verified manually that NullPointerException is avoided with this patch - Added end-to-end tests to verify the above scenario. Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_events_custom_configs.py 2 files changed, 18 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/1 -- To view, visit http://gerrit.cloudera.org:8080/21437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18 Gerrit-Change-Number: 21437 Gerrit-PatchSet: 1 Gerrit-Owner: Sai Hemanth Gantasala