[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 13:

(7 comments)

Thanks for refactoring the long methods! Just took a first pass. I'll look 
deeper into this.

http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG@7
PS14, Line 7: when
: EP is turned off
nit: let's make this accurate, e.g. "when partition list is stale"


http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1292
PS13, Line 1292: Preconditions.checkState(
nit: as we touch this code, can we add an error message for 
Preconditions.checkState() BTW? E.g.

Preconditions.checkState(
partitionsToUpdate == null || loadPartitionFileMetadata,
"Conflicts in 'partitionsToUpdate' and 
'loadPartitionFileMetadata'");


http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1238
PS14, Line 1238:  metadata should be updated. O
nit: we usually put the comment before the value, e.g. L1225-L1227


http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7253
PS13, Line 7253: false
nit: could you add a comment to highlight this is 'loadPartitionFileMetadata'?


http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7279
PS13, Line 7279: updatePartitionsToCreateAndUnsetStats
nit: let's avoid using the variable name 'partitionsToCreate' in the method 
name. Maybe 'pickupExistingPartitions' is enough. Other details like unsetting 
COLUMN_STATS_ACCURATE and collecting cacheDirIds can be mentioned in the method 
comments.


http://gerrit.cloudera.org:8080/#/c/21437/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7405
PS13, Line 7405:
nit: remove one space here


http://gerrit.cloudera.org:8080/#/c/21437/13/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/13/tests/custom_cluster/test_events_custom_configs.py@1275
PS13, Line 1275:   self.client.execute("insert into {}.{} 
partition(year=2024) values (0)"
Can we also add year=2022 as an existing partition? Now we have stale partition 
(year=2024) dropped externally and missing partition (year=2023) added 
externally. Just miss the case of inserting into existing partitions.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 14 Jun 2024 08:59:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 14:

(4 comments)

Will take another look at this, but here is my preliminary comment.

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17
PS12, Line 17: r lagged behind. This patch
 : address this issue by always loading the verifying the 
partitions from
 : HMS without loading the file metadata from HMS regardless
> Will change this.
Done


http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG@18
PS14, Line 18: always loading the verifying the partitions
nit: "always verify the partitions" ?


http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1265
PS14, Line 1265: public void load(boolean reuseMetadata, IMetaStoreClient 
client,
   :   org.apache.hadoop.hive.metastore.api.Table msTbl, 
boolean loadPartitionFileMetadata,
   :   boolean loadTableSchema, boolean 
refreshUpdatedPartitions,
   :   @Nullable Set partitionsToUpdate, @Nullable 
String debugAction,
   :   @Nullable Map partitionToEventId, String 
reason,
   :   EventSequence catalogTimeline, boolean 
isPreLoadForInsert)
This method has 12 parameter now. That is quite long.
Is it possible to wrap them into one Class? And maybe fluent style Builder for 
that class.

LoadParam param = LoadParamBuilder.reuseMetadata(true).metastoreClient(client). 
... .build();

See example at ResourceProfile.java and ResourceProfileBuilder.java.


http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1270
PS14, Line 1270: isPreLoadForInsert
Add documentation for isPreLoadForInsert parameter.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:42:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16341/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:11:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-13 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#14).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always load the partitions from cached table.
This can lead to data inconsistency issue especially in the case
of event processor being turned off or lagged behind. This patch
address this issue by always loading the verifying the partitions from
HMS without loading the file metadata from HMS regardless of state of
event processor. This approach will ensure that partition metadata is
always consistent with metastore.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 241 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/14
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16328/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 05:22:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-12 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17
PS12, Line 17: r lagged behind. This patch
 : address this issue by always loading the verifying the 
partitions from
 : HMS without loading the file metadata from HMS regardless
> Is this still true after patch set 9?
Will change this.


http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7205
PS12, Line 7205:   public TUpdateCatalogResponse 
updateCatalogImpl(TUpdateCatalogRequest update)
> This method is super long now. It'd be nice to refactor it by extracting so
Ack


http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7262
PS12, Line 7262:   // Names of the partitions that are added with 
add_partitions() RPC.
> Instead of using the partition list in the cache, I think a cleaner approac
Ack


http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7497
PS12, Line 7497:   String msg = String.format(
> It's not clear to me whether all the stuffs in updateCatalogImpl() are retr
That's the downside of this approach.
This is now irrelevant given that I have implemented your suggestion in the 
current patchset.


http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py@1279
PS12, Line 1279:   self.run_stmt_in_hive("alter table {}.{} add 
partition(year=2023)"
> Can we add more partitions to cover existing partitions? E.g. The table can
Ack



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 04:57:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-12 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#13).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always load the partitions from cached table.
This can lead to data inconsistency issue especially in the case
of event processor being turned off or lagged behind. This patch
address this issue by always loading the verifying the partitions from
HMS without loading the file metadata from HMS regardless of state of
event processor. This approach will ensure that partition metadata is
always consistent with metastore.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 241 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/13
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 13
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-10 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7205
PS12, Line 7205:   public TUpdateCatalogResponse 
updateCatalogImpl(TUpdateCatalogRequest update,
This method is super long now. It'd be nice to refactor it by extracting some 
codes into methods.


http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7262
PS12, Line 7262:   FeCatalogUtils.loadAllPartitions((FeFsTable)table);
Instead of using the partition list in the cache, I think a cleaner approach is 
always fetching the updated partition list from HMS (without loading the file 
metadata). So we can also take care of new partitions added externally.

The reason is that we will always fetch the updated partition list later in 
loadTableMetadata(). We can do it here to get rid of stale metadata cache.


http://gerrit.cloudera.org:8080/#/c/21437/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7497
PS12, Line 7497: return updateCatalogImpl(update, false); // we should 
only retry once.
It's not clear to me whether all the stuffs in updateCatalogImpl() are 
retryable, i.e. idempotent. E.g. do we fire duplicate HMS events?


http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/12/tests/custom_cluster/test_events_custom_configs.py@1279
PS12, Line 1279:   self.client.execute("insert into {}.{} 
partition(year=2024) values (0)"
Can we add more partitions to cover existing partitions? E.g. The table can 
originally have several partitions. One is dropped externally. Another new one 
(with a new name) is added externally. Then an INSERT in Impala inserts to all 
these partitions.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 11 Jun 2024 03:04:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-07 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17
PS12, Line 17: This patch address this issue by
 : reusing metadata only when event processor state is active. If 
it is
 : not, we should always fetch the latest metadata from HMS.
Is this still true after patch set 9?


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7480
PS10, Line 7480: etry insert again if there is table loading exception, since 
cache is updated
   :   // using metastore api call
> Yeah, I intend to do it but somehow forgot it. Thanks for pointing out!.
Done


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7482
PS10, Line 7482: TableLoadFail
> This is not required anymore. I planned to do an RPC call to HMS previously
Done



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 07 Jun 2024 22:04:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16302/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 07 Jun 2024 20:41:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-07 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#12).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
2 files changed, 52 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/12
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-07 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7473
PS10, Line 7473:
> nit: loading
Ack


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7480
PS10, Line 7480: etry insert again if there is table loading exception, since 
cache is updated
   :   // using metastore api call
> Should this call abortTransaction first for the failed update?
Yeah, I intend to do it but somehow forgot it. Thanks for pointing out!.


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7482
PS10, Line 7482: TableLoadFail
> Which operation can throw this exception?
This is not required anymore. I planned to do an RPC call to HMS previously but 
we can get around with it.


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7485
PS10, Line 7485:   UnlockWriteLockIfErronouslyLocked();
> LOG the retry attempt before recursing?
Ack. I have also logged the error at the beginning for table loading exception.


http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py@1266
PS10, Line 1266: def test_ep_delay_metadata_reload_for_insert(self, 
unique_databas
> Add comment about what this test do.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 12
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 07 Jun 2024 20:17:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16286/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 11
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 06 Jun 2024 22:18:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-06 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7473
PS10, Line 7473: lodaing
nit: loading


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7480
PS10, Line 7480: update.setTransaction_id(MetastoreShim.openTransaction(msClient
   :   .getHiveClient()));
Should this call abortTransaction first for the failed update?


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7482
PS10, Line 7482: TException e)
Which operation can throw this exception?


http://gerrit.cloudera.org:8080/#/c/21437/10/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7485
PS10, Line 7485: return updateCatalogImpl(update, false); // we should only 
retry once.
LOG the retry attempt before recursing?


http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/10/tests/custom_cluster/test_events_custom_configs.py@1266
PS10, Line 1266: def test_no_ep_metadata_reload_for_insert(self, 
unique_database):
Add comment about what this test do.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 06 Jun 2024 22:01:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-06 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#11).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
2 files changed, 43 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/11
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 11
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 10:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/16284/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 06 Jun 2024 21:25:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-06 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#10).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
2 files changed, 43 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/10
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508
PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus()
> IEventProcessorStatus.DISABLED is configured when NoOpEventProcessor is req
Agree.
But what I meant is, MetastoreEventProcessor itself can never transition to 
EventProcessorStatus.DISABLED.
If that is the case, a Precondition.checkState(eventProcessorStatus_ != 
EventProcessorStatus.DISABLED) can be added inside 
MetastoreEventProcessor.updateStatus() or MetastoreEventProcessor.getStatus().

$ git grep -n -e "updateStatus" -e "eventProcessorStatus_ ="
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:610:
  private EventProcessorStatus eventProcessorStatus_ = 
EventProcessorStatus.STOPPED;
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:770:
updateStatus(EventProcessorStatus.ACTIVE);
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:837:
if (eventProcessorStatus_ == EventProcessorStatus.PAUSED) {
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:840:
updateStatus(EventProcessorStatus.PAUSED);
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:929:
updateStatus(EventProcessorStatus.ACTIVE);
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:948:
updateStatus(EventProcessorStatus.STOPPED);
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1085:
  updateStatus(EventProcessorStatus.NEEDS_INVALIDATE);
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1095:
  updateStatus(EventProcessorStatus.ERROR);
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1414:
  public synchronized void updateStatus(EventProcessorStatus toStatus) {
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:1415:
eventProcessorStatus_ = toStatus;
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:3897:
  eventsProcessor_.updateStatus(EventProcessorStatus.NEEDS_INVALIDATE);
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:3902:
  eventsProcessor_.updateStatus(EventProcessorStatus.ERROR);



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 22 May 2024 20:31:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-22 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS8, Line 2465: boolean isEpActiveOrDisabled = isEventProcessin
> I'm still not sure about this. What if I have a cluster with only Impala as
Ack.

Unfortunately, that is the downside to ensuring data consistency but we can get 
around it by turning on the event processor even though it is a single Impala 
cluster.


http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469
PS8, Line 2469: metastoreEventProcessor_ instanceo
> I think it is better to be verbose here by printing the actual EventProcess
Done


http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508
PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus()
> Just question because I'm a little bit confused.
IEventProcessorStatus.DISABLED is configured when NoOpEventProcessor is 
required (For eg: MetastoreEventsProcessorTest like here: 
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L1825)
 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java#L52



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 22 May 2024 20:08:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 22 May 2024 00:56:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS8, Line 2465: boolean isEpActiveOrDisabled = isEventProcessin
> I'm still not sure about this. What if I have a cluster with only Impala as
Thanks. I just realize isEventProcessingActive already check for that 
(metastoreEventProcessor_ instanceof MetastoreEventsProcessor)
Done.


http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469
PS8, Line 2469: metastoreEventProcessor_ instanceo
> I think it is better to be verbose here by printing the actual EventProcess
Done


http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@508
PS9, Line 508: ((MetastoreEventsProcessor) metastoreEventProcessor_).getStatus()
Just question because I'm a little bit confused.
Can this ever return EventProcessorStatus.DISABLED?
I think it is impossible, but please correct me if I'm wrong.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 22 May 2024 00:56:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16199/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Wed, 22 May 2024 00:07:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#9).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 34 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/9
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471
PS5, Line 2471: f no validWriteIdList is
> L2471 and L2483 are two if conditions and I don't see event processor switc
Done


http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS8, Line 2465: boolean isEpActive = isEventProcessingActive();
I'm still not sure about this. What if I have a cluster with only Impala as 
query engine (no Hive, no Spark), and I turn off Event Processor entirely 
because I don't need it (metastoreEventProcessor_ is a NoOpEventProcessor)?
Will Catalogd forced to reload table all the time?

I think it should be:
boolean isEpActiveOrDisabled = ...


http://gerrit.cloudera.org:8080/#/c/21437/8/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2469
PS8, Line 2469: isEpActive ? "ACTIVE" : "INACTIVE"
I think it is better to be verbose here by printing the actual 
EventProcessorStatus enum or "NONE" if metastoreEventProcessor_ is not a 
MetastoreEventsProcessor.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 18:10:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16198/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:48:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16197/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:47:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16196/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:46:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#8).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 32 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/8
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1267
PS5, Line 1267:
> Turn this into a parameter of function verify_partition below.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:24:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS5, Line 2465: boolean isEpActive = isEventProcessingActive();
  :   if (LOG.isTraceEnabled()) {
> Add trace LOG about EP status here, whether tbl is loaded, and whether tbl
Ack


http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471
PS5, Line 2471: f no validWriteIdList is
> What happen if isEventProcessingActive() changed between L2471 and L2483?
L2471 and L2483 are two if conditions and I don't see event processor switching 
state between them.
But from a code readability perspective, it makes sense to store it in a 
boolean variable as we need in log.trace() statement above.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:24:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#6).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 33 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/6
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#7).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 32 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/7
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py@1268
PS7, Line 1268:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/21437/7/tests/custom_cluster/test_events_custom_configs.py@1268
PS7, Line 1268:
flake8: E251 unexpected spaces around keyword / parameter equals



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:25:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py@1269
PS6, Line 1269:
flake8: E251 unexpected spaces around keyword / parameter equals


http://gerrit.cloudera.org:8080/#/c/21437/6/tests/custom_cluster/test_events_custom_configs.py@1269
PS6, Line 1269:
flake8: E251 unexpected spaces around keyword / parameter equals



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 17:24:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11
PS1, Line 11: NullPointerException
> Ack. The issue happens with transactional tables as well.
Done


http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2465
PS5, Line 2465: LOG.trace("table {} exits in cache, last synced id {}", 
tbl.getFullName(),
  :   tbl.getLastSyncedEventId());
Add trace LOG about EP status here, whether tbl is loaded, and whether tbl is 
transactional or not.
Also wrap it with

if (LOG.isTraceEnabled()) { ... }


http://gerrit.cloudera.org:8080/#/c/21437/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2471
PS5, Line 2471: isEventProcessingActive()
What happen if isEventProcessingActive() changed between L2471 and L2483?
Should it be called once and stored to boolean variable instead?


http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266
PS4, Line 1266: test_no_ep_metadata_reload_for_insert
> Good catch!! The issue happens with transactional tables as well.
Done


http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1267
PS5, Line 1267: test_table
Turn this into a parameter of function verify_partition below.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 16:07:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16194/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 00:35:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/5/tests/custom_cluster/test_events_custom_configs.py@1268
PS5, Line 1268: d
flake8: E306 expected 1 blank line before a nested definition, found 0



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 00:11:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#5).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 26 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/5
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11
PS1, Line 11: NullPointerException
> Thanks for your explanation.
Ack. The issue happens with transactional tables as well.


http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2472
PS4, Line 2472: 
!AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters( &&
  :   isEventProcessingActive()
> nit: I think checking isEventProcessingActive() first is better here.
Ack


http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266
PS4, Line 1266: test_no_ep_metadata_reload_for_insert
> Can you test the same scenario over transactional table? Or is there an exi
Good catch!! The issue happens with transactional tables as well.
Addressed it and modified the test to cover the transactional table scenario as 
well.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 21 May 2024 00:10:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11
PS1, Line 11: NullPointerException
> Jira has more details about this. But to explain this in simple terms, with
Thanks for your explanation.

Please describe those scenario in this commit message as short bullet points. I 
think the transactional property should also be highlighted?


http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2472
PS4, Line 2472: 
!AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters( &&
  :   isEventProcessingActive()
nit: I think checking isEventProcessingActive() first is better here.


http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1748
PS1, Line 1748:
> Discard this change. This is not completely addressing the concern.
Done


http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1751
PS1, Line 1751:debugAction, partitionToEventId, reason, catalogTimeline);
> Same as above
Done


http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/21437/4/fe/src/main/java/org/apache/impala/util/AcidUtils.java@679
PS4, Line 679: If the tbl metadata is a
 :* superset of the metadata view represented by the given 
validWriteIdList this
 :* method returns a value greater than 0. If they are an exact 
match of each other,
 :* it returns 0 and if the table ValidWriteIdList is behind 
the provided
 :* validWriteIdList this return -1.
nit: update this comment with the new transactional property check.


http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/21437/4/tests/custom_cluster/test_events_custom_configs.py@1266
PS4, Line 1266: test_no_ep_metadata_reload_for_insert
Can you test the same scenario over transactional table? Or is there an 
existing test that already does that?



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 20 May 2024 18:38:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16191/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 20 May 2024 18:19:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16190/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 20 May 2024 18:18:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16189/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 20 May 2024 18:00:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11
PS1, Line 11: NullPointerException
> What is the source of NPE? Where does it hit / manifest problem?
Jira has more details about this. But to explain this in simple terms, with the 
event processor turned off, create a partitioned table from Impala and add 
partition (p=1) by inserting some values, at this point drop the partition 
(p=1) from the hive and insert some values into a partition(p=1), then we would 
hit NULL pointer exception because,

we are reusing metadata while reloading the table, catalogd first removes it 
since it doesn't exist in HMS (fetch partitions from HMS) and then it validates 
each partition in the cache against the partitions from HMS, we would then hit 
precondition check for non-null partition names here: 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1777


http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1748
PS1, Line 1748:
> Do you mean "not active or not healthy." ?
Discard this change. This is not completely addressing the concern.
I have uploaded a new patchset.


http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1751
PS1, Line 1751:debugAction, partitionToEventId, reason, catalogTimeline);
> This line remains unchanged for a long time since Fri Dec 18 14:31:24 (IMPA
Same as above



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 20 May 2024 17:53:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#4).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 21 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/4
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#3).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
4 files changed, 23 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/3
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-20 Thread Sai Hemanth Gantasala (Code Review)
Hello Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#2).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M tests/custom_cluster/test_events_custom_configs.py
4 files changed, 26 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/2
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16184/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 May 2024 20:01:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-17 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/1//COMMIT_MSG@11
PS1, Line 11: NullPointerException
What is the source of NPE? Where does it hit / manifest problem?


http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1748
PS1, Line 1748: not active
Do you mean "not active or not healthy." ?

  // possible status of event processor
  public enum EventProcessorStatus {
PAUSED, // event processor is paused because catalog is being reset 
concurrently
ACTIVE, // event processor is scheduled at a given frequency
ERROR, // event processor is in error state and event processing has stopped
NEEDS_INVALIDATE, // event processor could not resolve certain events and 
needs a
// manual invalidate command to reset the state (See AlterEvent for a 
example)
STOPPED, // event processor is shutdown. No events will be processed
DISABLED // event processor is not configured to run
  }


http://gerrit.cloudera.org:8080/#/c/21437/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1751
PS1, Line 1751: (HdfsTable) tbl).load(reuseMetadata, msClient.getHiveClient(), 
msTbl,
This line remains unchanged for a long time since Fri Dec 18 14:31:24 
(IMPALA-1480: Slow DDL statements with large number of partitions), presumably 
even before Impala have HMS Event Processor.

What is recently change that makes this cause a problem?
Is metadata reuse also not allowed if EventProcessorStatus.DISABLED?



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 17 May 2024 19:24:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-05-16 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21437


Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always reuse the metadata when reloading a
table. This can lead to data inconsistency issue especially in the case
of event processor being turned off. This patch address this issue by
reusing metadata only when event processor state is active. If it is
not, we should always fetch the latest metadata from HMS.

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
2 files changed, 18 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/1
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala