[Impala-ASF-CR] IMPALA-13159: Fix query cancellation caused by statestore failover

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21520 )

Change subject: IMPALA-13159: Fix query cancellation caused by statestore 
failover
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16345/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I720bec5199df46475b954558abb0637ca7e6298b
Gerrit-Change-Number: 21520
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 14 Jun 2024 04:22:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] [tools] Add .gitignore for new files

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21503 )

Change subject: [tools] Add .gitignore for new files
..

[tools] Add .gitignore for new files

Adds .gitignore for test.jceks - added with IMPALA-12920 - and
hive-site-housekeeping-on (presumably added via a Hive update).

Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Reviewed-on: http://gerrit.cloudera.org:8080/21503
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/test/resources/.gitignore
A testdata/jceks/.gitignore
D testdata/jceks/.gitkeep
3 files changed, 3 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Gerrit-Change-Number: 21503
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] [tools] Add .gitignore for new files

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21503 )

Change subject: [tools] Add .gitignore for new files
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Gerrit-Change-Number: 21503
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 14 Jun 2024 04:14:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13159: Fix query cancellation caused by statestore failover

2024-06-13 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21520


Change subject: IMPALA-13159: Fix query cancellation caused by statestore 
failover
..

IMPALA-13159: Fix query cancellation caused by statestore failover

A momentary inconsistent cluster membership state after statestore
failover results in query cancellation.
We already have code to handle inconsistent cluster membership after
statestore restarting. This patch adds code to handle inconsistent
cluster membership state after statestore failover in the same way.

Testing:
 - Added a new test case to verify that inconsistent cluster
   membership after statestore failover will not result in query
   cancellation.
 - Fixed closing client issue for Catalogd HA test case
   test_catalogd_failover_with_sync_ddl when the test fails.
 - Passed core test.

Change-Id: I720bec5199df46475b954558abb0637ca7e6298b
---
M be/src/scheduling/cluster-membership-mgr.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M tests/custom_cluster/test_catalogd_ha.py
M tests/custom_cluster/test_statestored_ha.py
5 files changed, 111 insertions(+), 30 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/21520/1
--
To view, visit http://gerrit.cloudera.org:8080/21520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I720bec5199df46475b954558abb0637ca7e6298b
Gerrit-Change-Number: 21520
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21516 )

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16344/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 14 Jun 2024 00:30:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21516

to look at the new patch set (#4).

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..

IMPALA-13077: Fix selectivity estimation for SEMI JOIN

JoinNode.getSemiJoinCardinality() will skip an equality expression if
either NDV or Cardinality of equality expression is unknown (-1). If NDV
is unknown, but Cardinality is known, getSemiJoinCardinality() should
assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV
is being used as a divisor in the selectivity formula to prevent
underestimation.

Testing:
- Add test case where LEFT SEMI JOIN from subquery can reduce
  cardinality estimate of leftmost ScanNode in the query plan.
- Add new pattern at CARDINALITY_FILTER to ignore reduction by runtime
  filter.
- Pass core tests.

Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
4 files changed, 58 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/4
--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21516 )

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16343/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:50:30 +
Gerrit-HasComments: No


[native-toolchain-CR] PROTOTYPE: Build shared libs for googletest

2024-06-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has abandoned this change. ( 
http://gerrit.cloudera.org:8080/21177 )

Change subject: PROTOTYPE: Build shared libs for googletest
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/21177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I817a7721f12053540e06669402d97a69588ac0e9
Gerrit-Change-Number: 21177
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 14:

(4 comments)

Will take another look at this, but here is my preliminary comment.

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/12//COMMIT_MSG@17
PS12, Line 17: r lagged behind. This patch
 : address this issue by always loading the verifying the 
partitions from
 : HMS without loading the file metadata from HMS regardless
> Will change this.
Done


http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21437/14//COMMIT_MSG@18
PS14, Line 18: always loading the verifying the partitions
nit: "always verify the partitions" ?


http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1265
PS14, Line 1265: public void load(boolean reuseMetadata, IMetaStoreClient 
client,
   :   org.apache.hadoop.hive.metastore.api.Table msTbl, 
boolean loadPartitionFileMetadata,
   :   boolean loadTableSchema, boolean 
refreshUpdatedPartitions,
   :   @Nullable Set partitionsToUpdate, @Nullable 
String debugAction,
   :   @Nullable Map partitionToEventId, String 
reason,
   :   EventSequence catalogTimeline, boolean 
isPreLoadForInsert)
This method has 12 parameter now. That is quite long.
Is it possible to wrap them into one Class? And maybe fluent style Builder for 
that class.

LoadParam param = LoadParamBuilder.reuseMetadata(true).metastoreClient(client). 
... .build();

See example at ResourceProfile.java and ResourceProfileBuilder.java.


http://gerrit.cloudera.org:8080/#/c/21437/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1270
PS14, Line 1270: isPreLoadForInsert
Add documentation for isPreLoadForInsert parameter.



--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:42:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21516 )

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..


Patch Set 3:

Patch set 3 move the new test case to PlannerTest.testImplicitJoins() since 
that test validate cardinality.


--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:26:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21516

to look at the new patch set (#3).

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..

IMPALA-13077: Fix selectivity estimation for SEMI JOIN

JoinNode.getSemiJoinCardinality() will skip an equality expression if
either NDV or Cardinality of equality expression is unknown (-1). If NDV
is unknown, but Cardinality is known, getSemiJoinCardinality() should
assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV
is being used as a divisor in the selectivity formula to prevent
underestimation.

Testing:
- Add test case where LEFT SEMI JOIN from subquery can reduce
  cardinality estimate of leftmost ScanNode in the query plan.
- Add new pattern at CARDINALITY_FILTER to ignore reduction by runtime
  filter.
- Pass core tests.

Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
4 files changed, 57 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/3
--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21516 )

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16342/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:16:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21437 )

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16341/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:11:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] [tools] Add .gitignore for new files

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21503 )

Change subject: [tools] Add .gitignore for new files
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10711/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Gerrit-Change-Number: 21503
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:05:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] [tools] Add .gitignore for new files

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21503 )

Change subject: [tools] Add .gitignore for new files
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Gerrit-Change-Number: 21503
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:05:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] [tools] Add .gitignore for new files

2024-06-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21503 )

Change subject: [tools] Add .gitignore for new files
..


Patch Set 1: Code-Review+2

Simple git housekeeping, so adding +2.


--
To view, visit http://gerrit.cloudera.org:8080/21503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Gerrit-Change-Number: 21503
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:04:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 44:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16340/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 44
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 23:03:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21516

to look at the new patch set (#2).

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..

IMPALA-13077: Fix selectivity estimation for SEMI JOIN

JoinNode.getSemiJoinCardinality() will skip an equality expression if
either NDV or Cardinality of equality expression is unknown (-1). If NDV
is unknown, but Cardinality is known, getSemiJoinCardinality() should
assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV
is being used as a divisor in the selectivity formula to prevent
underestimation.

Testing:
- Add test case where LEFT SEMI JOIN from subquery can reduce
  cardinality estimate of leftmost ScanNode in the query plan.
- Pass core tests.

Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
3 files changed, 55 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/2
--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] [tools] Add .gitignore for new files

2024-06-13 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21503 )

Change subject: [tools] Add .gitignore for new files
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Gerrit-Change-Number: 21503
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 13 Jun 2024 22:52:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12277: Fix NullPointerException for partitioned inserts when EP is turned off

2024-06-13 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21437

to look at the new patch set (#14).

Change subject: IMPALA-12277: Fix NullPointerException for partitioned inserts 
when EP is turned off
..

IMPALA-12277: Fix NullPointerException for partitioned inserts when
EP is turned off

When event processor is turned off, inserting values into partitioned
table can lead to NullPointerException if the partition is deleted
outside impala (eg: HMS). Since event processor is turned off, impala
is unaware of the metadata changes to the table.

Currently in impala, we always load the partitions from cached table.
This can lead to data inconsistency issue especially in the case
of event processor being turned off or lagged behind. This patch
address this issue by always loading the verifying the partitions from
HMS without loading the file metadata from HMS regardless of state of
event processor. This approach will ensure that partition metadata is
always consistent with metastore.

The issue can be seen with the following steps:
- Turn off the event processor
- create a partitioned table and add a partition from impala
- drop the same partition from hive
- from impala, insert values into the partition (expectation is that
if the partition didn't exist, it will create a new one).

Testing:
- Verified manually that NullPointerException is avoided with this patch
- Added end-to-end tests to verify the above scenario for external and
manged tables.

Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
3 files changed, 241 insertions(+), 159 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/21437/14
--
To view, visit http://gerrit.cloudera.org:8080/21437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ide8f1f6bf017e9a040b53bb5d5291ff2ea3e0d18
Gerrit-Change-Number: 21437
Gerrit-PatchSet: 14
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-06-13 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#44).

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..

IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

The idea is that when any DDL/DML operation is performed by Impala, it
also syncs the db/table to its latest event ID as per HMS. This way
updates to a db/table's are applied in the same order as they appear in
the Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we do the DDL/DML operation first, i.e., perform HMS
operation and then sync the db/table in the catalogD's cache to the
latest event in HMS for the corresponding db/table. By leveraging
HIVE-27499 we are directly fetching the events only for the respective
db/table and process them.

Set 'enable_sync_to_latest_event_on_ddls'to true to enable this
feature.

Performance impact: DDL/DML might need more time to execute due to
fetching and applying other events for corresponding metadata object.

Note: We don't modify the cache using MetastoreEventsProcessor for
alter table rename operation as this is a complex operation regarding
cache modification (IMPALA-12553 has more details about this). We also
don't modify the cache this way for the truncate table operation,
unless the table is replicated or an Iceberg table. The same applies to
insert operation if the table is in Iceberg format. We don't modify
cache using above process for 'refresh table'/'invalidate metadata
table' commands.

Testing:
1) Added few tests in the MetaStoreEventProcessorForTest to verify this
feature that simulates the metadata sync between HMS and Impala.
2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to
the metadata sync between HMS end point, Catalog Metastore Server and
Impala. The HMS end point serves as common interface to metadata
changes outside the current Impala service such as Hive, Spark or other
Impala service. Also verified the table lastSyncEventId is updated
after the events are sync and confirmed that metastore event processor
ignored these synced events.
3) Added some end-to-end tests in test_sync_to_latest_hms_events.py

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java
M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
A tests/custom_cluster/test_sync_to_latest_hms_events.py
A tests/metadata/test_common_ddl.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
M tests/metadata/test_event_processing.py
M tests/metadata/test_recover_partitions.py
20 files changed, 1,302 insertions(+), 536 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/44
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 44
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[native-toolchain-CR] IMPALA-13158: Skip building Thrift for rust/dotnet

2024-06-13 Thread Michael Smith (Code Review)
Hello Laszlo Gaal, Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21515

to look at the new patch set (#2).

Change subject: IMPALA-13158: Skip building Thrift for rust/dotnet
..

IMPALA-13158: Skip building Thrift for rust/dotnet

Skips building Thrift on rust (with-rs) and .NET core (dotnet in Thrift
11, netstd in Thrift 16). Avoids errors if rust compiler is old or
dotnet missing an SDK.

Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d
---
M source/thrift/build.sh
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/15/21515/2
--
To view, visit http://gerrit.cloudera.org:8080/21515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d
Gerrit-Change-Number: 21515
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 


[native-toolchain-CR] IMPALA-13158: Skip building Thrift for rust/dotnet

2024-06-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21515 )

Change subject: IMPALA-13158: Skip building Thrift for rust/dotnet
..


Patch Set 1: Code-Review+2

This makes sense to me


--
To view, visit http://gerrit.cloudera.org:8080/21515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d
Gerrit-Change-Number: 21515
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 13 Jun 2024 22:30:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21516 )

Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16339/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 22:11:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..

IMPALA-12712: Invalidate metadata on table should set better
createEventId

"INVALIDATE METADATA " can be used to bring up a table in
Impala's catalog cache if the table exists in HMS. Currently,
createEventId for such tables are always set as -1 which will lead to
always removing the table. Sequence of drop table + create table +
invalidate table can lead to flaky test failures like IMPALA-12266.

Solution:
When Invalidate metadata  is fired, fetch the latest eventId
from HMS and set it as createEventId for the table, so that drop table
event that happend before invalidate query will be ignored without
removing the table from cache.

Note: Also removed an unnecessary RPC call to HMS to get table object
since we alrady have required info in table metadata rpc call.

Testing:
- Added an end-to-end test to verify that drop table event happened
before time shouldn't remove the metadata object from cache.

Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Reviewed-on: http://gerrit.cloudera.org:8080/21402
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
4 files changed, 34 insertions(+), 11 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 22:08:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13077: Fix selectivity estimation for SEMI JOIN

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21516


Change subject: IMPALA-13077: Fix selectivity estimation for SEMI JOIN
..

IMPALA-13077: Fix selectivity estimation for SEMI JOIN

JoinNode.getSemiJoinCardinality() will skip an equality expression if
either NDV or Cardinality of equality expression is unknown (-1). If NDV
is unknown, but Cardinality is known, getSemiJoinCardinality() should
assume NDV = Cardinality. On the other hand, assume NDV = 1 if that NDV
is being used as a divisor in the selectivity formula to prevent
underestimation.

Testing:
- Add test case where LEFT SEMI JOIN from subquery can reduce
  cardinality estimate of leftmost ScanNode in the query plan.
- Pass core tests.

Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
---
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
3 files changed, 59 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/21516/1
--
To view, visit http://gerrit.cloudera.org:8080/21516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Gerrit-Change-Number: 21516
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[native-toolchain-CR] IMPALA-13158: Skip building Thrift for rust/dotnet

2024-06-13 Thread Michael Smith (Code Review)
Michael Smith has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21515


Change subject: IMPALA-13158: Skip building Thrift for rust/dotnet
..

IMPALA-13158: Skip building Thrift for rust/dotnet

Skips building Thrift on rust (with-rs) and .NET core (dotnet in Thrift
11, netstd in Thrift 17). Avoids errors if rust compiler is old or
dotnet missing an SDK.

Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d
---
M source/thrift/build.sh
1 file changed, 3 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/15/21515/1
--
To view, visit http://gerrit.cloudera.org:8080/21515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iac3b3ca9dffc45853dc22d5c34089b9b9f9e242d
Gerrit-Change-Number: 21515
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 


[native-toolchain-CR] WIP: Add OpenTelemetry C++

2024-06-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21514 )

Change subject: WIP: Add OpenTelemetry C++
..


Patch Set 1:

Started on this for my own experiments. Builds on Ubuntu 22, still need to do a 
test build on other platforms and try it out in Impala.


--
To view, visit http://gerrit.cloudera.org:8080/21514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1657e2d344e90654b5d009d65973d762c0081a05
Gerrit-Change-Number: 21514
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 13 Jun 2024 21:42:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Abhishek Rawat, David Rorke, Michael Smith, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21504

to look at the new patch set (#3).

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..

IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

TOP-N cost will turn into NaN if inputCardinality is equal to 0 due to
Math.log(inputCardinality). This patch fix the issue by avoiding
Math.log(0) and replace it with 0 instead.

After this patch, Instantiating BaseProcessingCost with NaN, infinite,
or negative totalCost will throw IllegalArgumentException. In
BaseProcessingCost.getDetails(), "total-cost" is renamed to "raw-cost"
to avoid confusion with "cost-total" in ProcessingCost.getDetails().

Testing:
- Add testcase that run TOP-N query over empty table.
- Compute ProcessingCost in most FE and EE test even when
  COMPUTE_PROCESSING_COST option is not enabled by checking if
  RuntimeEnv.INSTANCE.isTestEnv() is True or TEST_REPLAN option is
  enabled.
- Pass core test.

Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScaledProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test
30 files changed, 218 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21504/3
--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[native-toolchain-CR] WIP: Add OpenTelemetry C++

2024-06-13 Thread Michael Smith (Code Review)
Michael Smith has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21514


Change subject: WIP: Add OpenTelemetry C++
..

WIP: Add OpenTelemetry C++

Change-Id: I1657e2d344e90654b5d009d65973d762c0081a05
---
M buildall.sh
A source/opentelemetry-cpp/build.sh
2 files changed, 38 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/14/21514/1
--
To view, visit http://gerrit.cloudera.org:8080/21514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1657e2d344e90654b5d009d65973d762c0081a05
Gerrit-Change-Number: 21514
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 


[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21455 )

Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16336/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Gerrit-Change-Number: 21455
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 19:12:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 3:

(6 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@95
PS2, Line 95:   // Collects the processed data files' file paths and fills 
'intermediate_delete_rows_'
> The function comment is not valid now. Would be nice to mention that this p
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@124
PS2, Line 124:   // Keys are data file paths processed by the join operator. 
Values are the file
> Can you add a comment what the keys and values are for in this map?
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@146
PS2, Line 146:
 :   /// Inserts the conte
> nit: seems a weird place to break the line
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@184
PS2, Line 184: auto& delete_vector =
> To increase readability, do you think it'd make sense to introduce a variab
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@259
PS2, Line 259: rrent delet
> nit: I'd break the line before the first param, and then each param could f
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@325
PS2, Line 325:   state->LogError(
> KrpcDataStreamSender already filters out NULL file_paths. Is there a need t
We can see NULLs when NUM_NODES=1



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:31:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21452

to look at the new patch set (#3).

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in 
IcebergDeleteBuilder
..

IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

With this patch IcebergDeleteBuilder checks how many probe threads
are actually blocked on the builder. Let's assume the following plan:

 UNION ALL
/ \
   /   \
  / \
 SCAN allANTI JOIN
 datafiles  / \
 without   /   \
 deletes  SCAN SCAN
  datafilesdeletes
  with deletes

In that case UNION ALL, and the two "SCAN datafiles" operators are in
the same fragment, while the builder of the ANTI JOIN is in a different
fragment. This means that "SCAN datafiles without deletes" can run in
parallel with the builder. But once that SCAN is exhausted, the UNION
ALL will drain rows from "SCAN datafiles with deletes" via the ANTI JOIN
operator, but that operator depends on the join builder output.

This means in some cases the SCAN fragments are busy, while in other
cases the SCAN fragments are blocked. It depends on how much work
they need to do, and how much work the build-side needs to do. So to
handle all cases, we dynamically check how many probe fragments are
blocked on the builder, then spin up as many threads to parellelize
the final sort.

This also works well when we have the following plan:

ANTI JOIN
   / \
  /   \
 SCAN SCAN
 datafilesdeletes
 with deletes

The above plan is created when all data files have corresponding
deletes, or when we are running a simple count(*) query. In that
case all "SCAN datafiles" fragments are blocked on the builder,
so we can use that many threads to sort the build results.

A new field "ThreadCountInFinalBuild" was added, so we can check the
query profile about how many threads were used for the final
sorting in the builders.

Measurements:
In a table with 1 Trillion data records and 68.5 Billion delete records
it reduced "IcebergDeletePositionSortTimer" from ~1 minute to
8-10 seconds, in an environment with 40 executors and MT_DOP=12.

TODO:
 * e2e tests that check counter "ThreadCountInFinalBuild"

Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
4 files changed, 105 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/21452/3
--
To view, visit http://gerrit.cloudera.org:8080/21452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Sai Hemanth Gantasala (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21402

to look at the new patch set (#4).

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..

IMPALA-12712: Invalidate metadata on table should set better
createEventId

"INVALIDATE METADATA " can be used to bring up a table in
Impala's catalog cache if the table exists in HMS. Currently,
createEventId for such tables are always set as -1 which will lead to
always removing the table. Sequence of drop table + create table +
invalidate table can lead to flaky test failures like IMPALA-12266.

Solution:
When Invalidate metadata  is fired, fetch the latest eventId
from HMS and set it as createEventId for the table, so that drop table
event that happend before invalidate query will be ignored without
removing the table from cache.

Note: Also removed an unnecessary RPC call to HMS to get table object
since we alrady have required info in table metadata rpc call.

Testing:
- Added an end-to-end test to verify that drop table event happened
before time shouldn't remove the metadata object from cache.

Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
4 files changed, 34 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/21402/4
--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21482 )

Change subject: IMPALA-13137: Add additional client fetch metrics columns to 
the queries page
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21482/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21482/4//COMMIT_MSG@11
PS4, Line 11:   - First row fetched time
:   - Client fetch wait time
Match this with chosen label in queries.tmpl


http://gerrit.cloudera.org:8080/#/c/21482/4//COMMIT_MSG@18
PS4, Line 18: fetch all rows
"fetch all rows since the first row"


http://gerrit.cloudera.org:8080/#/c/21482/4/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/21482/4/be/src/service/impala-http-handler.cc@463
PS4, Line 463: " fetch all rows."
" fetch all rows since the first row."


http://gerrit.cloudera.org:8080/#/c/21482/4/be/src/service/impala-http-handler.cc@548
PS4, Line 548: "First row fetched"
Make this a constant and share it with the same occurrence in coordinator.cc

runtime/coordinator.cc:1051:query_events_->MarkEvent("First row fetched");


http://gerrit.cloudera.org:8080/#/c/21482/3/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21482/3/www/queries.tmpl@84
PS3, Line 84: {{default_db}}
: {{stmt_type}}
> I agree. Done.
Done


http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl@117
PS4, Line 117: First row fetched
 :   
 :   Client fetch 
wait time
nit: Since this column is all about Client time, I'd suggest shorter label 
"FirstFetch" and "FetchDuration". Does that make sense?



--
To view, visit http://gerrit.cloudera.org:8080/21482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5
Gerrit-Change-Number: 21482
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 13 Jun 2024 20:59:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21504 )

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java
File fe/src/main/java/org/apache/impala/planner/DataSink.java:

http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java@141
PS2, Line 141: "Processing cost of DataSink " + fragment_.getId() + ":" 
+ getLabel()
nit: could improve this string construction using a format string with 
Preconditions.checkState: 
https://guava.dev/releases/19.0/api/docs/com/google/common/base/Preconditions.html#checkState(boolean,%20java.lang.String,%20java.lang.Object...)


http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@342
PS2, Line 342: long outputCardinality = Math.max(0, getCardinality());
Is there some reason we need cardinality_ to default to -1 rather than 0?



--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 18:35:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21455 )

Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py
File tests/query_test/test_join_queries.py:

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py@218
PS7, Line 218: parquet/snap/bloc
> Is there a reason for using this instead of default parquet/none?
With parquet_none, the catalog_sales table is not partitioned, and the minimum 
MEM_LIMIT to pass admission control is 183mb. That is high enough for this test 
to pass with and without the improvement code.

Meanwhile, in tpcds_partitioned_snap_block, both store_sales and catalog_sales 
are partitioned, allowing query to pass admission control with lower 149mb 
MEM_LIMIT. However, without the improvement code, it will fail execution in 
backend due to memory limit exceeded.
https://issues.apache.org/jira/browse/IMPALA-13075?focusedCommentId=17848720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17848720

I will mention this in test comment.



--
To view, visit http://gerrit.cloudera.org:8080/21455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Gerrit-Change-Number: 21455
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 18:16:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-06-13 Thread Andrew Sherman (Code Review)
Andrew Sherman has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/21433 )

Change subject: IMPALA-13076 Add pstack and jstack to Impala Redhat docker 
images
..

IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

When the Impala docker images are deployed in production environments,
it can be hard to add debugging tools at runtime. Two of the most
useful diagnostic tools are jstack and pstack, which can be used to
print Java and native stack traces. Install these tools into Redhat
images which are the most commonly used in production.

To install pstack we install gdb
To install jstack we install a development jdk on top of the headless
jdk.

Extend the install_os_packages.sh script to add an argument to
--install-debug-tools to set the level of diagnostic tools to install.
The possible arguments are:
  none - install no extra tools
  basic - install pstack and jstack
  full - install more debugging tools.

In a Centos 8.5 build, the size of a impalad_coord_exec image increased
from 1.74GB to 1.85GB, as reported by ‘docker image list’.

What other tools might be added?
- Installing perf is tricky as in a container perf requires an
  installation specific to the underlying linux kernel image, which is
  hard to predict at build time.
- Installing pprof is hard as installation seems to require compiling
  from sources. Clearly there are many options and we cannot install
  everything.

TESTING

Built release and debug docker images, and used jstack and pstack in a
running container to print Impala's stacks.

Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
---
M docker/CMakeLists.txt
M docker/impala_base/Dockerfile
M docker/impala_profile_tool/Dockerfile
M docker/install_os_packages.sh
4 files changed, 44 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/21433/4
--
To view, visit http://gerrit.cloudera.org:8080/21433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Gerrit-Change-Number: 21433
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21452 )

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in 
IcebergDeleteBuilder
..


Patch Set 2:

(7 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG@33
PS2, Line 33: build fragments
> aren't these 'probe' fragments that are blocked on the builder?
Thanks for catching this, done.


http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG@37
PS2, Line 37: The
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/21452/2//COMMIT_MSG@57
PS2, Line 57: lowered
> reduced?
Done


http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.h@147
PS2, Line 147:   Status FinalizeBuildImplParallel(int num_threads);
> Since this one can return error codes too, could you add a comment in what
Done


http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.cc@314
PS2, Line 314: IcebergDeleteBuilder::DeleteRowVector 
IcebergDeleteBuilder::GetFinalDeleteRowVector(
> This and the other PR made me think about whther it makes sense to create a
That's a good idea. However, I have another Jira ticket to throw out all of 
these, or at least the final delete vectors, and use RoaringBitmap: IMPALA-13109
I'm afraid I'd have to throw out half of the work with that change. So probably 
it would be better to do the encapsulation and refactoring for the 
RoaringBitmap.


http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/iceberg-delete-builder.cc@320
PS2, Line 320:   int64_t capacity = 0;
> indentation
Done


http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/join-builder.h
File be/src/exec/join-builder.h:

http://gerrit.cloudera.org:8080/#/c/21452/2/be/src/exec/join-builder.h@197
PS2, Line 197:   int probes_waiting_on_builder_ = 0;
> The variable name is self-explanatory, but some additional comment that is
Done



--
To view, visit http://gerrit.cloudera.org:8080/21452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:59:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21504 )

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 19:55:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-06-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21433 )

Change subject: IMPALA-13076 Add pstack and jstack to Impala Redhat docker 
images
..


Patch Set 4: Code-Review+1

This is looking good to me, thanks for putting this together. I'll bump to +2 
when it goes through precommit.


--
To view, visit http://gerrit.cloudera.org:8080/21433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Gerrit-Change-Number: 21433
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Thu, 13 Jun 2024 18:55:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21504 )

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16338/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 19:44:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21504 )

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java
File fe/src/main/java/org/apache/impala/planner/DataSink.java:

http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/DataSink.java@141
PS2, Line 141: "Processing cost of DataSink %s:%s is invalid! %s", 
fragment_.getId(), getLabel(),
> nit: could improve this string construction using a format string with Prec
Done


http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21504/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@342
PS2, Line 342: long outputCardinality = Math.max(0, getCardinality());
> Is there some reason we need cardinality_ to default to -1 rather than 0?
PlanNode starts with unknown cardinality (-1) until stats are analyzed. Unknown 
is represented as -1, and some planner code handle unknown cardinality with 
best guess estimate like in some ScanNode subclasses when stats are unavailable.
https://gerrit.cloudera.org/c/21504/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#2256



--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 19:23:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16333/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:54:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16334/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 17:23:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13076 Add pstack and jstack to Impala Redhat docker images

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21433 )

Change subject: IMPALA-13076 Add pstack and jstack to Impala Redhat docker 
images
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16337/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Gerrit-Change-Number: 21433
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Thu, 13 Jun 2024 19:15:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 2) Parallelize final sorts in IcebergDeleteBuilder

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21452 )

Change subject: IMPALA-13088: (part 2) Parallelize final sorts in 
IcebergDeleteBuilder
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16335/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ca946a452d061238255e9b0e2c81a51cac68807
Gerrit-Change-Number: 21452
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 17:26:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

2024-06-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21455 )

Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py@119
PS7, Line 119:
> the create...dimension functions below could be shorter by calling this fun
Done


http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py
File tests/query_test/test_join_queries.py:

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py@218
PS7, Line 218:
> With parquet_none, the catalog_sales table is not partitioned, and the mini
Done



--
To view, visit http://gerrit.cloudera.org:8080/21455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Gerrit-Change-Number: 21455
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 18:48:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

2024-06-13 Thread Riza Suminto (Code Review)
Hello Yida Wu, Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21455

to look at the new patch set (#8).

Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
..

IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

ExprValuesCache uses BATCH_SIZE as a deciding factor to set its
capacity. It bounds the capacity such that expr_values_array_ memory
usage stays below 256KB. This patch tightens that limit to include all
memory usage from ExprValuesCache::MemUsage() instead of
expr_values_array_ only. Therefore, setting a very high BATCH_SIZE will
not push the total memory usage of ExprValuesCache beyond 256KB.

Simplify table dimension creation methods and fix few flake8 warnings in
test_dimensions.py.

Testing:
- Add test_join_queries.py::TestExprValueCache.
- Pass core tests.

Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
---
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M bin/rat_exclude_files.txt
A testdata/workloads/tpcds_partitioned/queries
M tests/common/test_dimensions.py
M tests/query_test/test_join_queries.py
6 files changed, 70 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/21455/8
--
To view, visit http://gerrit.cloudera.org:8080/21455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Gerrit-Change-Number: 21455
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG@19
PS3, Line 19: event happend
nit: "event that happened"



--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:38:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21402/3//COMMIT_MSG@19
PS3, Line 19: event that hap
> nit: "event that happened"
Done



--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:58:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:59:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12712: Invalidate metadata on table should set better createEventId

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21402 )

Change subject: IMPALA-12712: Invalidate metadata on table should set better 
createEventId
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10710/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Gerrit-Change-Number: 21402
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 13 Jun 2024 17:00:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#3).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 108 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/3
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21504 )

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16332/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 13 Jun 2024 15:13:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables

2024-06-13 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
..


Patch Set 7:

(25 comments)

Let me flush another batch of comments now. I'm somewhere halfway through the 
analyzer changes.

http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/21423/6/fe/src/main/cup/sql-parser.cup@1027
PS6, Line 1027: List cases = Lists.newArrayList(merge_case);
nit: I think here we can merge these 2 rows


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java
File fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@60
PS7, Line 60: public class IcebergMergeImpl extends MergeImpl {
As I wrote in MergeStmt, I'm not sure I'm convinced that we need a separate 
abstraction for the implementation. One reason is that we don't plan to add any 
other implementations and even if we did, the coding should hppen the other way 
around: when we add the other Merge impl then as a separate commit we could do 
the extra level of abstracion then as a refactor.

The other thing I find myself uncomfortable with is that this IcebergMergeImpl 
now uses all the inner members of MergeStmt, even the MergeStmt itself is 
passed to the constructor. For me this asks for Inheritance rather than an impl 
member.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@93
PS7, Line 93: if (mergeStmt_.isOnlyMatchedCases()) {
:   sourceTableRef_.setJoinOp(JoinOperator.INNER_JOIN);
: } else {
:   sourceTableRef_.setJoinOp(JoinOperator.FULL_OUTER_JOIN);
: }
: sourceTableRef_.setOnClause(on_);
: sourceTableRef_.setLeftTblRef(targetTableRef_);
This could go into a setJoinParams() function


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@120
PS7, Line 120: icebergPositionalDeleteTable_ = new 
IcebergPositionDeleteTable(
If you have only a WHEN NOT NATCHED THEN INSERT case do you still have to 
create a pos del table?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@127
PS7, Line 127: for (MergeCase mergeCase : mergeStmt_.getCases()) {
Since all these merge cases live inside MergeStmt, shouldn't this code live in 
MergeStmt.analyze()?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@131
PS7, Line 131: analyzer.registerPrivReq(
Here we require ALL privs on the target table. Shouldn't we also expect SELECT 
privs on the source table?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@138
PS7, Line 138:   private void addMergeActionTuple(Analyzer analyzer) {
Function comment pls


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@140
PS7, Line 140: 
analyzer.getDescTbl().createTupleDescriptor(MERGE_ACTION_TUPLE_NAME);
What is a merge action and why is a tuple needed for it? Could you comment it 
on the class?


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@153
PS7, Line 153:   public void substituteResultExprs(ExprSubstitutionMap smap,
It stinks for me that MergeStmt also has a fn with similar name but here we 
inherit this not from there but from another interface, MergeImpl. For me this 
also indicates that this class should be a child of MergeStmt


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@186
PS7, Line 186:   public PlanNode getPlanNode(PlannerContext ctx, PlanNode 
parent,
Creating new PlanNodes should be the responsibility of the Planner, in my 
opinion. An Analysis class in the Analysis package shouldn't know how a 
corresponding PlanNode is created.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java@204
PS7, Line 204:   public DataSink createDataSink() {
I think this should be an override. Also true to some other functions in this 
class.


http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java
File fe/src/main/java/org/apache/impala/analysis/MergeImpl.java:

http://gerrit.cloudera.org:8080/#/c/21423/7/fe/src/main/java/org/apache/impala/analysis/MergeImpl.java@20
PS7, Line 20: public abstract QueryStmt prepareQuery();
Since this serves as an interface for the Merge implementations 

[Impala-ASF-CR] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

2024-06-13 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Abhishek Rawat, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21504

to look at the new patch set (#2).

Change subject: IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost
..

IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

TOP-N cost will turn into NaN if inputCardinality is equal to 0 due to
Math.log(inputCardinality). This patch fix the issue by avoiding
Math.log(0) and replace it with 0 instead.

After this patch, Instantiating BaseProcessingCost with NaN, infinite,
or negative totalCost will throw IllegalArgumentException. In
BaseProcessingCost.getDetails(), "total-cost" is renamed to "raw-cost"
to avoid confusion with "cost-total" in ProcessingCost.getDetails().

Testing:
- Add testcase that run TOP-N query over empty table.
- Compute ProcessingCost in most FE and EE test even when
  COMPUTE_PROCESSING_COST option is not enabled by checking if
  RuntimeEnv.INSTANCE.isTestEnv() is True or TEST_REPLAN option is
  enabled.
- Pass core test.

Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
---
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test
25 files changed, 207 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21504/2
--
To view, visit http://gerrit.cloudera.org:8080/21504
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Gerrit-Change-Number: 21504
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21502 )

Change subject: IMPALA-13138: Never smallify existing StringValue objects, only 
new ones during DeepCopy
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16331/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21502
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99
Gerrit-Change-Number: 21502
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 13:26:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21502 )

Change subject: IMPALA-13138: Never smallify existing StringValue objects, only 
new ones during DeepCopy
..


Patch Set 4:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc
File be/src/runtime/string-value-test.cc:

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81
PS3, Line 81: , cons
> Could take const ref.
Done


http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81
PS3, Line 81: Small
> Nit: "...Smaller..." would be better.
Done


http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h@73
PS3, Line 73: DCHECK_LE(source.Len(), SmallableString::SMALL_LIMIT);
> Is there a reason why we DCHECK it instead of simply returning a non-smalli
To ensure that the user of this function can do anything with the returned 
object. We could only ensure this for non-small string values if we deep copied 
them, and that would require a mem pool parameter as well.


http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc@112
PS3, Line 112: ,
> Nit: comma instead of period.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21502
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99
Gerrit-Change-Number: 21502
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 13:14:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21501 )

Change subject: IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16330/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067
Gerrit-Change-Number: 21501
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 13 Jun 2024 13:16:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21502 )

Change subject: IMPALA-13138: Never smallify existing StringValue objects, only 
new ones during DeepCopy
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10709/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21502
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99
Gerrit-Change-Number: 21502
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 13:14:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()

2024-06-13 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/21501 )

Change subject: IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
..

IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()

In StringVal::CopyFrom(), we take the 'len' parameter as a size_t, which
is usually a 64-bit unsigned integer. We pass it to the constructor of
StringVal, which takes it as an int, which is usually a 32-bit signed
integer. The constructor then allocates memory for the length using the
int value, but afterwards in CopyFrom(), we copy the buffer with the
size_t length. If size_t is indeed 64 bits and int is 32 bits, and the
value is truncated, we may copy more bytes that what we have allocated
for the destination.

Note that in the constructor of StringVal it is checked whether the
length is greater than 1GB, but if the value is truncated because of the
type conversion, the check doesn't necessarily catch it as the truncated
value may be small.

This change fixes the problem by doing the length check with 64 bit
integers in StringVal::CopyFrom().

Testing:
 - added unit tests for StringVal::CopyFrom() in udf-test.cc.

Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067
---
M be/src/udf/udf-test.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
3 files changed, 90 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/21501/2
--
To view, visit http://gerrit.cloudera.org:8080/21501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067
Gerrit-Change-Number: 21501
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/16329/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/21423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Gerrit-Change-Number: 21423
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 12:40:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

2024-06-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21455 )

Change subject: IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB
..


Patch Set 7: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/common/test_dimensions.py@119
PS7, Line 119: create_table_format_dimension
the create...dimension functions below could be shorter by calling this 
functions


http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py
File tests/query_test/test_join_queries.py:

http://gerrit.cloudera.org:8080/#/c/21455/7/tests/query_test/test_join_queries.py@218
PS7, Line 218: parquet/snap/bloc
Is there a reason for using this instead of default parquet/none?



--
To view, visit http://gerrit.cloudera.org:8080/21455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Gerrit-Change-Number: 21455
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 12:31:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables

2024-06-13 Thread Peter Rozsa (Code Review)
Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
..


Patch Set 7:

(63 comments)

Thank you Zoltan and Gabor! I covered most of the comments, some of them are 
still open. If we could resolve those, I'll split up this change into multiple 
pieces to make reviewing easier.

http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-12732: Add support for MERGE statements for Iceberg tables
> I know MERGE is a known SQL functionality but I'd find it useful to add a f
Done


http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@11
PS6, Line 11: ource table. This change adds MERGE
> Would you mind adding some examples?
Done


http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@24
PS6, Line 24:
> nit: maybe 'phase' is a better word here
Done


http://gerrit.cloudera.org:8080/#/c/21423/6//COMMIT_MSG@47
PS6, Line 47:
> We should also have authorization tests. Also for fine-grained authz, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h
File be/src/exec/iceberg-merge-node.h:

http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@18
PS6, Line 18: #pragma once
:
> nit: we prefer #pragma once
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@43
PS6, Line 43: /// for merge action and for the target table.
> Please add class comment
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@101
PS6, Line 101: /// added to the
> We prefer std::unique_ptr
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.h@158
PS6, Line 158: se(con
> no need for 'inline', member-functions defined inside class definitions are
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc
File be/src/exec/iceberg-merge-node.cc:

http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@165
PS6, Line 165: os
> nit: too much indent
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@178
PS6, Line 178:   auto row = iter.Ge
> I'm not sure if we want to reset last_row_ each time EvaluateCases() is inv
Removed


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@186
PS6, Line 186: last_row_ = row;
 :
> Can we also output the duplicated row with PrintTuple()?
It's a tricky situation; it's easy to print the target table's tuple, but it 
also contains the virtual columns, so for small tables, the path will be the 
dominant part of the error message; I think it's acceptable. To print the 
source tuples, we have to consider cases where the source row consists of 
multiple tuples, so first we have to distinguish the source tuples or the tuple 
that is used for the join, and then print them. There's another quite simple 
option; using PrintRow. What do you think?

Examples:
1. ERROR: Duplicate row found: one target table row matched more than one 
source row. Target row: (3 3 3 
hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b33170005_831375517_data.0.parq
 3)
2. ERROR: Duplicate row found: one target table row matched more than one 
source row. Target row: (4 4 4 
hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b33170005_831375517_data.0.parq
 4), Source row: (4 4 4) (4 4 4) (4 4 4) (50)
3. ERROR: Duplicate row found: one target table row matched more than one 
source row. Affected row: [(4 4 4 
hdfs://localhost:20500/test-warehouse/target_part/data/514806418e9554cc-654b33170005_831375517_data.0.parq
 0) (4 4 4) (4 4 4) (4 4 4) (50)]  // The source table ref here are 4 joined 
tables


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@209
PS6, Line 209: if (ReachedLimit() |
> nit: to reduce nesting, we could have
Done


http://gerrit.cloudera.org:8080/#/c/21423/6/be/src/exec/iceberg-merge-node.cc@210
PS6, Line 210:   }
 :   return Status::OK();
 : }
 :
 : void IcebergMergeNode::AddRow(
 : RowBatch* output_batch, IcebergMergeCase* merge_case, 
TupleRow* row) {
 :   int dst_row_idx = output_batch->AddRow();
 :   TupleRow* dst_row = output_batch->GetRow(dst_row_idx);
 :
 :   auto* target_tuple =
 :   
Tuple::Create(row_descriptor_.tuple_descriptors()[target_tuple_idx_]->byte_size(),
 :   output_batch->tuple_data_pool());
 :   auto* merge_action_tuple = Tuple::Create(
 :   
row_descriptor_.tuple_descriptors()[merge_action_tuple_idx_]->byte_size(),
 :   output_batch->tuple_data_pool());
 :
   

[Impala-ASF-CR] IMPALA-12732: Add support for MERGE statements for Iceberg tables

2024-06-13 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/21423 )

Change subject: IMPALA-12732: Add support for MERGE statements for Iceberg 
tables
..

IMPALA-12732: Add support for MERGE statements for Iceberg tables

MERGE statement is a DML command that allows users to perform
conditional insert, update, or delete operations on a target table based
on the results of a join with a source table. This change adds MERGE
statement parsing and an Iceberg-specific semantic analysis, planning,
and execution. The parsing grammar follows the SQL standard, it accepts
the same syntax as Hive, Spark, and Trino by supporting arbitrary number
of WHEN clauses, with conditions or without and accepting inline views
as source.

Example:
'MERGE INTO target t USING source s ON t.id = s.id
WHEN MATCHED AND t.id < 100 THEN UPDATE SET column1 = s.column1
WHEN MATCHED AND t.id > 100 THEN DELETE
WHEN MATCHED THEN UPDATE SET column1 = "value"
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.column1);'

The Iceberg-specific analysis, planning, and execution are based on a
concept that was previously used for UPDATE: The analyzer creates a
SELECT statement with all target and source columns (including
Iceberg's virtual columns) and a 'row_present' column that defines
whether the source, the target, or both rows are present in the result
set after joining the two table references by the ON clause. The join
condition should be an equi-join, as it is a FULL OUTER JOIN, and Impala
currently supports only equi-joins in this case. The joining order is
forced by a query hint, this guarantees that the target table is always
on the left side.

A new, IcebergMergeNode is added at planning level, this node does the
row-level filtering for each MATCHED/ NOT MATCHED cases. The
'row_present' column decides which case group will be evaluated; if
both sides are available, the matched cases, if only the source side
matches then the not matched cases and their filter expressions
will be evaluated over the row. If one of the cases match, then the
execution evaluates the result expressions into the output row batch,
and an auxiliary tuple will store the merge action. The merge action is
a flag for the newly added IcebergMergeSink; this sink will route each
incoming row from IcebergMergeNode to their respective destination. Each
row could go to the delete sink, insert sink, or to both sinks.

Target-side duplicate records are filtered during IcebergMergeNode's
execution, if one target table-side duplicate is detected, the whole
statement's execution is stopped and the error is reported back to the
user.

Added tests:
 - Parser tests
 - Analyzer tests
 - Unit test for WHEN NOT MATCHED INSERT column collation
 - Planner tests for partitioned/sorted cases
 - Authorization tests
 - E2E tests

Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-merge-node.cc
A be/src/exec/iceberg-merge-node.h
A be/src/exec/iceberg-merge-sink.cc
A be/src/exec/iceberg-merge-sink.h
M be/src/service/client-request-state.cc
M common/thrift/DataSinks.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/DmlStatementBase.java
A fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/KuduModifyImpl.java
A fe/src/main/java/org/apache/impala/analysis/MergeCase.java
A fe/src/main/java/org/apache/impala/analysis/MergeDelete.java
A fe/src/main/java/org/apache/impala/analysis/MergeImpl.java
A fe/src/main/java/org/apache/impala/analysis/MergeInsert.java
A fe/src/main/java/org/apache/impala/analysis/MergeStmt.java
A fe/src/main/java/org/apache/impala/analysis/MergeUpdate.java
M fe/src/main/java/org/apache/impala/analysis/ModifyImpl.java
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
A fe/src/main/java/org/apache/impala/planner/IcebergMergeNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergMergeSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeModifyStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/MergeInsertTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M 

[Impala-ASF-CR] IMPALA-13138: Never smallify existing StringValue objects, only new ones during DeepCopy

2024-06-13 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21502 )

Change subject: IMPALA-13138: Never smallify existing StringValue objects, only 
new ones during DeepCopy
..


Patch Set 3:

(4 comments)

Thanks Zoltán!

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc
File be/src/runtime/string-value-test.cc:

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81
PS3, Line 81: Small
Nit: "...Smaller..." would be better.


http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value-test.cc@81
PS3, Line 81: string
Could take const ref.


http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/string-value.h@73
PS3, Line 73: DCHECK_LE(source.Len(), SmallableString::SMALL_LIMIT);
Is there a reason why we DCHECK it instead of simply returning a non-smallified 
string if the length is too large?


http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/21502/3/be/src/runtime/tuple.cc@112
PS3, Line 112: .
Nit: comma instead of period.



--
To view, visit http://gerrit.cloudera.org:8080/21502
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99
Gerrit-Change-Number: 21502
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 11:21:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()

2024-06-13 Thread Peter Rozsa (Code Review)
Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21501 )

Change subject: IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21501/1/be/src/udf/udf-test.cc
File be/src/udf/udf-test.cc:

http://gerrit.cloudera.org:8080/#/c/21501/1/be/src/udf/udf-test.cc@340
PS1, Line 340: dummpy
typo: dummy


http://gerrit.cloudera.org:8080/#/c/21501/1/be/src/udf/udf-test.cc@344
PS1, Line 344:   }
nit: could be one line



--
To view, visit http://gerrit.cloudera.org:8080/21501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067
Gerrit-Change-Number: 21501
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 13 Jun 2024 11:04:27 +
Gerrit-HasComments: Yes