[Impala-ASF-CR] IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause

2024-01-29 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20968 )

Change subject: IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20968/1/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/20968/1/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@198
PS1, Line 198: *eos = true;
I think if we exit the loop because of row_batch->AtCapacity(), *eos should not 
be set to true, because we haven't reached the limit yet and there are more 
rows to scan.



--
To view, visit http://gerrit.cloudera.org:8080/20968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea73716fe475c8b063235d2ae971a4074b8e2a20
Gerrit-Change-Number: 20968
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Jan 2024 07:53:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml
File docs/topics/impala_identifiers.xml:

http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml@54
PS2, Line 54: The maximum length of an identifier is currently 128 
characters except for column names which
> It's 767 characters for column names, the same is applied in hive.
Ack. Double checked in Hive codes:
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/resources/package.jdo#L24-L25
It's added in HIVE-1364.


http://gerrit.cloudera.org:8080/#/c/20950/4/docs/topics/impala_identifiers.xml
File docs/topics/impala_identifiers.xml:

http://gerrit.cloudera.org:8080/#/c/20950/4/docs/topics/impala_identifiers.xml@62
PS4, Line 62: can contain unicode characters
nit: This is talking about the first charactor. Let's use "can start with any 
unicode character". BTW, is it truth that column names can start with 
whitespaces?



--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 07:47:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12744: Support for regr count() aggregate/analytic function

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20936 )

Change subject: IMPALA-12744: Support for regr_count() aggregate/analytic 
function
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15098/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I773d1e0edc8a9c8ee003f75721f4844685b2eb38
Gerrit-Change-Number: 20936
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Jan 2024 07:24:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12744: Support for regr count() aggregate/analytic function

2024-01-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20936 )

Change subject: IMPALA-12744: Support for regr_count() aggregate/analytic 
function
..


Patch Set 2:

(10 comments)

> Uploaded patch set 2.

http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG@7
PS1, Line 7: aggregate
> If it can be used as both an aggregate and an analytical function, having "
Done


http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG@9
PS1, Line 9: regr_count
> Nit: "The regr_count() functin..."
Done


http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG@9
PS1, Line 9: as
> Nit: "as an"
Done


http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG@9
PS1, Line 9: and
> Nit: "and an"
Done


http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG@10
PS1, Line 10: function
> Nit: function (singular).
Done


http://gerrit.cloudera.org:8080/#/c/20936/1//COMMIT_MSG@10
PS1, Line 10: s c
> Nit: is.
Done


http://gerrit.cloudera.org:8080/#/c/20936/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20936/1/be/src/exprs/aggregate-functions-ir.cc@315
PS1, Line 315: CHECK(!dst->is_null);
 :   if (!src1.is_null && !src2.is_null) {
 : --dst->val;
> Is this different here than in the non-timestamp version, i.e. on L300?
Removed


http://gerrit.cloudera.org:8080/#/c/20936/1/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

http://gerrit.cloudera.org:8080/#/c/20936/1/be/src/exprs/aggregate-functions.h@74
PS1, Line 74:
> We don't use 'ctx' and in the above two functions this variable was unnamed
Done


http://gerrit.cloudera.org:8080/#/c/20936/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/20936/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2713
PS1, Line 2713: 1 in
> This doesn't return null, it returns 1.
Ack


http://gerrit.cloudera.org:8080/#/c/20936/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2802
PS1, Line 2802: rand()
> Is there a specific reason for using rand() and not a simple constant?
No, not really, it was brought up in corr() patch I think and since then it's 
kept in the same way to maintain uniformity.



--
To view, visit http://gerrit.cloudera.org:8080/20936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I773d1e0edc8a9c8ee003f75721f4844685b2eb38
Gerrit-Change-Number: 20936
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 30 Jan 2024 07:01:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12744: Support for regr count() aggregate/analytic function

2024-01-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20936 )

Change subject: IMPALA-12744: Support for regr_count() aggregate/analytic 
function
..

IMPALA-12744: Support for regr_count() aggregate/analytic function

regr_count() function can be used both as aggregate and
analytic function and is commonly used in regression analysis.

regr_count(y, x) returns an integer that is the number of non-null
number pairs. It indicates how many observations are included in the
analysis.

Testing:
The functions are extensively tested and cross-checked with Hive.
The tests can be found in aggregation.test.

Change-Id: I773d1e0edc8a9c8ee003f75721f4844685b2eb38
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 303 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/20936/2
--
To view, visit http://gerrit.cloudera.org:8080/20936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I773d1e0edc8a9c8ee003f75721f4844685b2eb38
Gerrit-Change-Number: 20936
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..


Patch Set 4: Verified+1

Build Successful

https://jenkins.impala.io/job/gerrit-docs-auto-test/747/ : Doc tests passed.


--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 06:45:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..


Patch Set 3: Verified-1

Build Failed

https://jenkins.impala.io/job/gerrit-docs-auto-test/746/ : Doc tests failed. 
See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 06:39:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..


Patch Set 4:

(4 comments)

> Uploaded patch set 4.

http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml
File docs/topics/impala_identifiers.xml:

http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml@54
PS2, Line 54: The maximum length of an identifier is currently 128 
characters except for column names which
> Could you double check this? What's the longest length we can used when usi
It's 767 characters for column names, the same is applied in hive.


http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml@60
PS2, Line 60: 
> Add "except for column names which can start with unicode characters"
Done


http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml@67
PS2, Line 67:   
> Add "except for column names which can contain unicode characters"
Done


http://gerrit.cloudera.org:8080/#/c/20950/2/docs/topics/impala_identifiers.xml@124
PS2, Line 124:
> I think we'd better merge this into the previous items.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 06:39:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..


Patch Set 4:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/747/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 06:39:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..

IMPALA-12756: [DOCS] Unicode column name support documentation

The patch focuses on documenting that Impala supports unicode
column names, consistent with Hive's current support (as we use
Hive MetaStore to store table metadata).

Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
---
M docs/topics/impala_identifiers.xml
1 file changed, 6 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20950/4
--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..

IMPALA-12756: [DOCS] Unicode column name support documentation

The patch focuses on documenting that Impala supports unicode
column names, consistent with Hive's current support (as we use
Hive MetaStore to store table metadata).

Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
---
M docs/topics/impala_identifiers.xml
1 file changed, 6 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20950/3
--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12756: [DOCS] Unicode column name support documentation

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20950 )

Change subject: IMPALA-12756: [DOCS] Unicode column name support documentation
..


Patch Set 3:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/746/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/20950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Gerrit-Change-Number: 20950
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 06:34:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20972 )

Change subject: IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976
..

IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

This patch upgrade Guava version from 31.1-jre to 32.0.1-jre to address
CVE-2023-2976.

Testing:
- Run and pass full build
  ./buildall.sh -skiptests -notests

Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Reviewed-on: http://gerrit.cloudera.org:8080/20972
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Gerrit-Change-Number: 20972
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20972 )

Change subject: IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Gerrit-Change-Number: 20972
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 30 Jan 2024 05:31:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12578: Pass owner user of database and table to Ranger in GRANT/REVOKE

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20916 )

Change subject: IMPALA-12578: Pass owner user of database and table to Ranger 
in GRANT/REVOKE
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibac5335c65a860963ef0ccd890a926af80585ef3
Gerrit-Change-Number: 20916
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 04:43:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20972 )

Change subject: IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Gerrit-Change-Number: 20972
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 30 Jan 2024 03:47:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-29 Thread Zihao Ye (Code Review)
Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20965 )

Change subject: IMPALA-12762: Fix cmake error in package building
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Jan 2024 01:49:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12426: Query History Table

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15097/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 14
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 30 Jan 2024 01:05:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table

2024-01-29 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20915 )

Change subject: IMPALA-12503: Support date data type for predicates for 
external data source table
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20915/10/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20915/10/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@219
PS10, Line 219: dateCol+ 1
Need to set the time zone for the formatter as 'UDC' to avoid +1:
  formatter.setTimeZone(TimeZone.getTimeZone("UDC"));



--
To view, visit http://gerrit.cloudera.org:8080/20915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Gerrit-Change-Number: 20915
Gerrit-PatchSet: 10
Gerrit-Owner: gaurav singh 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Tue, 30 Jan 2024 00:51:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12426: Query History Table

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20770/14/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/14/tests/custom_cluster/test_query_log.py@31
PS14, Line 31: from tests.util.assert_time import assert_time_str
flake8: F401 'tests.util.assert_time.assert_time_str' imported but unused



--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 14
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 30 Jan 2024 00:39:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12426: Query History Table

2024-01-29 Thread Jason Fehr (Code Review)
Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
..


Patch Set 14:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h@1261
PS13, Line 1261:   /// The QueryDriverMap maps query ids to QueryDrivers. The 
QueryDrivers are owned by the
   :   /// ImpalaServer and QueryDriverMap references them using 
shared_ptr to allow
   :   /// asynchronous deletion.
   :   ///
> Mention in comment if they are synchronize using completed_queries_lock_.
Done


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/impala-server.h@1653
PS13, Line 1653: };
> nit: unnecessary added whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h
File be/src/service/query-state-record.h:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@183
PS13, Line 183: class EventsTimelineIterator {
> Missing code comments outlining the purpose of this class, and what each fu
Holding on this as we are moving towards defining individual fields for 
specific events in the timeline and only storing those specific events.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@192
PS13, Line 192:   EventsTimelineIterator(const std::vector* labels,
> These would make more sense as const&. Passing nullptr is invalid, and we'r
Holding on this as we are moving towards defining individual fields for 
specific events in the timeline and only storing those specific events.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@206
PS13, Line 206:   iter_t operator*() const;
> This class is serving two purposes that might be better served as two separ
Holding on this as we are moving towards defining individual fields for 
specific events in the timeline and only storing those specific events.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@212
PS13, Line 212:   EventsTimelineIterator cbegin();
> If you called these begin/end, I think you could use this iterator as part
Putting this on hold for now as we are moving towards a defined list of events 
store in individual fields instead of storing the entire events timeline.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@287
PS13, Line 287: }; // struct QueryStateExpanded
> nit: it looks strange to prefix things with C just to say they're const.
Holding on this as we are moving towards defining individual fields for 
specific events in the timeline and only storing those specific events.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc
File be/src/service/query-state-record.cc:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc@182
PS13, Line 182:
->GetInfoString(AdmissionController::PROFILE_INFO_KEY_EXECUTOR_GROUP);
> This and compute_scan_range_assignment are only initialized if summary_prof
Both these fields are being dropped from this table.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc@252
PS13, Line 252:
> nit: you don't need to use cbegin/cend on const objects. See https://en.cpp
Holding on this as we are moving towards defining individual fields for 
specific events in the timeline and only storing those specific events.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc@461
PS13, Line 461:
> Can we remove these when we store it instead?
Done


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/workload-management.cc@615
PS13, Line 615:   || !FLAGS_enable_workload_mgmt){
  : return;
  :   }
  :
> This might not catch query such as:
Good catch.  This code does miss use/show sql statements that start with 
comments.  I can use the existing catalog_op_type to ignore the use/show sqls.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/CMakeLists.txt
File be/src/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/CMakeLists.txt@272
PS13, Line 272: ADD_UNIFIED_BE_LSAN_TEST(string-util-test 
"TruncateDownTest.*:TruncateUpTest.*:CommaSeparatedContainsTest.*:FindUtf8PosForwardTest.*:FindUtf8PosBackwardTest.*:RandomFindUtf8PosTest.*:ToSnakeCaseTest.*:StringStreamPopTest.*")
> typo: StringStreamPopTest
Done


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util-test.cc
File be/src/util/string-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/string-util-test.cc@319
PS13, Line 

[Impala-ASF-CR] IMPALA-12426: Query History Table

2024-01-29 Thread Jason Fehr (Code Review)
Hello Andrew Sherman, Riza Suminto, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20770

to look at the new patch set (#14).

Change subject: IMPALA-12426: Query History Table
..

IMPALA-12426: Query History Table

Adds the ability for users to specify that Impala will
create and maintain an internal Iceberg table that contains
data about all completed queries. This table is
automatically created at startup by each coordinator if it
does not exist. Then, most completed queries are queued in
memory and flushed to the query history table at a set
interval (either minutes or number of records). Set, use,
and show queries are not written to this table. This commit
leverages the InternalServer class to maintain the query
history table.

Ctest unit tests have been added to assert the various
pieces of code. New custom cluster tests have been added
to assert the query history table is properly populated
with completed queries.

Negative testing consists of attempting sql injection
attacks and syntactically incorrect queries.

Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/query-driver.cc
M be/src/runtime/query-driver.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/CMakeLists.txt
M be/src/service/client-request-state.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/internal-server-test.cc
M be/src/service/internal-server.cc
M be/src/service/internal-server.h
A be/src/service/query-state-record.cc
A be/src/service/query-state-record.h
A be/src/service/workload-management.cc
M be/src/util/CMakeLists.txt
M be/src/util/backend-gflag-util.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
A be/src/util/network-util-test.cc
M be/src/util/network-util.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/string-util-test.cc
M be/src/util/string-util.cc
M be/src/util/string-util.h
A be/src/util/ticker.h
M be/src/util/uid-util-test.cc
M be/src/util/uid-util.h
M bin/run_clang_tidy.sh
M common/function-registry/impala_functions.py
M common/thrift/BackendGflags.thrift
M common/thrift/metrics.json
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/util/CatalogBlacklistUtils.java
M tests/beeswax/impala_beeswax.py
M tests/common/custom_cluster_test_suite.py
A tests/custom_cluster/test_query_log.py
A tests/util/assert_time.py
A tests/util/memory.py
A tests/util/retry.py
47 files changed, 3,309 insertions(+), 336 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/20770/14
--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 14
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12578: Pass owner user of database and table to Ranger in GRANT/REVOKE

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20916 )

Change subject: IMPALA-12578: Pass owner user of database and table to Ranger 
in GRANT/REVOKE
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15096/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibac5335c65a860963ef0ccd890a926af80585ef3
Gerrit-Change-Number: 20916
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 00:33:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 35: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10210/


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 35
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 30 Jan 2024 00:30:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12578: Pass owner user of database and table to Ranger in GRANT/REVOKE

2024-01-29 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20916 )

Change subject: IMPALA-12578: Pass owner user of database and table to Ranger 
in GRANT/REVOKE
..


Patch Set 2:

(6 comments)

Hi all, I have addressed Csaba's and Quanlong's comments on patch set 1. Please 
let me know if there is any additional suggestion. Thank you very much for your 
help!

http://gerrit.cloudera.org:8080/#/c/20916/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20916/1//COMMIT_MSG@33
PS1, Line 33: the principal type could be
: a group or a role.
> Will this be supported in future tasks? I.e. a user belongs to the owner gr
Thanks Quanlong!

Taking a look at the definition of the class GrantRevokeRequest at 
https://github.com/apache/ranger/blob/master/agents-common/src/main/java/org/apache/ranger/plugin/util/GrantRevokeRequest.java,
 I found that there is no field that denotes the type of the owner. The only 
field related to the owner of a resource is 'ownerUser' and thus it seems 
Ranger implicitly assumes that the provided owner corresponds to a user.

With the above being said, it looks like Impala would not be able to support 
allowing a user belonging to an owner group to execute GRANT and REVOKE unless 
Ranger adds an additional field in GrantRevokeRequest to distinguish an owner 
user from an owner group.

For easy reference, we provide the class definition in the following.

public class GrantRevokeRequest implements Serializable {
private static final long serialVersionUID = 1L;

private String  grantor;
private Set grantorGroups;
private Map resource;
private Set users;
private Set groups;
private Set roles;
private Set accessTypes;
private ListforwardedAddresses;
private String  remoteIPAddress;
private Boolean delegateAdmin  = Boolean.FALSE;
private Boolean enableAudit= Boolean.TRUE;
private Boolean replaceExistingPermissions = Boolean.FALSE;
private Boolean isRecursive= Boolean.FALSE;
private String  clientIPAddress;
private String  clientType;
private String  requestData;
private String  sessionId;
private String  clusterName;
private String  zoneName;
private String  ownerUser;
...
}


http://gerrit.cloudera.org:8080/#/c/20916/1/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
File fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java:

http://gerrit.cloudera.org:8080/#/c/20916/1/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java@236
PS1, Line 236: analyzeTargetDatabase(analyzer);
 : break;
 :   case URI:
 : Preconditions.checkNotNull(uri_);
 : if (privilegeLevel_ != TPrivilegeLevel.ALL) {
 :   throw new AnalysisException("Only 'ALL' privilege may 
be applied at " +
 :   "URI scope in privilege spec.");
 : }
> nit: We can move codes of the precondition check and the try-catch clause i
Done


http://gerrit.cloudera.org:8080/#/c/20916/1/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/20916/1/tests/authorization/test_ranger.py@1191
PS1, Line 1191: revoke_table_stmt, set_table_owner_group_stmt, 
resource_owner_group,
> +1. This test has ~300 lines. It'd be better to refactor it.
Thanks Csaba and Quanlong! Yeah. Indeed. I will move the code for each case to 
its respective sub-function in the next patch.


http://gerrit.cloudera.org:8080/#/c/20916/1/tests/authorization/test_ranger.py@1198
PS1, Line 1198: admin_client, revoke_column_stmt, 
set_table_owner_group_stmt,
> nit: do we really need this refresh? I thought we only need it when we add/
Thanks Quanlong!

I am actually not sure whether or not after a GRANT statement we really need a 
REFRESH AUTHORIZATION before a subsequent SHOW GRANT statement. I added the 
REFRESH statements because I found that there are several places (but not every 
place) in this test file that have the same pattern (GRANT/REVOKE, REFRESH, 
SHOW GRANT). For instance, in _test_grant_revoke(), we have the following.

  self.execute_query_expect_success(admin_client,
"grant select on database {0} to 
{1} {2}"
.format(unique_database, kw, ident),
user=ADMIN)
  self._refresh_authorization(admin_client, 

[Impala-ASF-CR] IMPALA-12578: Pass owner user of database and table to Ranger in GRANT/REVOKE

2024-01-29 Thread Fang-Yu Rao (Code Review)
Hello Quanlong Huang, Aman Sinha, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20916

to look at the new patch set (#2).

Change subject: IMPALA-12578: Pass owner user of database and table to Ranger 
in GRANT/REVOKE
..

IMPALA-12578: Pass owner user of database and table to Ranger in GRANT/REVOKE

After RANGER-1200, Ranger allows the owner user of a resource to
grant/revoke a privilege to/from a grantee/revokee, which requires the
client of the Ranger server to provide the ownership information in the
requests for granting and revoking accesses.

Before this patch, Impala did not provide its Ranger plug-in with the
owner user of resource in the GRANT and REVOKE statements and thus the
owner user of a resource was not able to grant/revoke a privilege
to/from other principals. This patch passes to the Ranger server the
owner user of resource in the GRANT and REVOKE statements when the
resource is a database, a table, or a column. This way the owner user
does not have to be explicitly granted additional privileges on the
resource to execute the GRANT and REVOKE statements for the
aforementioned resource types.

For user-defined functions, we will deal with this resource type in
IMPALA-12685 in that it depends on IMPALA-11743 where we will have to
make Impala load from Hive MetaStore the owner user of a user-defined
function.

The patch also fixes a potential bug in getOwnerUser() of Db, LocalDb,
Table, and LocalTable. Specifically, before this patch, when
determining the owner user of a database or a table, Impala returned
the owner name without verifying the corresponding principal type is
indeed a user. This was problematic because the principal type could be
a group or a role. In addition, we note that Ranger assumes implicitly
that the provided owner is a user. This could be seen from the
definition of GrantRevokeRequest. Before Ranger adds an additional
field in GrantRevokeRequest to distinguish an owner user from an owner
group, Impala will not be able to support allowing a user in an owner
group to grant or revoke privileges on the resources owned by the owner
group.

Testing:
 - Added an end-to-end test to verify that the owner user of a resource
   is able to execute the GRANT/REVOKE statements without being granted
   additional privileges if the resource is a database, a table, or a
   column.

Change-Id: Ibac5335c65a860963ef0ccd890a926af80585ef3
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationTestBase.java
M tests/authorization/test_ranger.py
10 files changed, 371 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/20916/2
--
To view, visit http://gerrit.cloudera.org:8080/20916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibac5335c65a860963ef0ccd890a926af80585ef3
Gerrit-Change-Number: 20916
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-12578: Pass owner user of database and table to Ranger in GRANT/REVOKE

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20916 )

Change subject: IMPALA-12578: Pass owner user of database and table to Ranger 
in GRANT/REVOKE
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10212/ 
DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/20916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibac5335c65a860963ef0ccd890a926af80585ef3
Gerrit-Change-Number: 20916
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Jan 2024 00:09:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Reviewed-on: http://gerrit.cloudera.org:8080/20533
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 379 insertions(+), 66 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 12
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..

IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

When using bin/dump_breakpad_symbols.py to dump symbols for RPM/DEB
packages, the script extracts the packages to a temporary directory
and relies on keeping that directory around until the processing
is finished. The parallel processing added in IMPALA-11511 breaks
the logic that keeps the temporary directory around, so the script
generates errors like:

Found debugging info in 
/tmp/tmpqfZ9MZ/usr/lib/debug/usr/lib/impala/sbin-retail/impalad.debug
Failed to open ELF file 
'/tmp/tmpqfZ9MZ/usr/lib/debug/usr/lib/impala/sbin-retail/impalad.debug': No 
such file or directory
Failed to write symbol file.

This turns off parallelism for bin/dump_breakpad_symbols.py when
processing RPM/DEB packages (i.e. -r/--pkg). This also avoids using
a ThreadPool when num_processes <= 1.

Testing:
 - Hand tested with Redhat 7 RPMs

Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Reviewed-on: http://gerrit.cloudera.org:8080/20943
Reviewed-by: Impala Public Jenkins 
Tested-by: Joe McDonnell 
---
M bin/dump_breakpad_symbols.py
1 file changed, 21 insertions(+), 11 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Joe McDonnell: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 3: Verified+1

The test failure is unrelated


--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 23:33:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has removed a vote on this change.

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10208/


--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 23:28:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20972 )

Change subject: IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10211/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Gerrit-Change-Number: 20972
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 23:17:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 23:07:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12426: Query History Table

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
..


Patch Set 13:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1256
PS12, Line 1256:   std::unique_ptr admission_heartbeat_thr
> Yes.  This boolean is used to guard against spurious wakeups of the complet
Ack


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1281
PS12, Line 1281:
   :   /// Default query options in the form of TQueryOptions and b
> Done
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1797
PS12, Line 1797:
   :
   :
> Done
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc
File be/src/service/internal-server.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@117
PS12, Line 117: internal_query_id.hi != 0 && internal_query_id.lo != 0
> Done
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@121
PS12, Line 121: session_id
> The CloseSession function does all the necessary checks on its own.
Ack


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/internal-server.cc
File be/src/service/internal-server.cc:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/internal-server.cc@117
PS13, Line 117: internal_query_id.hi != 0 && internal_query_id.lo != 0
Use the new function in uid-util.h


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc
File be/src/service/query-state-record.cc:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.cc@216
PS13, Line 216:   if (labels_iter_->find("Ready to start on ", 0) == 0) {
  : it.first = READY_TO_START;
  :   } else if(labels_iter_->find("All ", 0) == 0) {
  : it.first = ALL_EXEC_BACKENDS;
Can you make the corresponding event string in coordinator.cc into constant and 
then use that constant for find argument here?


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc
File be/src/service/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc@95
PS12, Line 95:
> I would prefer to leave this code as-is since it was directly moved from ht
Ack


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@50
PS12, Line 50:
 : DEFINE_bool(enable_workload_mgmt, false,
 : "Specifies if Impala will automatically write completed 
queries in the query log "
 : "table. If this value is set to true and then later removed, 
the query log table "
 : "will remain intact and accessible.");
 :
> I've thought about this some, and I think we should go with a bool flag for
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@63
PS12, Line 63:   return false;
 : });
 :
> Done
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@186
PS12, Line 186: /// table.
  : static std::list> fields_;
  :
> I added comments in impala-server.h for the functions implemented in worklo
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@494
PS12, Line 494:   fields_.push_back(MakeTuple("network_address", 
_clientNetworkAddress));
  :   fields_.push_back(MakeTuple("start_time_utc", 
_queryStartTime, "TIMESTAMP"));
  :   fields_.push_back(MakeTuple("total_time_us", _queryDuration, 
"BIGINT"));
  :   fields_.push_back(MakeTuple("s
> No, field order is not important except to the custom cluster python test (
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@598
PS12, Line 598:
  : void ImpalaServer::ShutdownWorkloadManagement() {
  :   unique_lock l(completed_queries_threadstate_mu_);
  :   if (completed_queries_thread_state_ == RUNNING) {
  : completed_queries_thread_state_ = SHUTDOWN_REQUESTED;
  : completed_queries_cv_.notify_all();
  : completed_queries_shutdown_cv_.wait_for(l,
  : 
std::chrono::seconds(FLAGS_query_log_shutdown_deadline_s),
  : []{ return completed_queries_thread_state_ == SHUTDOWN; 
});
  :   }
> Modifications to the completed queries table will be handled in the future.
Ack


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/util/network-util.h
File 

[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20971 )

Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 21:48:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12426: Query History Table

2024-01-29 Thread Jason Fehr (Code Review)
Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )

Change subject: IMPALA-12426: Query History Table
..


Patch Set 13:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt
File be/src/service/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/CMakeLists.txt@46
PS12, Line 46: query-state-re
> I recommend to rename this to query-state-record.cc to not confuse with be/
Good idea.  Renamed.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1256
PS12, Line 1256:   std::unique_ptr admission_heartbeat_thr
> Is this protected by completed_queries_lock_ as well?
Yes.  This boolean is used to guard against spurious wakeups of the 
completed_queries_thread_.  I added a separate comment on the 
completed_queries_ticker_ and completed_queries_ready_ variables.


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1281
PS12, Line 1281:
   :   /// Default query options in the form of TQueryOptions and b
> Mention constant/flag that represent maximum number of attempts in this doc
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1797
PS12, Line 1797:
   :
   :
> Is this also immutable like QueryStateRecord? Please clarify in comment.
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1861
PS12, Line 1861:
> You'd need to std::move the value into place in the constructor.
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.h@1861
PS12, Line 1861:
> Making this move-only is unnecessarily specific. It could take by value and
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc@3192
PS12, Line 3192:
> nit: redundant "this->" here and below?
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/impala-server.cc@3433
PS12, Line 3433:   return conn_id;
   : }
   : } // namespace impala
> nit: unnecessary whitespace.
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.h
File be/src/service/internal-server.h:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.h@96
PS12, Line 96:   ///   `persist_in_db` Optional boolean indicating if the 
query data should be
 :   ///   written to the completed queries 
table afte
> nit: Clarify in the doc that `persist_in_db` is optional and default to Tru
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc
File be/src/service/internal-server.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@117
PS12, Line 117: internal_query_id.hi != 0 && internal_query_id.lo != 0
> Move this check to uid-util.h
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/internal-server.cc@121
PS12, Line 121: session_id
> Should this went through the same check as internal_query_id before closing
The CloseSession function does all the necessary checks on its own.

The reason the query id had to be checked is because query execution could fail 
early enough that a query id was never assigned.


http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h
File be/src/service/query-state-record.h:

http://gerrit.cloudera.org:8080/#/c/20770/13/be/src/service/query-state-record.h@161
PS13, Line 161:   private:
> nit: private should be indented differently than the other lines. Previousl
Done


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc
File be/src/service/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/query-state.cc@95
PS12, Line 95:
> nit: other place use this from debug-util.h:
I would prefer to leave this code as-is since it was directly moved from 
https://github.com/jasonmfehr/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/service/impala-server.cc#L2572


http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/12/be/src/service/workload-management.cc@50
PS12, Line 50:
 : DEFINE_bool(enable_workload_mgmt, false,
 : "Specifies if Impala will automatically write completed 
queries in the query log "
 : "table. If this value is set to true and then later removed, 
the query log table "
 : "will remain intact and accessible.");
 :
> Since it only support iceberg table now, shall this turn to bool flag?
I've thought about this some, and 

[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 34: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10209/


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 34
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 20:39:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 35:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15095/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 35
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 20:24:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#35).

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..

IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

The idea is that when any DDL/DML operation is performed by Impala, it
also syncs the db/table to its latest event ID as per HMS. This way
updates to a db/table's are applied in the same order as they appear in
the Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we do the DDL/DML operation first, i.e., perform HMS
operation and then sync the db/table in the catalogD's cache to the
latest event in HMS for the corresponding db/table. Currently we fetch
all events greater than the db/table's lastSyncEventId and filter them
and if possible batch them in the events processor to sync only the
current db/table events. Once HIVE-27499 is implemented, we can
directly fetch the events only for the respective db/table and process
them. Currently, there is no efficient way to identify if there are
pending events for a db/table.

Set 'enable_sync_to_latest_event_on_ddls'to true to enable this
feature.

Performance impact: DDL/DML might need more time to execute due to
fetching and applying other events for corresponding metadata object.

Note: We don't modify the cache using MetastoreEventsProcessor for
alter table rename operation as this is a complex operation regarding
cache modification (IMPALA-12553 has more details about this). We also
don't modify the cache this way for the truncate/insert table operation
if the table is in Iceberg format. We don't modify cache using
above process for 'refresh table'/'invalidate metadata table' commands.

Testing:
1) Added few tests in the MetaStoreEventProcessorForTest to verify this
feature that simulates the metadata sync between HMS and Impala.
2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to
the metadata sync between HMS end point, Catalog Metastore Server and
Impala. The HMS end point serves as common interface to metadata
changes outside the current Impala service such as Hive, Spark or other
Impala service. Also verified the table lastSyncEventId is updated
after the events are sync and confirmed that metastore event processor
ignored these synced events.
3) Added some end-to-end tests in test_sync_to_latest_hms_events.py

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M be/src/catalog/catalog-server.cc
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_sync_to_latest_hms_events.py
A tests/metadata/__init__.py
A tests/metadata/test_common_ddl.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
M tests/metadata/test_event_processing.py
A tests/metadata/test_event_processing_base.py
M tests/metadata/test_recover_partitions.py
M tests/metadata/test_reset_metadata.py
M tests/util/event_processor_utils.py
26 files changed, 1,619 insertions(+), 755 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/35
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 35
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong 

[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 35:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10210/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 35
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 19:59:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 34:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/15094/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 34
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 19:22:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 34:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1401
PS33, Line 1401: trace
> Let's use trace() since this will be logged for each new loaded partition.
Ack


http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3357
PS33, Line 3357:
   :   }
   :   if 
(!isTableBeingReplicated(metaStoreClient.getHiveClient(), hdfsTable) &&
   :   !syncToLatestEventId) {
   : // when table is replicated we let the HMS API handle 
the file deletion logic
   : // otherwise we delete the files.
   : Collection parts = 
FeCatalogUtils
   : .loadAllPartitions(hdfsTable);
   : for (FeFsPartition part : parts) {
   :   FileSystemUtil.deleteAllVisibleFiles(new 
Path(part.getLocation()));
   : }
> It'd be better to keep these unchanged since previously they are outside th
Ack


http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py@958
PS33, Line 958: result = cls.__execute_query(impalad_client, query, 
query_options, user)
> Let's check success first, i.e. like what we does in execute_query_expect_s
Ack


http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py@1046
PS33, Line 1046:
> nit: use 'cls' for classmethod
Ack


http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py@1052
PS33, Line 1052:
> nit: use 'cls' for classmethod
Ack



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 34
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 19:02:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, k.venureddy2...@gmail.com, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#34).

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..

IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

The idea is that when any DDL/DML operation is performed by Impala, it
also syncs the db/table to its latest event ID as per HMS. This way
updates to a db/table's are applied in the same order as they appear in
the Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we do the DDL/DML operation first, i.e., perform HMS
operation and then sync the db/table in the catalogD's cache to the
latest event in HMS for the corresponding db/table. Currently we fetch
all events greater than the db/table's lastSyncEventId and filter them
and if possible batch them in the events processor to sync only the
current db/table events. Once HIVE-27499 is implemented, we can
directly fetch the events only for the respective db/table and process
them. Currently, there is no efficient way to identify if there are
pending events for a db/table.

Set 'enable_sync_to_latest_event_on_ddls'to true to enable this
feature.

Performance impact: DDL/DML might need more time to execute due to
fetching and applying other events for corresponding metadata object.

Note: We don't modify the cache using MetastoreEventsProcessor for
alter table rename operation as this is a complex operation regarding
cache modification (IMPALA-12553 has more details about this). We also
don't modify the cache this way for the truncate/insert table operation
if the table is in Iceberg format. We don't modify cache using
above process for 'refresh table'/'invalidate metadata table' commands.

Testing:
1) Added few tests in the MetaStoreEventProcessorForTest to verify this
feature that simulates the metadata sync between HMS and Impala.
2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to
the metadata sync between HMS end point, Catalog Metastore Server and
Impala. The HMS end point serves as common interface to metadata
changes outside the current Impala service such as Hive, Spark or other
Impala service. Also verified the table lastSyncEventId is updated
after the events are sync and confirmed that metastore event processor
ignored these synced events.
3) Added some end-to-end tests in test_sync_to_latest_hms_events.py

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M be/src/catalog/catalog-server.cc
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/EventFactory.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M 
fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
M tests/common/impala_test_suite.py
A tests/custom_cluster/test_sync_to_latest_hms_events.py
A tests/metadata/__init__.py
A tests/metadata/test_common_ddl.py
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
M tests/metadata/test_event_processing.py
A tests/metadata/test_event_processing_base.py
M tests/metadata/test_recover_partitions.py
M tests/metadata/test_reset_metadata.py
M tests/util/event_processor_utils.py
26 files changed, 1,620 insertions(+), 755 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/34
--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 34
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong 

[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 34:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10209/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 34
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 19:02:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread gaurav singh (Code Review)
gaurav singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20971 )

Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:53:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:53:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15093/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:56:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:53:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10208/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:53:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12765: Balance consecutive partitions better for Iceberg tables

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20973 )

Change subject: IMPALA-12765: Balance consecutive partitions better for Iceberg 
tables
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15092/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
Gerrit-Change-Number: 20973
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:48:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20972 )

Change subject: IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15091/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Gerrit-Change-Number: 20972
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:42:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20915 )

Change subject: IMPALA-12503: Support date data type for predicates for 
external data source table
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15090/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Gerrit-Change-Number: 20915
Gerrit-PatchSet: 10
Gerrit-Owner: gaurav singh 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:39:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Hello Quanlong Huang, Daniel Becker, Zoltan Borok-Nagy, Sai Hemanth Gantasala, 
Csaba Ringhofer, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20533

to look at the new patch set (#11).

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..

IMPALA-12463: Batch non-consecutive table events in the event processor

The current batching of events requires events to be
consecutive. When there are multiple tables being modified,
events can be interleaved such that each batch is very
small. If the batching criteria can be relaxed, the
non-consecutive events could be batched and processed more
efficiently.

This implements batching for non-consecutive events by
keeping state on each table individually. Different tables
can continue to accumulate batchable events independently
unless they hit a condition that cuts the batch. The batching
can ignore some events on unrelated tables, but the same
rules apply about the batching of events on an individual table.
For example, for a particular table, any non-INSERT event
between two INSERT events on that table continues to cut the
batching.

In addition, there are certain cross-table events that need to
cut batches across multiple tables:
 1. Drop database / alter database cuts any batches on tables
in the affected database.
 2. Alter table rename cuts any batches on the source or destination
table.

This emits events in monotonically increasing order by
maintaining the resulting events in a sorted map. All
non-batchable events will be emitted in the original order.
Batchable events are emitted based on the ending Event ID
of the batch. This means that batchable events can move
later in the sequence, but they cannot move earlier.

This is based on the original design by Wenzhe Zhou.

Testing:
 - MetastoreEventsProcessorTest has new tests for interleaved
   events on two tables as well as tests for events that
   cut batches across tables (alter table, drop database,
   alter database).
 - A core job showed no other test failures.

Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 379 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20533/11
--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@31
PS10, Line 31: import java.util.Iterator;
> nit: unused import
Done


http://gerrit.cloudera.org:8080/#/c/20533/10/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@440
PS10, Line 440: Map dbMap = 
pendingTableEventsMap.get(dbName);
  : if (dbMap == null) {
  :   dbMap = new HashMap<>();
  :   pendingTableEventsMap.put(dbName, dbMap);
  : }
> nit: Can be replaced with single 'Map.computeIfAbsent' method call
Done



--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:34:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12463: Batch non-consecutive table events in the event processor

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20533 )

Change subject: IMPALA-12463: Batch non-consecutive table events in the event 
processor
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10207/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I849c0306bc46080ee4059854f42d9c217a89b905
Gerrit-Change-Number: 20533
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:34:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12765: Balance consecutive partitions better for Iceberg tables

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20973 )

Change subject: IMPALA-12765: Balance consecutive partitions better for Iceberg 
tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20973/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20973/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@55
PS1, Line 55: private List fileDescs_;
Put comment that this is always ordered.


http://gerrit.cloudera.org:8080/#/c/20973/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@210
PS1, Line 210: List orderedFds = Lists.newArrayList(fileDescs_);
 : Collections.sort(orderedFds);
Now that fileDescs_ is always sorted, is this still needed?



--
To view, visit http://gerrit.cloudera.org:8080/20973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
Gerrit-Change-Number: 20973
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:30:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15089/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:28:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20953 )

Change subject: IMPALA-12695: Crash with UNION with complex types
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10204/


--
To view, visit http://gerrit.cloudera.org:8080/20953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
Gerrit-Change-Number: 20953
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:26:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12765: Balance consecutive partitions better for Iceberg tables

2024-01-29 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20973

to look at the new patch set (#2).

Change subject: IMPALA-12765: Balance consecutive partitions better for Iceberg 
tables
..

IMPALA-12765: Balance consecutive partitions better for Iceberg tables

During remote read scheduling Impala does the following:

Non-Iceberg tables
 * The scheduler processes the scan ranges in partition key order
 * The scheduler selects N executors as candidates
 * The scheduler chooses the executor from the candidates based on
  minimum number of assigned bytes
 * So consecutive partitions are more likely to be assigned to
  different executors

Iceberg tables
 * The scheduler processes the scan ranges in random order
 * The scheduler selects N executors as candidates
 * The scheduler chooses the executor from the candidates based on
  minimum number of assigned bytes
 * So consecutive partitions (by partition key order) are assigned
  randomly, i.e. there's a higher chances of clustering

With this patch, IcebergScanNode orders its file descriptors based on
their paths, so we will have a more balanced scheduling for consecutive
partitions. Queries that operate on a range of partitions are quite
common, so it makes sense to optimize this case.

Testing:
 * e2e test

Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M tests/query_test/test_iceberg.py
2 files changed, 50 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20973/2
--
To view, visit http://gerrit.cloudera.org:8080/20973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
Gerrit-Change-Number: 20973
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12765: Balance consecutive partitions better for Iceberg tables

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20973 )

Change subject: IMPALA-12765: Balance consecutive partitions better for Iceberg 
tables
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20973/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20973/1/tests/query_test/test_iceberg.py@1071
PS1, Line 1071: o
flake8: E501 line too long (92 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/20973/1/tests/query_test/test_iceberg.py@1081
PS1, Line 1081: \
flake8: W605 invalid escape sequence '\d'


http://gerrit.cloudera.org:8080/#/c/20973/1/tests/query_test/test_iceberg.py@1081
PS1, Line 1081: \
flake8: W605 invalid escape sequence '\('


http://gerrit.cloudera.org:8080/#/c/20973/1/tests/query_test/test_iceberg.py@1081
PS1, Line 1081: \
flake8: W605 invalid escape sequence '\d'


http://gerrit.cloudera.org:8080/#/c/20973/1/tests/query_test/test_iceberg.py@1081
PS1, Line 1081: \
flake8: W605 invalid escape sequence '\)'



--
To view, visit http://gerrit.cloudera.org:8080/20973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
Gerrit-Change-Number: 20973
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:19:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12765: Balance consecutive partitions better for Iceberg tables

2024-01-29 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20973


Change subject: IMPALA-12765: Balance consecutive partitions better for Iceberg 
tables
..

IMPALA-12765: Balance consecutive partitions better for Iceberg tables

During remote read scheduling Impala does the following:

Non-Iceberg tables
 * The scheduler processes the scan ranges in partition key order
 * The scheduler selects N replicas as candidates
 * The scheduler chooses the executor from the candidates based on
  minimum number of assigned bytes
 * So consecutive partitions are more likely to be assigned to
  different executors

Iceberg tables
 * The scheduler processes the scan ranges in random order
 * The scheduler selects N replicas as candidates
 * The scheduler chooses the executor from the candidates based on
  minimum number of assigned bytes
 * So consecutive partitions (by partition key order) are assigned
  randomly, i.e. there's a higher chances of clustering

With this patch, IcebergScanNode orders its file descriptors based on
their paths, so we will have a more balanced scheduling for consecutive
partitions. Queries that operate on a range of partitions are quite
common, so it makes sense to optimize this case.

Testing:
 * e2e test

Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M tests/query_test/test_iceberg.py
2 files changed, 50 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20973/1
--
To view, visit http://gerrit.cloudera.org:8080/20973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
Gerrit-Change-Number: 20973
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20972


Change subject: IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976
..

IMPALA-12767: Upgrade Guava to 32.0.1 due to CVE-2023-2976

This patch upgrade Guava version from 31.1-jre to 32.0.1-jre to address
CVE-2023-2976.

Testing:
- Run and pass full build
  ./buildall.sh -skiptests -notests

Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/20972/1
--
To view, visit http://gerrit.cloudera.org:8080/20972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id932ed32200fba4f24b4fdd108546ac4494fc3e8
Gerrit-Change-Number: 20972
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table

2024-01-29 Thread gaurav singh (Code Review)
gaurav singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20915 )

Change subject: IMPALA-12503: Support date data type for predicates for 
external data source table
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java@36
PS9, Line 36: dateCol)
> nit: rename to dateCol
Done


http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@217
PS9, Line 217: dateCol
> rename to dateCol
Done


http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@219
PS9, Line 219: dateToString =
> Naming convention, rename variable dateToString
Done



--
To view, visit http://gerrit.cloudera.org:8080/20915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Gerrit-Change-Number: 20915
Gerrit-PatchSet: 10
Gerrit-Owner: gaurav singh 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:12:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table

2024-01-29 Thread gaurav singh (Code Review)
Hello Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20915

to look at the new patch set (#10).

Change subject: IMPALA-12503: Support date data type for predicates for 
external data source table
..

IMPALA-12503: Support date data type for predicates
for external data source table

This patch adds support for datatype date as predicates
for external data sources.

Testing:
- Manually tested binary date predicates with operators:
  '=', '>', '<', 'BETWEEN'.
- Modified the existing schema and add test cases for date
  predicate for postgres, mysql and impala databases.
  alltypes_impala and AllTypesCaseSensitiveNames_impala

Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
---
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
M 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
M 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
M 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
M testdata/bin/create-ext-data-source-table.sql
M testdata/bin/load-ext-data-sources.sh
M testdata/bin/setup-mysql-env.sh
M 
testdata/workloads/functional-query/queries/QueryTest/impala-ext-jdbc-tables.test
M testdata/workloads/functional-query/queries/QueryTest/jdbc-data-source.test
M 
testdata/workloads/functional-query/queries/QueryTest/mysql-ext-jdbc-tables.test
11 files changed, 218 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/20915/10
--
To view, visit http://gerrit.cloudera.org:8080/20915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Gerrit-Change-Number: 20915
Gerrit-PatchSet: 10
Gerrit-Owner: gaurav singh 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:15:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20971 )

Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20971/1/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20971/1/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285
PS1, Line 285: 
FileSystemUtil.copyFileFromUriToLocal(driverUrl);
> Remove this copy ?
No, this is jar file of jdbc driver, not the jar file of jdbc data source 
library.



--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:14:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread gaurav singh (Code Review)
gaurav singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20971 )

Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20971/1/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20971/1/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285
PS1, Line 285: 
FileSystemUtil.copyFileFromUriToLocal(driverUrl);
Remove this copy ?



--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:09:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20943 )

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20943/1/bin/dump_breakpad_symbols.py
File bin/dump_breakpad_symbols.py:

http://gerrit.cloudera.org:8080/#/c/20943/1/bin/dump_breakpad_symbols.py@369
PS1, Line 369: status = 1
> Do we need a break here when something happens?
Good point, let's have the behavior match the parallel case.

Before the parallelism, this was the old behavior. I filed IMPALA-12766 to add 
a flag to get the old behavior that keeps going.



--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 18:01:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

2024-01-29 Thread Joe McDonnell (Code Review)
Hello Yida Wu, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20943

to look at the new patch set (#2).

Change subject: IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages
..

IMPALA-12745: Skip parallel symbol dumping with RPM/DEB packages

When using bin/dump_breakpad_symbols.py to dump symbols for RPM/DEB
packages, the script extracts the packages to a temporary directory
and relies on keeping that directory around until the processing
is finished. The parallel processing added in IMPALA-11511 breaks
the logic that keeps the temporary directory around, so the script
generates errors like:

Found debugging info in 
/tmp/tmpqfZ9MZ/usr/lib/debug/usr/lib/impala/sbin-retail/impalad.debug
Failed to open ELF file 
'/tmp/tmpqfZ9MZ/usr/lib/debug/usr/lib/impala/sbin-retail/impalad.debug': No 
such file or directory
Failed to write symbol file.

This turns off parallelism for bin/dump_breakpad_symbols.py when
processing RPM/DEB packages (i.e. -r/--pkg). This also avoids using
a ThreadPool when num_processes <= 1.

Testing:
 - Hand tested with Redhat 7 RPMs

Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
---
M bin/dump_breakpad_symbols.py
1 file changed, 21 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/20943/2
--
To view, visit http://gerrit.cloudera.org:8080/20943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2885a9cfb36a4f616b539599e7f744bd23552c3
Gerrit-Change-Number: 20943
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols

2024-01-29 Thread Yida Wu (Code Review)
Yida Wu has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20944 )

Change subject: IMPALA-12125: Support for dumping symbols from RPMs without 
separate symbols
..

IMPALA-12125: Support for dumping symbols from RPMs without separate symbols

Some RPMs contain binaries with debug symbols with no separate
debuginfo package needed. bin/dump_breakpad_symbols.py does not
allow this combination, as it expects a corresponding symbol
package. This adds a --no_symbol_pkg option to dump_breakpad_symbols.py
to turn off the requirement that --pkg be combined with --symbol_pkg.

Testing:
 - Tested with an RPM package with an unstripped impalad binary
 - Tested with the usual RPM + debuginfo RPM combination

Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b
Reviewed-on: http://gerrit.cloudera.org:8080/20944
Reviewed-by: Impala Public Jenkins 
Tested-by: Yida Wu 
---
M bin/dump_breakpad_symbols.py
1 file changed, 22 insertions(+), 13 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Yida Wu: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20944
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b
Gerrit-Change-Number: 20944
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols

2024-01-29 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20944 )

Change subject: IMPALA-12125: Support for dumping symbols from RPMs without 
separate symbols
..


Patch Set 2: Verified+1

Irrelevant failure at 
TestEventProcessingCustomConfigs.test_skipping_older_events


--
To view, visit http://gerrit.cloudera.org:8080/20944
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b
Gerrit-Change-Number: 20944
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 29 Jan 2024 17:32:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12125: Support for dumping symbols from RPMs without separate symbols

2024-01-29 Thread Yida Wu (Code Review)
Yida Wu has removed a vote on this change.

Change subject: IMPALA-12125: Support for dumping symbols from RPMs without 
separate symbols
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/20944
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9589b0ed7855fe49c6989ec3dcc51a9e9c4f476b
Gerrit-Change-Number: 20944
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20971 )

Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10205/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 29 Jan 2024 17:10:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20971 )

Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15088/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 29 Jan 2024 17:06:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15087/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:58:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12378: Add jar file of JdbcDataSource library to classpath

2024-01-29 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20971


Change subject: IMPALA-12378: Add jar file of JdbcDataSource library to 
classpath
..

IMPALA-12378: Add jar file of JdbcDataSource library to classpath

This patch adds the jar file of JdbcDataSource library to classpath.
With the change, We can get the instance of JdbcDataSourceData class
in current ClassLoader and don't need to download the jar file from
the given HDFS location to local file system.

Testing:
 - Passed core test
 - Passed dockerised-tests

Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
---
M CMakeLists.txt
M be/src/exec/external-data-source-executor.cc
M bin/set-classpath.sh
M docker/setup_build_context.py
M 
fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
M testdata/bin/copy-ext-data-sources.sh
8 files changed, 63 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/20971/1
--
To view, visit http://gerrit.cloudera.org:8080/20971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0daff8db6231f161ec27b45b51d78e21733d9b1f
Gerrit-Change-Number: 20971
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20922 )

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
File fe/src/main/java/org/apache/impala/common/RuntimeEnv.java:

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/main/java/org/apache/impala/common/RuntimeEnv.java@100
PS1, Line 100:
> nit: "tables with their"
Done


http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
File fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/20922/1/fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java@53
PS1, Line 53: / Granular scan limit that will injected into individual ScanNode 
of tables.
:   private static Map<
> Looks like some dim tables are also scaled but not linearly. I'll check TPC
ps2 fully inject all stats based on actual dataset instead of just scaling them.



--
To view, visit http://gerrit.cloudera.org:8080/20922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
Gerrit-Change-Number: 20922
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:36:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12503: Support date data type for predicates for external data source table

2024-01-29 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20915 )

Change subject: IMPALA-12503: Support date data type for predicates for 
external data source table
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java@36
PS9, Line 36: date_col
nit: rename to dateCol


http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@217
PS9, Line 217: date_co
rename to dateCol


http://gerrit.cloudera.org:8080/#/c/20915/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@219
PS9, Line 219: date_to_string
Naming convention, rename variable dateToString



--
To view, visit http://gerrit.cloudera.org:8080/20915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf13cbefaad812a0f78755c5791d82b24a3395e4
Gerrit-Change-Number: 20915
Gerrit-PatchSet: 9
Gerrit-Owner: gaurav singh 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: gaurav singh 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:36:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

2024-01-29 Thread Riza Suminto (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20922

to look at the new patch set (#2).

Change subject: IMPALA-12726: Simulate large-scale query in 
TpcdsCpuCostPlannerTest
..

IMPALA-12726: Simulate large-scale query in TpcdsCpuCostPlannerTest

Querying against large-scale databases is good way for testing Impala.
However, it is impractical to do in a single-node development machine.

Frontend testing does not run the test query in the backend executor and
can benefit from simulated large-scale test cases. This patch attempts
to do it by instrumenting the CatalogD metadata loading code to scale
tpcds_partitioned_parquet_snap by injecting column stats from a 3TB
TPC-DS dataset TpcdsCpuCostPlannerTest.

The large-scale column stats are expressed in stats-3TB.json, taken by
running "SHOW COLUMN STATS" and "DESCRIBE FORMATTED" queries on a 3TB
dataset loaded using impala-tpcds-kit. It is parsed and then
piggy-backed through RuntimeEnv. Code that populates stats
metadata (caller of FeCatalogUtils.getRowCount(),
FeCatalogUtils.getTotalSize(), and FeCatalogUtils.injectColumnStats())
is instrumented to populate stats from RuntimeEnv instead of Metastore.
Scaled-up tables are invalidated before a test run to reload them with
new high-scale stats. This patch also add a scan range limit injection
to force ScanNode over a single file table to act as if it scans a
muti-files table.

tpcds_partitioned_schema_template.sql is modified to match column names
and types from impala-tpcds-kit). After this patch, the test files under
PlannerTest/tpcds_cpu_cost/ have matching query plan shapes with the
actual impala-tpcds-kit queries run against the 3TB TPC-DS
dataset (https://github.com/cloudera/impala-tpcds-kit/tree/master/queries),
except for a few mismatches due to different SQL and hard limit on
number of files.

Below are 16 queries out of 103 that still does not have matching shape
and the reasons.
+-+--+
|  Q  | Reason   |
+-+--+
| 6   | extra limit 1|
| 10a | different num files in customer_demographics |
| 23b | different frequent_ss_items CTE  |
| 22  | extra warehouse table|
| 27  | different predicate for store table  |
| 34  | extra limit 10   |
| 36  | different predicate for store table  |
| 53  | missing avg_quarterly_sales  |
| 66  | different SQL|
| 68  | different predicate for data_dim table   |
| 69  | different num files in customer  |
| 73  | different order by, extra limit 1000 |
| 74  | different num files in customer  |
| 84  | missing customer_demographics table  |
| 96  | missing limit 100|
| 98  | extra limit 1000 |
+-+--+

Testing:
- Scale tables of tpcds_partitioned_parquet_snap in
  TpcdsCpuCostPlannerTest to simulate 3TB TPC-DS. The number of
  executors is raised from 3 to 10, and REPLICA_PREFERENCE=REMOTE to
  ignore data locality.
- Pass core tests.

Change-Id: Iaffddd70c2da8376ca6c40f65606bbac46c34de7
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
A fe/src/main/java/org/apache/impala/catalog/SideloadTableStats.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A fe/src/test/java/org/apache/impala/testutil/StatsJsonParser.java
M testdata/datasets/tpcds_partitioned/tpcds_partitioned_schema_template.sql
A 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/stats-3TB.json
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 

[Impala-ASF-CR] IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20968 )

Change subject: IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20968/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/20968/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@300
PS1, Line 300: select snapshot_id from 
functional_parquet.iceberg_query_metadata.snapshots limit 2;
Add check with low BATCH_SIZE option?
https://impala.apache.org/docs/build/html/topics/impala_batch_size.html



--
To view, visit http://gerrit.cloudera.org:8080/20968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea73716fe475c8b063235d2ae971a4074b8e2a20
Gerrit-Change-Number: 20968
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:13:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause

2024-01-29 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20968 )

Change subject: IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20968/1/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/20968/1/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@198
PS1, Line 198: *eos = true;
Is GetNext() will be called again if *eos is true and last struct_like_row is 
not nullptr?



--
To view, visit http://gerrit.cloudera.org:8080/20968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea73716fe475c8b063235d2ae971a4074b8e2a20
Gerrit-Change-Number: 20968
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 29 Jan 2024 16:12:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12763: Union with string struct crashes in ASAN

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20970 )

Change subject: IMPALA-12763: Union with string struct crashes in ASAN
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15086/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id728f1254b74636be594a33313a478b0b77c7ae4
Gerrit-Change-Number: 20970
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 29 Jan 2024 15:50:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12763: Union with string struct crashes in ASAN

2024-01-29 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20970


Change subject: IMPALA-12763: Union with string struct crashes in ASAN
..

IMPALA-12763: Union with string struct crashes in ASAN

In ASAN builds, if we UNION ALL an array containing a struct of a string
with itself, Impala crashes. This is how to reproduce it:

In Hive:
  create table su (arr ARRAY>) stored as parquet;
  insert into su values (array(named_struct("s", "A")));

In Impala:
  select 1, arr from su
union all select 2, arr from su;

The ASAN error message indicates a heap-use-after-free.

Normally, UNIONs of structs are not supported yet (see IMPALA-10752),
but if the struct is inside an array it is allowed now. This was
probably not intentional and it leads to the above error, so this change
disables structs in unions completely, including embedded structs.

Testing:
 - adjusted existing tests
 - added a query that tests that types with embedded structs are not
   allowed in a UNION statement, in mixed-collections-and-structs.test

Change-Id: Id728f1254b74636be594a33313a478b0b77c7ae4
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M 
testdata/workloads/functional-query/queries/QueryTest/mixed-collections-and-structs.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M 
testdata/workloads/functional-query/queries/QueryTest/struct-in-select-list.test
6 files changed, 38 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/20970/1
--
To view, visit http://gerrit.cloudera.org:8080/20970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id728f1254b74636be594a33313a478b0b77c7ae4
Gerrit-Change-Number: 20970
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker 


[Impala-ASF-CR] IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20968 )

Change subject: IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15085/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea73716fe475c8b063235d2ae971a4074b8e2a20
Gerrit-Change-Number: 20968
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 29 Jan 2024 14:34:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20953 )

Change subject: IMPALA-12695: Crash with UNION with complex types
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15084/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
Gerrit-Change-Number: 20953
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 29 Jan 2024 14:25:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20953 )

Change subject: IMPALA-12695: Crash with UNION with complex types
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15083/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
Gerrit-Change-Number: 20953
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 29 Jan 2024 14:18:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause

2024-01-29 Thread Tamas Mate (Code Review)
Tamas Mate has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20968


Change subject: IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause
..

IMPALA-12764: Fix Iceberg metadata table scan's LIMIT clause

Earlier the ReachedLimit() condition was not evaluated during
IcebergMetadataScanNode::GetNext(). This missing condition has been
added and LIMIT is now being evaluated.

Testing:
 - Added E2E test

Change-Id: Iea73716fe475c8b063235d2ae971a4074b8e2a20
---
M be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
2 files changed, 13 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/20968/1
--
To view, visit http://gerrit.cloudera.org:8080/20968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iea73716fe475c8b063235d2ae971a4074b8e2a20
Gerrit-Change-Number: 20968
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 


[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types

2024-01-29 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/20953 )

Change subject: IMPALA-12695: Crash with UNION with complex types
..

IMPALA-12695: Crash with UNION with complex types

If we unnest an array coming from a UNION ALL, we read invalid memory
and in ASAN builds we crash.

Example:
  with v as (select arr1 from complextypes_arrays
union all select arr1 from complextypes_arrays)
  select am.item from v, v.arr1 am;

The problem seems to be that in the item tuple of the collections, the
item slots are present twice. This is because both the inline view
analyzer and the main analyzer add slots with the same path to the
tuple. This is possible because
  - the target tuple is determined based on the path via
Path.getRootDesc(), so it will be the same both in the inline view
and in the main scope
AND
  - the inline view analyzer and the main one do not share
'slotPathMap_', so the analyzer cannot recognise that a slot for the
path has already been added.

This commit solves the problem by checking the target tuple whether a
slot with the same path already exists in it, and if it does, we reuse
that slot. Note, however, that when Analyzer.registerSlotRef() is called
with 'duplicateIfCollections=true', a separate slot is added for
collections which should not be reused. This commit adds a set,
'duplicateCollectionSlots', in Analyzer.GlobalState to keep track of
such collection slots, and these slots are never reused.

Note that there is another bug, IMPALA-12753, that a predicate on the
collection item in the above query is only enforced on the first child
of the union. Therefore this commit disallows placing a predicate on a
collection item when the unnested collection comes from a union.

Testing:
 - added test queries in nested-array-in-select-list.test,
   nested-map-in-select-list.test, zipping-unnest-in-from-clause.test
   and zipping-unnest-in-select-list.test

Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M 
testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test
M 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test
7 files changed, 373 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/20953/3
--
To view, visit http://gerrit.cloudera.org:8080/20953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
Gerrit-Change-Number: 20953
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types

2024-01-29 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20953 )

Change subject: IMPALA-12695: Crash with UNION with complex types
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20953/1/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test:

http://gerrit.cloudera.org:8080/#/c/20953/1/testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test@156
PS1, Line 156: item
> Can you also add tests with unnest() in select list /from clause,e.g in zip
The two queries you mentioned produce the same results on master as on this 
patch and unfortunately I don't have capacity to fix them now. But if we remove 
the outer view from the first query it executes correctly, added it as a test 
in both zipping unnest in select list and from clause.



--
To view, visit http://gerrit.cloudera.org:8080/20953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
Gerrit-Change-Number: 20953
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Mon, 29 Jan 2024 13:51:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12695: Crash with UNION with complex types

2024-01-29 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20953 )

Change subject: IMPALA-12695: Crash with UNION with complex types
..

IMPALA-12695: Crash with UNION with complex types

If we unnest an array coming from a UNION ALL, we read invalid memory
and in ASAN builds we crash.

Example:
  with v as (select arr1 from complextypes_arrays
union all select arr1 from complextypes_arrays)
  select am.item from v, v.arr1 am;

The problem seems to be that in the item tuple of the collections, the
item slots are present twice. This is because both the inline view
analyzer and the main analyzer add slots with the same path to the
tuple. This is possible because
  - the target tuple is determined based on the path via
Path.getRootDesc(), so it will be the same both in the inline view
and in the main scope
AND
  - the inline view analyzer and the main one do not share
'slotPathMap_', so the analyzer cannot recognise that a slot for the
path has already been added.

This commit solves the problem by checking the target tuple whether a
slot with the same path already exists in it, and if it does, we reuse
that slot. Note, however, that when Analyzer.registerSlotRef() is called
with 'duplicateIfCollections=true', a separate slot is added for
collections which should not be reused. This commit adds a set,
'duplicateCollectionSlots', in Analyzer.GlobalState to keep track of
such collection slots, and these slots are never reused.

Note that there is another bug, IMPALA-12753, that a predicate on the
collection item in the above query is only enforced on the first child
of the union. Therefore this commit disallows placing a predicate on a
collection item when the unnested collection comes from a union.

Testing:
 - added test queries in nested-array-in-select-list.test,
   nested-map-in-select-list.test, zipping-unnest-in-from-clause.test
   and zipping-unnest-in-select-list.test

Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M 
testdata/workloads/functional-query/queries/QueryTest/nested-array-in-select-list.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-map-in-select-list.test
M 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-from-clause.test
M 
testdata/workloads/functional-query/queries/QueryTest/zipping-unnest-in-select-list.test
7 files changed, 373 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/20953/2
--
To view, visit http://gerrit.cloudera.org:8080/20953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I340adc50e6d7cda6f59dacd7a46b6adc31635d46
Gerrit-Change-Number: 20953
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-29 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20965 )

Change subject: IMPALA-12762: Fix cmake error in package building
..


Patch Set 1: Code-Review+1

(2 comments)

Thanks for fixing this so quickly!

http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh
File bin/jenkins/build-all-flag-combinations.sh:

http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh@42
PS1, Line 42: skiptests
Let's change this to "notests" so new changes won't break the build again.


http://gerrit.cloudera.org:8080/#/c/20965/1/bin/jenkins/build-all-flag-combinations.sh@50
PS1, Line 50: skiptests
Let's change this to "notests" as well.



--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 29 Jan 2024 13:07:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 33: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10203/


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 33
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 13:01:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20965 )

Change subject: IMPALA-12762: Fix cmake error in package building
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15082/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 29 Jan 2024 10:38:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12762: Fix cmake error in package building

2024-01-29 Thread Yifan Zhang (Code Review)
Yifan Zhang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20965


Change subject: IMPALA-12762: Fix cmake error in package building
..

IMPALA-12762: Fix cmake error in package building

This patch adds extra processing of option 'BUILD_WITH_NO_TESTS' in
be/src/exec/json/CMakeLists.txt, so test targets will not be generated
by the CMake when building Impala with -package and -notests.

Testing:
  - Run './buildall.sh -noclean -notests -package' with no error

Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
---
M be/src/exec/json/CMakeLists.txt
1 file changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/20965/1
--
To view, visit http://gerrit.cloudera.org:8080/20965
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Gerrit-Change-Number: 20965
Gerrit-PatchSet: 1
Gerrit-Owner: Yifan Zhang 


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 33:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10203/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 33
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 08:29:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs

2024-01-29 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20367 )

Change subject: IMPALA-10976: Sync db/table to latest HMS event for all DDL/DMLs
..


Patch Set 33:

(5 comments)

> Patch Set 31: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10201/

There are still test failures. Retriggered the job on the latest patch set.

http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1401
PS33, Line 1401: debug
Let's use trace() since this will be logged for each new loaded partition. For 
a table with 10K partitions, when it's loaded, we'll log 10K lines for this..

Use parameters so the string won't be constructed when TRACE logging is off, 
i.e.

  LOG.trace("Previous id for the partition object is not set. Partition 
info: {}",
  obj.hdfs_partition);


http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20367/33/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3357
PS33, Line 3357: else {
   :   // when table is replicated we let the HMS API 
handle the file deletion logic
   :   // otherwise we delete the files.
   :   Collection parts = 
FeCatalogUtils
   :   .loadAllPartitions(hdfsTable);
   :   for (FeFsPartition part : parts) {
   : FileSystemUtil.deleteAllVisibleFiles(new 
Path(part.getLocation()));
   :   }
   :   LOG.trace("Time elapsed after deleting files for 
table {}: {} msec",
   :   table.getFullName(), 
sw.elapsed(TimeUnit.MILLISECONDS));
   : }
It'd be better to keep these unchanged since previously they are outside the 
try-block so don't need to acquire a MetaStoreClient.


http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py@958
PS33, Line 958: result = cls.__execute_query(impalad_client, query, 
query_options, user)
Let's check success first, i.e. like what we does in 
execute_query_expect_success():

  assert result.success


http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py@1046
PS33, Line 1046: self
nit: use 'cls' for classmethod


http://gerrit.cloudera.org:8080/#/c/20367/33/tests/common/impala_test_suite.py@1052
PS33, Line 1052: self
nit: use 'cls' for classmethod



--
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 33
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 29 Jan 2024 08:28:27 +
Gerrit-HasComments: Yes