[Impala-ASF-CR] IMPALA-10483(part-1): Refactor table mask resolving

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17184 )

Change subject: IMPALA-10483(part-1): Refactor table mask resolving
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17184/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17184/1//COMMIT_MSG@10
PS1, Line 10: based
nit: 'base' instead of 'based' here and few other places.


http://gerrit.cloudera.org:8080/#/c/17184/1//COMMIT_MSG@23
PS1, Line 23: This patch refactor the table mask resolving logic to not apply 
table
One effect of not applying table mask by default is that now each of the 
clauses (SELECT, FROM, WHERE etc.) needs to explicitly call 
setDoTableMasking(). If there's a clause that is missing this implementation or 
if the programmer did not invoke this method on a clause for some reason, the 
flag will remain false and we could end up not applying the table mask.  But as 
long as we have good test coverage on the queries, this may be ok.


http://gerrit.cloudera.org:8080/#/c/17184/1/fe/src/main/java/org/apache/impala/analysis/WithClause.java
File fe/src/main/java/org/apache/impala/analysis/WithClause.java:

http://gerrit.cloudera.org:8080/#/c/17184/1/fe/src/main/java/org/apache/impala/analysis/WithClause.java@66
PS1, Line 66:   public void setDoTableMasking(boolean doTableMasking) {
I would think this would be an @Override of a corresponding method in base 
class but the base class does not have this method..is that intentional ?



--
To view, visit http://gerrit.cloudera.org:8080/17184
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia191928fb179b0b0632235c1fff4c18647e5802f
Gerrit-Change-Number: 17184
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 16 Mar 2021 06:21:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 16 Mar 2021 03:34:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17075/21//COMMIT_MSG@12
PS21, Line 12: join builders and Parquet scanners
Is it fair to say that the main value proposition of the column  min/max 
statistics is in the join builder ?  The Parquet scanners already have access 
to the row group's min/max stats per column, so it seems to me the stats coming 
from HMS will not add additional value there. But for the HJ builder, yes it 
helps by figuring out whether after the build phase let's say you have the 
range [10, 50] and the min/max stats fetched from HMS are [60, 100] then we can 
quickly say that the runtime min/max filter will exclude all row groups.

But what happens if the stats are out-of-date ? Since these stats are getting 
uses not just for ACID tables but for external tables as well, the stats are 
not bound to the table's valid write id (applicable only for ACID tables). This 
can lead to incorrect overlap calculation. Any thoughts ?


http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@221
PS21, Line 221:  QUERY
For the tests, it would be good to add a 3 table join case as well. My 
understanding is that if you have a fact table F and 2 dimension tables D1, D2 
and suppose F is joined to D1, D2 as follows:
 F. a1 = D1.a1 AND F.a1 = D2.a1  then suppose the first join's  range is  [10, 
50] and the second join's range is [20, 60]  then the filter seen at scan of F  
will be [20, 50].  Are such scenario already tested elsewhere ? If so, feel 
free to point me to those and mark this resolved.



--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 16 Mar 2021 03:07:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17172 )

Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, 
local variable 'retry_msg' referenced before assign
..

IMPALA-10367: Impala-shell internal error -
UnboundLocalError, local variable 'retry_msg' referenced before assign

ImpalaHS2Client._open_session() has a 'retry_msg' variable which was
not initialized in the code-path where retry was disabled. If an
exception was hit with retry disabled, a compile time error was
generated.

The fix is to initialize 'retry_msg' in the non retry code-path.

Testing:
- Forced exception in ImpalaHS2Client._open_session() and verified that
proper error message was generated.
- Ran impala-shell e2e and custom cluster tests.

Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
Reviewed-on: http://gerrit.cloudera.org:8080/17172
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M shell/impala_client.py
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
Gerrit-Change-Number: 17172
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17172 )

Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, 
local variable 'retry_msg' referenced before assign
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
Gerrit-Change-Number: 17172
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 16 Mar 2021 01:14:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
..


Patch Set 1:

hadn't noticed Wenzhe posted comments earlier


--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 16 Mar 2021 00:53:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has removed a vote on this change.

Change subject: IMPALA-10577: Add retrying of AdmitQuery
..


Removed Code-Review+2 by Bikramjeet Vig 
--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 16 Mar 2021 00:45:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40
PS1, Line 40: admission_max_retry_time_s
I am wondering if it's good to set maximum number of times for retrying RPC 
call, or set maximum times for retrying RPC call? What's typical time for an 
admissiond ready to accept request after it's restarted?


http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1385
PS1, Line 1385: assert result.data == ["730"]
Could you add another test case for which sleep more than 
admission_max_retry_time_s before restart admissiond so that retry will fail?



--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 16 Mar 2021 00:33:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8373/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 15 Mar 2021 22:14:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8372/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 22:09:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1369
PS1, Line 1369:
flake8: E203 whitespace before ':'



--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:55:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6971/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:54:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

2021-03-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17188


Change subject: IMPALA-10577: Add retrying of AdmitQuery
..

IMPALA-10577: Add retrying of AdmitQuery

This patch adds retries of the AdmitQuery rpc by coordinators.
This helps to ensure that if an admissiond goes down and is restarted
or is temporarily unreachable, queries won't fail.

The retries are done with backoff and jitter to avoid overloading the
admissiond in these scenarios.

A new flag, --admission_max_retry_time_s, is added to control how long
queries will continue retrying before giving up.

The AdmitQuery rpc is made idempotent - if a query is submitted with
the same query id as one the admissiond already knows about,
AdmitQuery will return OK without submitting the query to be scheduled
again.

Testing:
- Added a custom cluster test that checks that queries won't fail when
  the admissiond goes down.

Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
---
M be/src/scheduling/admission-control-service.cc
M be/src/scheduling/remote-admission-control-client.cc
M be/src/scheduling/remote-admission-control-client.h
M common/protobuf/admission_control_service.proto
M tests/custom_cluster/test_admission_controller.py
5 files changed, 126 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17188/1
--
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 8: Code-Review+2

Looks good


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:51:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc@257
PS7, Line 257: // When an external FE has provided a staging directory we 
use that directly.
 : // We are trusting that the external frontend implementation 
has done appropriate
 : // authorization checks on the external output directory.
> Nit: Update comment to call it an external output directory rather than a s
Done


http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@250
PS7, Line 250: if (writeId_ != -1) hdfsTableSink.setWrite_id(writeId_);
 : TTableSink tTableSink = new 
TTableSink(DescriptorTable.TABLE_SINK_ID,
 :
> Nit: If the line hasn't changed, avoid style-only changes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:50:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#8).

Change subject: IMPALA-10553: External Frontend CTAS support
..

IMPALA-10553: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - Any DDL related operations are assumed to be handled by
  the external frontend
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection
- External frontends are responsible for doing authorization
  checks against the staging directory and it is assumed the
  external frontend service is not exposed directly to users.

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 120 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/8
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-15 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 7: Code-Review+2

> > (1 comment)
 > >
 > > Apart from the interaction with retry_failed_queries, this looks
 > > good to me.
 > >
 > > If you want to split off the retry_failed_queries piece into a
 > > separate change, I am ok with that. I think this change is ok if
 > > the external frontend guarantees not to pass in retry_failed_queries=true
 > > when using a result sink.
 >
 > I think it would be better to follow up. I've created
 > https://issues.apache.org/jira/browse/IMPALA-10586 to track that
 > work.

Ok, then this looks good to me.


--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:39:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-15 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 7:

> (1 comment)
 >
 > Apart from the interaction with retry_failed_queries, this looks
 > good to me.
 >
 > If you want to split off the retry_failed_queries piece into a
 > separate change, I am ok with that. I think this change is ok if
 > the external frontend guarantees not to pass in retry_failed_queries=true
 > when using a result sink.

I think it would be better to follow up. I've created 
https://issues.apache.org/jira/browse/IMPALA-10586 to track that work.


--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:37:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10518: Add ImpalaServer interface to retrieve executor membership.

2021-03-15 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17181 )

Change subject: IMPALA-10518: Add ImpalaServer interface to retrieve executor 
membership.
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.h
File be/src/scheduling/cluster-membership-mgr.h:

http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.h@260
PS3, Line 260: class TUpdateExecutorMembershipRequest;
Its standard in Impala to put all forward declarations at the top of the file - 
in this case, immediately after the "namespace impala {" line


http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.cc
File be/src/scheduling/cluster-membership-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/scheduling/cluster-membership-mgr.cc@57
PS3, Line 57: impala::
You can just put this function in the "namespace impala{" block below and 
remove this


http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/17181/3/be/src/service/impala-hs2-server.cc@1230
PS3, Line 1230: PopulateExecutorMembershipRequest(membership_snapshot,
  : return_val.executor_membership);
nit: I think this can fit on a single line

More generally, you've got a few minor formatting errors, could you run this 
through clang-format-diff as described here: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536


http://gerrit.cloudera.org:8080/#/c/17181/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17181/3/common/thrift/ImpalaService.thrift@870
PS3, Line 870:   // Returns the executor membership information
Note that this is only supported for the external fe service


http://gerrit.cloudera.org:8080/#/c/17181/3/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17181/3/tests/hs2/test_hs2.py@750
PS3, Line 750: assert 
get_executor_membership_resp.executor_membership.num_executors == 3
I'm confused as to how we would get the correct value here if the response was 
an error, as checked in the previous line



--
To view, visit http://gerrit.cloudera.org:8080/17181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie89b71f4555c368869ee7b9d6341756c60af12b5
Gerrit-Change-Number: 17181
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:07:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17061 )

Change subject: IMPALA-10503: testdata load hits hive memory limit errors 
during hive inserts
..

IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

Changed the following hive settings to avoid hitting Hive container
limit errors:

hive.tez.container.size: 2048
hive.tez.java.opts: -Xmx1700m

With these settings, testdata load completes without errors on a
32GB host.

Reviewed-by: Fang-Yu Rao 
Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Reviewed-on: http://gerrit.cloudera.org:8080/17061
Tested-by: Impala Public Jenkins 
Reviewed-by: Aman Sinha 
---
M fe/src/test/resources/hive-site.xml.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
2 files changed, 138 insertions(+), 137 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Aman Sinha: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Gerrit-Change-Number: 17061
Gerrit-PatchSet: 13
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17061 )

Change subject: IMPALA-10503: testdata load hits hive memory limit errors 
during hive inserts
..


Patch Set 12:

Carry previous code review +2


--
To view, visit http://gerrit.cloudera.org:8080/17061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Gerrit-Change-Number: 17061
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 15 Mar 2021 20:55:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17061 )

Change subject: IMPALA-10503: testdata load hits hive memory limit errors 
during hive inserts
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Gerrit-Change-Number: 17061
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 15 Mar 2021 20:54:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 7: Code-Review+2

(2 comments)

This looks good to me, only a couple style-only nits. Thanks for changing the 
naming for this.

http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc@257
PS7, Line 257: // When an external FE has provided a staging directory we 
use that directly.
 : // We are trusting that the external frontend implementation 
has done appropriate
 : // authorization checks on the external staging directory.
Nit: Update comment to call it an external output directory rather than a 
staging directory.


http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@250
PS7, Line 250: if (writeId_ != -1) {
 :   hdfsTableSink.setWrite_id(writeId_);
 : }
Nit: If the line hasn't changed, avoid style-only changes.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 19:56:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17172 )

Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, 
local variable 'retry_msg' referenced before assign
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
Gerrit-Change-Number: 17172
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Mar 2021 19:29:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17172 )

Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, 
local variable 'retry_msg' referenced before assign
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6970/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
Gerrit-Change-Number: 17172
Gerrit-PatchSet: 2
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Mar 2021 19:29:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-15 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 7:

(1 comment)

Apart from the interaction with retry_failed_queries, this looks good to me.

If you want to split off the retry_failed_queries piece into a separate change, 
I am ok with that. I think this change is ok if the external frontend 
guarantees not to pass in retry_failed_queries=true when using a result sink.

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802
PS6, Line 802:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, 
FLAGS_hostname));
> I agree that retry_failed_queries might be problematic and we might want to
The retrying happens at a level above the Coordinator object (at the 
QueryDriver level). Each new retry would have its own ClientRequestState object 
(which has its own Coordinator object). Here is a comment covering some of the 
concepts: 
https://github.com/apache/impala/blob/master/be/src/runtime/query-driver.h#L75

The pieces of code in Coordinator that would trigger a retry are the 
TryQueryRetry() calls in this file.

I believe we did not intend for the retries to apply to DMLs, but I tried it 
out and we do actually retry DMLs right now. So, at the moment, there is 
nothing disabling retries for something with a result sink. I will file a JIRA 
for disabling retry_failed_queries for DMLs. I think that was unintentional.



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 18:49:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 21:

Thinking about min-max filters in the context of outer joins, I would expect 
they are _not_ pushed to the non-null producing side of an outer join (we have 
similar restriction for the runtime bloom filters). Is that handled somewhere 
(maybe in another patch) ?


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Mar 2021 18:40:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 21:

(5 comments)

I haven't analyzed how exactly the overlap of min-max filters works..so will go 
through the design doc and maybe the original commit.  In the meantime, sending 
some initial comments.

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/catalog-op-executor.cc@316
PS19, Line 316: or non-string
  :   // columns.
Update this since min/max stats are not getting computed for Decimal types.  
Also, are we sure that allowing float/double min-max filters is ok ? These are 
not precise data types.


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/filter-context.cc@474
PS19, Line 474: case PrimitiveType::TYPE_TIMESTAMP:
The commit message says the min-max column stats is used for integer, float and 
double so I was not expecting to see timestamp and date types also considered 
here.  Shouldn't they just to the default case and return false ?


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/incr-stats-util.cc
File be/src/exec/incr-stats-util.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/incr-stats-util.cc@260
PS19, Line 260: apache::hive::service::cli::thrift::TColumnValue
It would be good to create a typedef for the hive version of TColumnValue.  
Also, the package name looks different from what I see in the source code here: 
https://github.com/apache/hive/blob/master/service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TColumnValue.java#L7


http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17075/19/be/src/exec/partitioned-hash-join-builder.cc@957
PS19, Line 957: VLOG(3) << "The filter is set to always true.";
nit: Would be useful to print that this was a min-max filter and probably the 
filter id.


http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17075/19/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@10
PS19, Line 10: CREATE TABLE unique_database.lineitem_orderkey_only(l_orderkey 
bigint)
Creating large tables during a test run adds unacceptable latency to the  
testing time.   Can the db and table creation steps not be moved into 
testdata/datasets/tpch/  as a schema template for min-max filters ?



--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Mar 2021 18:14:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write id for ACID tables

2021-03-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17081 )

Change subject: IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write 
id for ACID tables
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3145
PS3, Line 3145: addHmsPartitionsInTransaction
Do we need similar changes in the alterTableRecoverPartitions path. What about 
dynamically created partitions?



--
To view, visit http://gerrit.cloudera.org:8080/17081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad247008b7c206db00516326c1447bd00a9b34bd
Gerrit-Change-Number: 17081
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Mar 2021 17:45:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8371/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Mar 2021 17:24:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write id for ACID tables

2021-03-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17081 )

Change subject: IMPALA-10512: ALTER TABLE ADD PARTITION should bump the write 
id for ACID tables
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17081/3/fe/src/main/java/org/apache/impala/catalog/Catalog.java@78
PS3, Line 78: ACID_USER
> I've added an abstract method getAcidUserId() to this class.
Thanks. Looks good to me.



--
To view, visit http://gerrit.cloudera.org:8080/17081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad247008b7c206db00516326c1447bd00a9b34bd
Gerrit-Change-Number: 17081
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Mar 2021 17:20:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17171 )

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17171/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17171/4//COMMIT_MSG@15
PS4, Line 15: dead
nit, I think using "infinite loop" is a more readable.


http://gerrit.cloudera.org:8080/#/c/17171/4//COMMIT_MSG@18
PS4, Line 18: thrown
nit, throw


http://gerrit.cloudera.org:8080/#/c/17171/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17171/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@847
PS4, Line 847: next = baseIterator_.next();
Does this mean that if the file which is present inside a directory and which 
is removed after the RemoteIterator was created, we will start seeing 
FileNotFoundException which can cause Table loading errors?



--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 15 Mar 2021 17:17:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..

IMPALA-10494: Making use of the min/max column stats to improve min/max filters

This patch adds the functionality to compute the minimal and the maximal
value for a column of type integers, float or double for parquet tables,
and to make use of the new stats to discard min/max filters, in both hash
join builders and Parquet scanners, whose coverage are too close to the
actual range defined by the column min and max.

The computation and dislay of the new column min/max stats are done
for Parquet tables only and can be controlled by two new Boolean query
options (default to false):
  1. compute_column_minmax_stats
  2. show_column_minmax_stats

Usage examples.

  set compute_column_minmax_stats=true;
  compute stats tpcds_parquet.store_sales;

  set show_column_minmax_stats=true;
  show column stats tpcds_parquet.store_sales;

+---+--+-...---+-+-+
| Column| Type |   #Falses | Min | Max |
+---+--+-...---+-+-+
| ss_sold_time_sk   | INT  |   -1  | 28800   | 75599   |
| ss_item_sk| BIGINT   |   -1  | 1   | 18000   |
| ss_customer_sk| INT  |   -1  | 1   | 10  |
| ss_cdemo_sk   | INT  |   -1  | 15  | 1920797 |
| ss_hdemo_sk   | INT  |   -1  | 1   | 7200|
| ss_addr_sk| INT  |   -1  | 1   | 5   |
| ss_store_sk   | INT  |   -1  | 1   | 10  |
| ss_promo_sk   | INT  |   -1  | 1   | 300 |
| ss_ticket_number  | BIGINT   |   -1  | 1   | 24  |
| ss_quantity   | INT  |   -1  | 1   | 100 |
| ss_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_discount_amt   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_tax| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_coupon_amt | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid_inc_tax   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_profit | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sold_date_sk   | INT  |   -1  | 2450816 | 2452642 |
+---+--+-...---+-+-+

Only the min/max values for non-partition columns are stored in HMS.
The min/max values for partition columns are computed in coordinator.

The min-max filters, in C++ class or protobuf form, are augmented to
deal with the always true state better. Once always true is set, the
actual min and max values in the filter are no longer populated.

Testing:
 - Added new compute/show stats tests for integers, float and double
   column data types in compute-stats-column-minmax.test;
 - Added new tests in overlap_min_max_filters.test to demonstrate the
   usefulness of column stats to quickly disable useless filters in
   both hash join builder and Parquet scanner;
 - Added tests in min-max-filter-test.cc to demonstrate method Or(),
   ToProtobuf() and constructor can deal with always true flag well;
 - core tests.

TODO:
 1. Test compute stats for timestamp and date columns;
 2. Enable the feature for Iceberg tables with Parquet data files.

Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apac

[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17187 )

Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory 
config.
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8370/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
Gerrit-Change-Number: 17187
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 15 Mar 2021 15:55:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.

2021-03-15 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17187


Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory 
config.
..

IMPALA-10583: Fix bug on unbounded result spooling memory config.

Two bugs happening on result spooling when we set two of its query
options to 0 (unbounded).

The first bug happens if max_spilled_result_spooling_mem =
0 (unbounded). max_unpinned_bytes_ in SpillableRowBatchQueue will be set
to 0, SpillableRowBatchQueue::IsFull() then will always return true, and
the query hang. This patch fix it by setting max_unpinned_bytes_ to
INT64_MAX if max_spilled_result_spooling_mem = 0.

The second bug happens if we set max_result_spooling_mem =
0 (unbounded). PlanRootSink.java will peg maxMemReservationBytes to
always equal to minMemReservationBytes. This patch fixes this by
reverting to the default max_result_spooling_mem (100MB).

Testing:
- Add test_unbounded_result_spooling_mem.
- Pass core tests.

Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/spillable-row-batch-queue.cc
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/query_test/test_result_spooling.py
4 files changed, 43 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17187/1
--
To view, visit http://gerrit.cloudera.org:8080/17187
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e
Gerrit-Change-Number: 17187
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 


[Impala-ASF-CR] IMPALA-10552: Support external frontends supplying timeline for profile

2021-03-15 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17183 )

Change subject: IMPALA-10552: Support external frontends supplying timeline for 
profile
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG@59
PS1, Line 59:
Add Kurt as co-author



--
To view, visit http://gerrit.cloudera.org:8080/17183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
Gerrit-Change-Number: 17183
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Mon, 15 Mar 2021 15:20:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8369/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 15 Mar 2021 15:04:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-15 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..

IMPALA-10494: Making use of the min/max column stats to improve min/max filters

This patch adds the functionality to compute the minimal and the maximal
value for a column of type integers, float or double for parquet tables,
and to make use of the new stats to discard min/max filters, in both hash
join builders and Parquet scanners, whose coverage are too close to the
actual range defined by the column min and max.

The computation and dislay of the new column min/max stats are done
for Parquet tables only and can be controlled by two new Boolean query
options (default to false):
  1. compute_column_minmax_stats
  2. show_column_minmax_stats

Usage examples.

  set compute_column_minmax_stats=true;
  compute stats tpcds_parquet.store_sales;

  set show_column_minmax_stats=true;
  show column stats tpcds_parquet.store_sales;

+---+--+-...---+-+-+
| Column| Type |   #Falses | Min | Max |
+---+--+-...---+-+-+
| ss_sold_time_sk   | INT  |   -1  | 28800   | 75599   |
| ss_item_sk| BIGINT   |   -1  | 1   | 18000   |
| ss_customer_sk| INT  |   -1  | 1   | 10  |
| ss_cdemo_sk   | INT  |   -1  | 15  | 1920797 |
| ss_hdemo_sk   | INT  |   -1  | 1   | 7200|
| ss_addr_sk| INT  |   -1  | 1   | 5   |
| ss_store_sk   | INT  |   -1  | 1   | 10  |
| ss_promo_sk   | INT  |   -1  | 1   | 300 |
| ss_ticket_number  | BIGINT   |   -1  | 1   | 24  |
| ss_quantity   | INT  |   -1  | 1   | 100 |
| ss_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_discount_amt   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_tax| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_coupon_amt | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid_inc_tax   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_profit | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sold_date_sk   | INT  |   -1  | 2450816 | 2452642 |
+---+--+-...---+-+-+

Only the min/max values for non-partition columns are stored in HMS.
The min/max values for partition columns are computed in coordinator.

The min-max filters, in C++ class or protobuf form, are augmented to
deal with the always true state better. Once always true is set, the
actual min and max values in the filter are no longer populated.

Testing:
 - Added new compute/show stats tests for integers, float and double
   column data types in compute-stats-column-minmax.test;
 - Added new tests in overlap_min_max_filters.test to demonstrate the
   usefulness of column stats to quickly disable useless filters in
   both hash join builder and Parquet scanner;
 - Added tests in min-max-filter-test.cc to demonstrate method Or(),
   ToProtobuf() and constructor can deal with always true flag well;
 - core tests.

TODO:
 1. Test compute stats for timestamp and date columns;

Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/ja

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8368/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Mar 2021 08:04:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

2021-03-15 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
..

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. Computes the a-not-b set operation given
two sketches of same or different column.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
+---+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+---+
| 5 |
+---+

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 166 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/4
--
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9234: Support Ranger row filtering policies

2021-03-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16976 )

Change subject: IMPALA-9234: Support Ranger row filtering policies
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I580517be241225ca15e45686381b78890178d7cc
Gerrit-Change-Number: 16976
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 15 Mar 2021 07:00:12 +
Gerrit-HasComments: No