[Impala-ASF-CR] IMPALA-10701: Switch to use TByteBuffer from thrift

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17428 )

Change subject: IMPALA-10701: Switch to use TByteBuffer from thrift
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7166/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0c7834253a16e440204264b0462a1590dea2463
Gerrit-Change-Number: 17428
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 20 May 2021 05:47:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10701: Switch to use TByteBuffer from thrift

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17428 )

Change subject: IMPALA-10701: Switch to use TByteBuffer from thrift
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0c7834253a16e440204264b0462a1590dea2463
Gerrit-Change-Number: 17428
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 20 May 2021 05:47:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9155: Add recovery mechanism to admission service

2021-05-19 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17332 )

Change subject: IMPALA-9155: Add recovery mechanism to admission service
..


Patch Set 1:

(3 comments)

I read through the code once, saving my nits here

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG@12
PS1, Line 12: - No RPCs are serviced by a coordinator unless it sends its 
complete
Nit" should this be "No RPCs are serviced from a coordinator until it has sent"?


http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:

http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto@280
PS1, Line 280:   /// those situations. An error response is sent if the 
coordinator has not yet sent a
Move new comment to before the TODO to clarify that it is not part of the TODO


http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto@301
PS1, Line 301:   /// ReleaseQuery. An error response is sent if the coordinator 
has not yet sent a
Move new comment to before the TODO to clarify that it is not part of the TODO



--
To view, visit http://gerrit.cloudera.org:8080/17332
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd
Gerrit-Change-Number: 17332
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 19 May 2021 21:28:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option

2021-05-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17396 )

Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option
..


Patch Set 2:

(2 comments)

Thanks Wenzhe for the reply.

http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@237
PS2, Line 237: if (leadOnly && 
!replica.getRole().equals(Role.LEADER.toString())) continue;
> The code tablet.getReplicas() and replica.getRole() are calling Kudu client
Done


http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test:

http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test@9
PS2, Line 9:  00:SCAN KUDU [functional_kudu.zipcode_incomes]
   :  kudu predicates: id = '860US00601'
   :  mem-estimate=3.75MB mem-reservation=0B thread-reservation=1
   :  tuple-ids=0 row-size=124B cardinality=1
   :  in pipelines: 00(GETNEXT)
> The role of Kudu replica is not shown in the explain output. The difference
Maybe we just annotate the kudu scan for LEADER-only, something like the 
following?

00:SCAN KUDU [functional_kudu.zipcode_incomes, LEADER-only]
 kudu predicates: id = '860US00601'
 mem-estimate=3.75MB mem-reservation=0B thread-reservation=1
 tuple-ids=0 row-size=124B cardinality=1
 in pipelines: 00(GETNEXT)

Knowing the scan target is very useful info for performance tuning. For one 
thing, restricting to LEADER(s) may cause contention in a heavy concurrency 
situation.



--
To view, visit http://gerrit.cloudera.org:8080/17396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9
Gerrit-Change-Number: 17396
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 19 May 2021 18:22:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17349 )

Change subject: IMPALA-8769: [DOCS] Change the shell default
..

IMPALA-8769: [DOCS] Change the shell default

Changed the shell default from Beeswax to HS2.
Also changed the default port number from 21000 to 21050.

Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c
Reviewed-on: http://gerrit.cloudera.org:8080/17349
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M docs/topics/impala_shell_options.xml
1 file changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c
Gerrit-Change-Number: 17349
Gerrit-PatchSet: 3
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17349 )

Change subject: IMPALA-8769: [DOCS] Change the shell default
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c
Gerrit-Change-Number: 17349
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 17:41:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option

2021-05-19 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17396 )

Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option
..


Patch Set 2:

(2 comments)

Thanks Qifan for your comments.

http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@237
PS2, Line 237: if (leadOnly && 
!replica.getRole().equals(Role.LEADER.toString())) continue;
> nit. If there is only one leader, maybe we can break after the only leader
The code tablet.getReplicas() and replica.getRole() are calling Kudu client 
APIs. Kudu don't provide API to get number of leader for the tablet so we don't 
know the number of leader until the loop ending.


http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test:

http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test@9
PS2, Line 9:  00:SCAN KUDU [functional_kudu.zipcode_incomes]
   :  kudu predicates: id = '860US00601'
   :  mem-estimate=3.75MB mem-reservation=0B thread-reservation=1
   :  tuple-ids=0 row-size=124B cardinality=1
   :  in pipelines: 00(GETNEXT)
> nit. Just wonder if leader only scan or replica scan can be observed in the
The role of Kudu replica is not shown in the explain output. The difference 
with or without setting the query option is the number of hosts and instances. 
Add new field in the explain output will affect too many test cases.



--
To view, visit http://gerrit.cloudera.org:8080/17396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9
Gerrit-Change-Number: 17396
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 19 May 2021 17:40:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17349 )

Change subject: IMPALA-8769: [DOCS] Change the shell default
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c
Gerrit-Change-Number: 17349
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 17:34:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17349 )

Change subject: IMPALA-8769: [DOCS] Change the shell default
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-docs-submit/372/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c
Gerrit-Change-Number: 17349
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 17:34:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default

2021-05-19 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17349 )

Change subject: IMPALA-8769: [DOCS] Change the shell default
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c
Gerrit-Change-Number: 17349
Gerrit-PatchSet: 1
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 17:33:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..


Patch Set 5:

(9 comments)

Thanks Zoltan and Qifan for the review. I addressed some of your comments in 
the latest patch set. Will do another round a bit later today.

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@20
PS3, Line 20:
> nit. extra
Done


http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@21
PS3, Line 21:  MAX(int_col) and the right input does
: not have a group-by, so right NDV = 1 can be used. (b) if it
: has a group-b
> Just wonder if this case is handled in the code.
The group by NDV was available but was not being used because of the MAX 
expression. I clarified the commit message to indicate that this patch is 
leveraging those stats while previously it was only looking for corresponding 
scan slots and ignoring the available NDV.


http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38
PS3, Line 38: shape of a plan (e.g whether the join order changed).
:  - Preliminary tests with a TPC-DS 10 GB scale factor on a single
> Just out of curiosity I did a single node perf run on TPCDS with scale 10
Thanks Zoltan for doing the preliminary validation.  It looks promising, so I 
feel comfortable moving ahead. I am in touch with the performance team to run a 
larger scale benchmark but it may take some time.
I took the liberty of citing your perf measurement in the commit message.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447
PS3, Line 447: g joinCard = Math.round((l
> I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/
Done


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507
PS3, Line 507:
 :
> nit: Since we return in the 'true' branch, we don't need to nest the statem
Done. Thanks for pointing out. I had a different control flow at one point and  
forgot to change it.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@550
PS3, Line 550: return (hasNdvStats(slotDesc) && hasNumRowsStats(slotDesc));
 : }
 :
> nit. May shorten to return (hasNdvStats() && hasNumRowsStats()).
Yes. Thanks for pointing out.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591
PS3, Line 591:* cardinality estimations. The ndv values are upper bounded
> nit: Could you please add a brief comment?
Done


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@606
PS3, Line 606: s;
> nit. May adjust the lhsNdv_ in the cstr to avoid repeated computation.
Moved it.  When I wrote this I was thinking there may be a setter also for 
these member variables but at this point I am not seeing a  good reason for it.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@607
PS3, Line 607: s;
> Same as above.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 17:26:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8755/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 17:26:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 4:

(18 comments)

Looks good and I may go over it one more time this week.

Overall, I like the idea of decoupling ScanRange and the new class and maybe we 
can achieve a good level of separation.

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@252
PS4, Line 252: need
nit. needs


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514
PS4, Line 514: lock_store_
> EnqueueReadyBuffer() and AddUnusedBuffer() acquire the lock at the beginnin
+1 on "ensuring no new locks are taken in ScanBufferManager methods.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h
File be/src/runtime/io/scan-buffer-manager.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@40
PS4, Line 40:   /// management for the respective range
nit. missing '.'.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@47
PS4, Line 47: NULL
nit. nullptr.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@52
PS4, Line 52: range
nit. should be 'scan_range_'.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@53
PS4, Line 53: the scan range
nit. this scan range (i.e. scan_range_).


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@62
PS4, Line 62: range
nit. scan_range_->len().


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@106
PS4, Line 106: PopReadyBuffer
nit. PopFirstReadyBuffer() describes the semantics better.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@109
PS4, Line 109: ExternalB
nit. should be inline.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@169
PS4, Line 169: ExternalBufferTag
nit: maybe name it BufferTag, since the buffer is managed by this new class and 
is not external.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc
File be/src/runtime/io/scan-buffer-manager.cc:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@28
PS4, Line 28: lock_store_(&(range->lock_store_)),
This probably should be moved inside the cstr body, after DCHECK().


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@135
PS4, Line 135: goto error;
nit. can we replace goto with the actual code below?


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@165
PS4, Line 165: if (scan_range_->all_buffers_returned(scan_range_lock) &&
 :   num_buffers_in_reader_.Load() == 0) {
 : // Close the scan range if there are no more buffers in the 
reader and no more buffers
 : // will be returned to readers in future. Close() is 
idempotent so it is ok to call
 : // multiple times during cleanup so long as the range is 
actually finished.
 : if (!scan_range_->use_local_buffer_) {
 :   scan_range_->file_reader_->Close();
 : } else {
 :   scan_range_->local_buffer_reader_->Close();
 : }
It looks like this block of code is scan_range_ specific and probably should be 
moved back to the ScanRange as a method like CloseReader(). We can call it here.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@199
PS4, Line 199: DCHECK(!ready_buffers_.empty());
Should return false if the buffer is empty as DCHECK() only works in debug 
build.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@88
PS4, Line 88: external_buffer_tag_
nit. This data member does not exist anymore.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@427
PS4, Line 427: buffer_manager_->external_buffer_tag_ =
 : ScanBufferManager::ExternalBufferTag::CLIENT_BUFFER;
nit. Call buffer_manager_->is_client_buffer() instead?


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@432
PS4, Line 432: buffer_manager_->external_buffer_tag_ =
 : ScanBufferManager::ExternalBufferTag::NO_BUFFER;
Same comment as above.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@553
PS4, Line 553: buffer_manager_->external_buffer_tag_ =
 :   ScanBufferManager::ExternalBufferTag::CACHED_BUFFER;
nit. May create a 

[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-19 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#5).

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..

IMPALA-10681: Improve inner join cardinality estimates

During cardinality estimation for inner joins, if the join
conjunct involves a scan slot on left side and a function
(e.g MAX) on the right, currently we determine that the NDV
stats of either side is not useful and return the left side's
cardinality even though it may be a significant over-estimate.

In this patch, we handle join conjuncts of such types by
keeping them in an 'other' eligible conjuncts list as long as
the NDV for expressions on both sides of the join and the
input row count is available. For example, in the following
cases the NDV is available but was not being used for inner
joins since the previous logic was only looking for scan
slots: (a) int_col = MAX(int_col) and the right input does
not have a group-by, so right NDV = 1 can be used. (b) if it
has a group-by and the group-by columns already have
associated NDV, the combined NDV is also available.
Other such examples exist. An auxiliary struct is introduced
to keep track of the ndv and row count.

Once these 'other' eligible conjuncts are populated, we do the
join cardinality estimation in a manner similar to the normal
join conjuncts by fetching the stats from the auxiliary struct.

Testing:
 - Added new planner tests for inner join cardinality
 - Modified expected plans for certains tests including
   TPC-DS queries and ran end-to-end TPC-DS queries
 - Since TPC-DS plans are complex, I did a check of the cardinality
   changes for some of the hash joins but not the changes in the
   shape of a plan (e.g whether the join order changed).
 - Preliminary tests with a TPC-DS 10 GB scale factor on a single
   node showed between 5-15% performance improvements for 4 of the
   6 queries whose plans changed.

Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,691 insertions(+), 3,331 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/5
--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates

2021-05-19 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17387

to look at the new patch set (#4).

Change subject: IMPALA-10681: Improve inner join cardinality estimates
..

IMPALA-10681: Improve inner join cardinality estimates

During cardinality estimation for inner joins, if the join
conjunct involves a scan slot on left side and a function
(e.g MAX) on the right, currently we determine that the NDV
stats of either side is not useful and return the left side's
cardinality even though it may be a significant over-estimate.

In this patch, we handle join conjuncts of such types by
keeping them in an 'other' eligible conjuncts list as long as
the NDV for expressions on both sides of the join and the
input row count is available. For example, in the following
cases the NDV is available but was not being used for inner
joins since the previous logic was only looking for scan
slots: (a) int_col = MAX(int_col) and the right input does
not have a group-by, so right NDV = 1 can be used. (b) if it
has a group-by and the group-by columns already have
associated NDV, the combined NDV is also available.
Other such examples exist. An auxiliary struct is introduced
to keep track of the ndv and row count.

Once these 'other' eligible conjuncts are populated, we do the
join cardinality estimation in a manner similar to the normal
join conjuncts by fetching the stats from the auxiliary struct.

Testing:
 - Added new planner tests for inner join cardinality
 - Modified expected plans for certains tests including
   TPC-DS queries and ran end-to-end TPC-DS queries
 - Since TPC-DS plans are complex, I did a check of the cardinality
   changes for some of the hash joins but not the changes in the
   shape of a plan (e.g whether the join order changed).
 - Preliminary tests with a TPC-DS 10 GB scale factor on a single
   node showed between 8-15% performance improvements for 5 of the
   6 queries whose plans changed.

Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
15 files changed, 3,691 insertions(+), 3,331 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/4
--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514
PS4, Line 514: lock_store_
> Actually the logic in ScanBufferManager and ScanRange are still coupled and
EnqueueReadyBuffer() and AddUnusedBuffer() acquire the lock at the beginning, 
so it wouldn't make harm to pre-acquire the locks.

But CleanUpBuffers() also acquires the lock, and it is invoked by 
AllocateBuffersForRange() at the error path. Taking a look again 
AddUnusedBuffer() is also invoked by AllocateBuffersForRange() in the middle of 
the method. One way to resolve this is to put AllocateBuffersForRange() back to 
ScanRange and acquiring the lock there, so AddUnusedBuffer() and 
CleanUpBuffers() and all other member functions in ScanBufferManager can expect 
a pre-acquired lock.

I think I like this idea as it'll probably make the reasoning clearer about 
locking.



--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 16:48:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17398 )

Change subject: IMPALA-10485: Support Iceberg field-id based column resolution 
in the ORC scanner
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8754/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d
Gerrit-Change-Number: 17398
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 19 May 2021 15:56:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-19 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514
PS4, Line 514: lock_store_
> Now this lock protects the members of the scan range, and the members of th
Actually the logic in ScanBufferManager and ScanRange are still coupled and 
both modify each others member variables currently. Additionally there are 
parts in ScanBufferManager which needs ScanRange lock to be freed before 
scheduling scan range due to existing ordering between locks of ScanRange and 
RequestContext. So with this coupling it would be cleaner to have only one lock 
(which we can keep in ScanRange as suggested instead of moving to new class). 
However, we can make semantics cleaner by ensuring no new locks are taken in 
ScanBufferManager methods. ScanRange's lock is pre-acquired (if needed) before 
invoking the buffer manager's methods. All existing method follows that except 
EnqueueReadyBuffer and AddUnusedBuffer which I can refactor and ensure it 
follows that. Would that make things better ?



--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 15:46:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5:

(3 comments)

Looks great!

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493
PS5, Line 493: len - i + 2
When len is super long, c_str may overflow the stack. Maybe std::string or some 
other buffer class can be used instead.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501: s + i
(s+i) is repeatedly computed in this loop. May precompute it as pos=s+i in 
advance.


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524
PS5, Line 524: *result = PARSE_OVERFLOW;
Should we also test std::isnan(val) and set PARSE_UNDERFLOW flag here?



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 15:42:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner

2021-05-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17398 )

Change subject: IMPALA-10485: Support Iceberg field-id based column resolution 
in the ORC scanner
..


Patch Set 3:

PS3 was a rebase.


--
To view, visit http://gerrit.cloudera.org:8080/17398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d
Gerrit-Change-Number: 17398
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 19 May 2021 15:37:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner

2021-05-19 Thread Zoltan Borok-Nagy (Code Review)
Hello Tamas Mate, Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17398

to look at the new patch set (#3).

Change subject: IMPALA-10485: Support Iceberg field-id based column resolution 
in the ORC scanner
..

IMPALA-10485: Support Iceberg field-id based column resolution in the ORC 
scanner

Currently the ORC scanner only supports position-based column
resolution. This patch adds Iceberg field-id based column resolution
which will be the default for Iceberg tables. It is needed to support
schema evolution in the future, i.e. ALTER TABLE DROP/RENAME COLUMNS.
(The Parquet scanner already supports Iceberg field-id based column
resolution)

Testing
 * added e2e test 'iceberg-orc-field-id.test' by copying the contents of
   nested-types-scanner-basic,
   nested-types-scanner-array-materialization,
   nested-types-scanner-position,
   nested-types-scanner-maps,
   and executing the queries on an Iceberg table with ORC data files

Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M common/thrift/Query.thrift
M testdata/data/README
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/data/0-0-boroknagyz_20210331133358_b718b2ff-9f49-4056-a5ed-0d37ec144fff-job_16171873329050_0002-1.orc
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/data/1-0-boroknagyz_20210331133358_b718b2ff-9f49-4056-a5ed-0d37ec144fff-job_16171873329050_0002-1.orc
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/46b4a907-2ff3-4799-ba4a-074d04734265-m0.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/snap-8747481058330439933-1-46b4a907-2ff3-4799-ba4a-074d04734265.avro
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/v1.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/v2.metadata.json
A 
testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/version-hint.text
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-query/queries/QueryTest/iceberg-orc-field-id.test
M tests/query_test/test_iceberg.py
21 files changed, 2,886 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17398/3
--
To view, visit http://gerrit.cloudera.org:8080/17398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d
Gerrit-Change-Number: 17398
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10681: Improve join cardinality estimates

2021-05-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve join cardinality estimates
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38
PS3, Line 38: TODO: We would want to run a performance test to validate
:the plan changes for TPC-DS at a sufficiently high scale 
factor.
Just out of curiosity I did a single node perf run on TPCDS with scale 10

 bin/single_node_perf_run.py --scale 10 --iterations 10 --table_formats 
parquet/snap --workloads tpcds 

And looked at the changed queries:

Q4, Q11: no real change

Q5: ~8% improvement
Q54: ~5% improvement
Q71: ~15% improvement
Q74: ~10% improvement

But of course, it'd be more interesting to see the results at higher scale on a 
real cluster.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@1859
PS3, Line 1859:   public Expr getSlotDescFirstSourceExpr() {
nit: could you please add a brief comment for this method?


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@306
PS3, Line 306: getOtherJoinCardinality
nit: could you mention this calculation in the doc comment of this method?


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447
PS3, Line 447: after IMPALA-7310 is fixed
I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507
PS3, Line 507: return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, 
rhsScanSlot);
 :   } else {
nit: Since we return in the 'true' branch, we don't need to nest the statements 
in an 'else' block. I.e. to reduce nesting, we can just have:

  if (hasLhs && hasRhs) {
return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, 
rhsScanSlot);
  }
  Expr lhsExpr = eqJoinConjunct.getChild(0);
  ...


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591
PS3, Line 591:   public static final class NdvAndRowCountStats {
nit: Could you please add a brief comment?



--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 15:29:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option

2021-05-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17396 )

Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option
..


Patch Set 2:

(2 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@237
PS2, Line 237: if (leadOnly && 
!replica.getRole().equals(Role.LEADER.toString())) continue;
nit. If there is only one leader, maybe we can break after the only leader is 
processed.


http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test:

http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test@9
PS2, Line 9:  00:SCAN KUDU [functional_kudu.zipcode_incomes]
   :  kudu predicates: id = '860US00601'
   :  mem-estimate=3.75MB mem-reservation=0B thread-reservation=1
   :  tuple-ids=0 row-size=124B cardinality=1
   :  in pipelines: 00(GETNEXT)
nit. Just wonder if leader only scan or replica scan can be observed in the 
explain output.



--
To view, visit http://gerrit.cloudera.org:8080/17396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9
Gerrit-Change-Number: 17396
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 19 May 2021 15:22:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10681: Improve join cardinality estimates

2021-05-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve join cardinality estimates
..


Patch Set 3:

(9 comments)

Hi Aman, Thanks a lot for the work. I think it should help with the cases 
nicely.

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@20
PS3, Line 20: and
nit. extra


http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@21
PS3, Line 21: If it has a group-by and the group-by
: columns alread have associated NDV, we can can still know the
: combined NDV.
Just wonder if this case is handled in the code.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@430
PS3, Line 430: getOtherJoinCardinality
getNdvAdjustedJoinCardinality() may sound more close in semanticds of this 
function.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@431
PS3, Line 431: long lhsCard, long rhsCard
May add a comment for these two input variables.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@441
PS3, Line 441: if (stats.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / 
stats.lhsNumRows();
I wonder if we should trust lhsCard more here. On paper, NDV of an aggregate 
function is always 1, while the row count at the table level is somewhat 
accurate, and the row count above the scan is much less.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@448
PS3, Line 448: long joinCard = Math.round((lhsCard / Math.max(1, 
Math.max(lhsAdjNdv, rhsAdjNdv))) *
 :   rhsCard);
For equi- inner joins and the max. cardinality is to be returned, it seems an 
alternative formula could be

(CardL / NdvL) * (CardR / NdvR)

which is more intuitive to read and takes care of the occurrence frequency for 
both sides.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@550
PS3, Line 550: if (!hasNdvStats(slotDesc)) return false;
 :   if (!hasNumRowsStats(slotDesc)) return false;
 :   return true;
nit. May shorten to return (hasNdvStats() && hasNumRowsStats()).


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@606
PS3, Line 606: return Math.min(lhsNdv_, lhsNumRows_);
nit. May adjust the lhsNdv_ in the cstr to avoid repeated computation.


http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@607
PS3, Line 607: return Math.min(rhsNdv_, rhsNumRows_);
Same as above.



--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 15:09:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 5: Code-Review+1

(2 comments)

It's great to hear that the library has great perf. LGTM, only had smaller nits.

http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG@12
PS5, Line 12: doesn't
: sacrifise the accuracy for speed.
Could you please add your measurements here as well?


http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501
PS5, Line 501: while(i + 1 < len && *(s + i) == '0' &&
 : fast_double_parser::is_integer(*(s + i + 1))) {
 :   i++;
 : }
 :
 : // Library doesn't handle strings starting with '.' like 
'.56' or '-.56'.
 : // Preprocessing such strings by prefixing it with '0'
 : if (*(s + i) == '.') {
 :   c_len = len - i + 1;
 :   c_str[0] = '0'; // Prefix '0'
 :   memcpy(c_str + 1, s + i, c_len - 1);
 :   c_str[c_len] = '\0';
 : } else {
 :   c_len = len - i;
 :   memcpy(c_str, s + i, c_len);
 :   c_str[c_len] = '\0';
 : }
Maybe we could put this into its own function, stg like 
'NormalizeFloatString()' and add a few unit tests for it in 
string-parser-test.cc.



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 14:14:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17427 )

Change subject: IMPALA-10702: Add warning logs for slow or large catalogd 
response
..

IMPALA-10702: Add warning logs for slow or large catalogd response

It'd be helpful to log the slow or large responses of catalogd in
debugging scalability issues. This patch adds these warning logs in
JniCatalog, where we serialize thrift responses. See some example
outputs in the jira description.

Responses that have size larger than 50MB or take more than 60s to
finish will be logged with the request. Add flags for these two
thredshold in case users found the warnings too verbose and want to
increase the thresholds.

Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Reviewed-on: http://gerrit.cloudera.org:8080/17427
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
5 files changed, 129 insertions(+), 23 deletions(-)

Approvals:
  Quanlong Huang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/17427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Gerrit-Change-Number: 17427
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17427 )

Change subject: IMPALA-10702: Add warning logs for slow or large catalogd 
response
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Gerrit-Change-Number: 17427
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 19 May 2021 12:36:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 12:17:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10695: add dedicated thread pool for OSS/JindoFS.

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17455 )

Change subject: IMPALA-10695: add dedicated thread pool for OSS/JindoFS.
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4643105628f3860e3145c85d9ed205fe20291add
Gerrit-Change-Number: 17455
Gerrit-PatchSet: 7
Gerrit-Owner: Yong Yang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yong Yang 
Gerrit-Comment-Date: Wed, 19 May 2021 08:16:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10695: add dedicated thread pool for OSS/JindoFS.

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17455 )

Change subject: IMPALA-10695: add dedicated thread pool for OSS/JindoFS.
..

IMPALA-10695: add dedicated thread pool for OSS/JindoFS.

OSS is the object store in Alibaba cloud, just like s3a,
and jindofs is a gateway based on Alibaba cloud object store.
The following is about the JindoFS, for more information:
https://www.alibabacloud.com/blog/introducing-jindofs-a-high-performance-data-lake-storage-solution_595600
If Alibaba object store would be treated as local disk
 without this change, the query performance is not good.
This change would create a dedicated queue for this kind of target,
 and improved the OSS scan performance.
I have tested it in our environment,
 and observed at least double the scan speed.

New flags:
 - num_oss_io_threads: Number of OSS/JindoFS I/O threads. Defaults to 16.

Change-Id: I4643105628f3860e3145c85d9ed205fe20291add
Signed-off-by: Yong Yang 
Reviewed-on: http://gerrit.cloudera.org:8080/17455
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
5 files changed, 25 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4643105628f3860e3145c85d9ed205fe20291add
Gerrit-Change-Number: 17455
Gerrit-PatchSet: 8
Gerrit-Owner: Yong Yang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yong Yang 


[Impala-ASF-CR] IMPALA-10695: add dedicated thread pool for OSS/JindoFS.

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17455 )

Change subject: IMPALA-10695: add dedicated thread pool for OSS/JindoFS.
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7162/


--
To view, visit http://gerrit.cloudera.org:8080/17455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4643105628f3860e3145c85d9ed205fe20291add
Gerrit-Change-Number: 17455
Gerrit-PatchSet: 6
Gerrit-Owner: Yong Yang 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yong Yang 
Gerrit-Comment-Date: Wed, 19 May 2021 07:33:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17396 )

Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8753/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9
Gerrit-Change-Number: 17396
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 19 May 2021 06:49:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17427 )

Change subject: IMPALA-10702: Add warning logs for slow or large catalogd 
response
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7165/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Gerrit-Change-Number: 17427
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 19 May 2021 06:41:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response

2021-05-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17427 )

Change subject: IMPALA-10702: Add warning logs for slow or large catalogd 
response
..


Patch Set 5: Code-Review+2

Resolved conflicts due to flags added by other patches. Carry on Vihang's +2.


--
To view, visit http://gerrit.cloudera.org:8080/17427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Gerrit-Change-Number: 17427
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 19 May 2021 06:40:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17427 )

Change subject: IMPALA-10702: Add warning logs for slow or large catalogd 
response
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8752/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Gerrit-Change-Number: 17427
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 19 May 2021 06:35:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option

2021-05-19 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17396 )

Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option
..

IMPALA-10197: Add KUDU_REPLICA_SELECTION query option

Sometimes it is useful to target queries at the leader only replica
instead of the default closest replica.
This patch added new query option KUDU_REPLICA_SELECTION with which to
choose replicas for Kudu amongst multiple Kudu replicas.
Removed variable FLAGS_pick_only_leaders_for_tests since its usage can
be replaced by the new query option.
Added new planner and end-to-end tests for the new query option.

Testings:
  - Passed exhaustive tests.

Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9
---
M be/src/exec/kudu-scanner.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-closest-replica.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test
M tests/query_test/test_kudu.py
13 files changed, 359 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/17396/2
--
To view, visit http://gerrit.cloudera.org:8080/17396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9
Gerrit-Change-Number: 17396
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function

2021-05-19 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..


Patch Set 4: Code-Review+2

Sure, I re-launched the verify job. Let's see what it says.


--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 06:17:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function

2021-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7164/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 May 2021 06:17:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10681: Improve join cardinality estimates

2021-05-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17387 )

Change subject: IMPALA-10681: Improve join cardinality estimates
..


Patch Set 3:

> Patch Set 1:
>
> Thanks for the reply Aman. I wonder if we assume uniform distribution of 
> values, and the RHS's cardinality is less than or equal to the LHS's NDV then 
> does it matter if there are duplications on the right side?
>
> E.g. lets assume the followings:
>
>  LHS cardinality is 1000
>  LHS NDV is 10
>
>  RHS cardinality is 5
>  RHS NDV is unknown
>
> If RHS has 5 distinct values, then the selectivity of it is 50%, so the 
> JOIN's output cardinality should be 500.
> If RHS has the same value 5 times, then the selectivity is 10%, but the 
> multiplication factor is 5x, so the JOIN's output cardinality should be again 
> 500.

I was tied up with some higher priority tasks. I just uploaded a patchset with 
a reworked logic for the estimation and added tests accordingly. As noted in 
the commit message, give the plan changes, we should do a performance run. I 
will try to get that going.

With regard to your example, under the assumptions you have stated, yes the 
inner join's output cardinality estimates would be the same in both cases 
because the max NDV is 10. In my previous comment I was comparing the inner 
join and semi join in the presence of duplicates. If there are duplicates in 
the RHS then in the worst case the inner join cardinality could be N x M 
whereas for left semi join it cannot exceed N. (N and M are left and right 
input cardinality).


--
To view, visit http://gerrit.cloudera.org:8080/17387
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc
Gerrit-Change-Number: 17387
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 May 2021 06:13:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response

2021-05-19 Thread Quanlong Huang (Code Review)
Hello Aman Sinha, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17427

to look at the new patch set (#5).

Change subject: IMPALA-10702: Add warning logs for slow or large catalogd 
response
..

IMPALA-10702: Add warning logs for slow or large catalogd response

It'd be helpful to log the slow or large responses of catalogd in
debugging scalability issues. This patch adds these warning logs in
JniCatalog, where we serialize thrift responses. See some example
outputs in the jira description.

Responses that have size larger than 50MB or take more than 60s to
finish will be logged with the request. Add flags for these two
thredshold in case users found the warnings too verbose and want to
increase the thresholds.

Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
5 files changed, 129 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/17427/5
--
To view, visit http://gerrit.cloudera.org:8080/17427
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671
Gerrit-Change-Number: 17427
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar