[Impala-ASF-CR] IMPALA-10701: Switch to use TByteBuffer from thrift
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17428 ) Change subject: IMPALA-10701: Switch to use TByteBuffer from thrift .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7166/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0c7834253a16e440204264b0462a1590dea2463 Gerrit-Change-Number: 17428 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 20 May 2021 05:47:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10701: Switch to use TByteBuffer from thrift
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17428 ) Change subject: IMPALA-10701: Switch to use TByteBuffer from thrift .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0c7834253a16e440204264b0462a1590dea2463 Gerrit-Change-Number: 17428 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 20 May 2021 05:47:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9155: Add recovery mechanism to admission service
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17332 ) Change subject: IMPALA-9155: Add recovery mechanism to admission service .. Patch Set 1: (3 comments) I read through the code once, saving my nits here http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17332/1//COMMIT_MSG@12 PS1, Line 12: - No RPCs are serviced by a coordinator unless it sends its complete Nit" should this be "No RPCs are serviced from a coordinator until it has sent"? http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto File common/protobuf/admission_control_service.proto: http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto@280 PS1, Line 280: /// those situations. An error response is sent if the coordinator has not yet sent a Move new comment to before the TODO to clarify that it is not part of the TODO http://gerrit.cloudera.org:8080/#/c/17332/1/common/protobuf/admission_control_service.proto@301 PS1, Line 301: /// ReleaseQuery. An error response is sent if the coordinator has not yet sent a Move new comment to before the TODO to clarify that it is not part of the TODO -- To view, visit http://gerrit.cloudera.org:8080/17332 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8ad3ef9b9e2496c484833d6326ce914c851e02fd Gerrit-Change-Number: 17332 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 19 May 2021 21:28:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17396 ) Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option .. Patch Set 2: (2 comments) Thanks Wenzhe for the reply. http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@237 PS2, Line 237: if (leadOnly && !replica.getRole().equals(Role.LEADER.toString())) continue; > The code tablet.getReplicas() and replica.getRole() are calling Kudu client Done http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test: http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test@9 PS2, Line 9: 00:SCAN KUDU [functional_kudu.zipcode_incomes] : kudu predicates: id = '860US00601' : mem-estimate=3.75MB mem-reservation=0B thread-reservation=1 : tuple-ids=0 row-size=124B cardinality=1 : in pipelines: 00(GETNEXT) > The role of Kudu replica is not shown in the explain output. The difference Maybe we just annotate the kudu scan for LEADER-only, something like the following? 00:SCAN KUDU [functional_kudu.zipcode_incomes, LEADER-only] kudu predicates: id = '860US00601' mem-estimate=3.75MB mem-reservation=0B thread-reservation=1 tuple-ids=0 row-size=124B cardinality=1 in pipelines: 00(GETNEXT) Knowing the scan target is very useful info for performance tuning. For one thing, restricting to LEADER(s) may cause contention in a heavy concurrency situation. -- To view, visit http://gerrit.cloudera.org:8080/17396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9 Gerrit-Change-Number: 17396 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 19 May 2021 18:22:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17349 ) Change subject: IMPALA-8769: [DOCS] Change the shell default .. IMPALA-8769: [DOCS] Change the shell default Changed the shell default from Beeswax to HS2. Also changed the default port number from 21000 to 21050. Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c Reviewed-on: http://gerrit.cloudera.org:8080/17349 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M docs/topics/impala_shell_options.xml 1 file changed, 3 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c Gerrit-Change-Number: 17349 Gerrit-PatchSet: 3 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17349 ) Change subject: IMPALA-8769: [DOCS] Change the shell default .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c Gerrit-Change-Number: 17349 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 17:41:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17396 ) Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option .. Patch Set 2: (2 comments) Thanks Qifan for your comments. http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@237 PS2, Line 237: if (leadOnly && !replica.getRole().equals(Role.LEADER.toString())) continue; > nit. If there is only one leader, maybe we can break after the only leader The code tablet.getReplicas() and replica.getRole() are calling Kudu client APIs. Kudu don't provide API to get number of leader for the tablet so we don't know the number of leader until the loop ending. http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test: http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test@9 PS2, Line 9: 00:SCAN KUDU [functional_kudu.zipcode_incomes] : kudu predicates: id = '860US00601' : mem-estimate=3.75MB mem-reservation=0B thread-reservation=1 : tuple-ids=0 row-size=124B cardinality=1 : in pipelines: 00(GETNEXT) > nit. Just wonder if leader only scan or replica scan can be observed in the The role of Kudu replica is not shown in the explain output. The difference with or without setting the query option is the number of hosts and instances. Add new field in the explain output will affect too many test cases. -- To view, visit http://gerrit.cloudera.org:8080/17396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9 Gerrit-Change-Number: 17396 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 19 May 2021 17:40:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17349 ) Change subject: IMPALA-8769: [DOCS] Change the shell default .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c Gerrit-Change-Number: 17349 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 17:34:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17349 ) Change subject: IMPALA-8769: [DOCS] Change the shell default .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-docs-submit/372/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c Gerrit-Change-Number: 17349 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 17:34:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8769: [DOCS] Change the shell default
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17349 ) Change subject: IMPALA-8769: [DOCS] Change the shell default .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia695a01c28bd6350645394d2bbaded731039189c Gerrit-Change-Number: 17349 Gerrit-PatchSet: 1 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 17:33:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 ) Change subject: IMPALA-10681: Improve inner join cardinality estimates .. Patch Set 5: (9 comments) Thanks Zoltan and Qifan for the review. I addressed some of your comments in the latest patch set. Will do another round a bit later today. http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@20 PS3, Line 20: > nit. extra Done http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@21 PS3, Line 21: MAX(int_col) and the right input does : not have a group-by, so right NDV = 1 can be used. (b) if it : has a group-b > Just wonder if this case is handled in the code. The group by NDV was available but was not being used because of the MAX expression. I clarified the commit message to indicate that this patch is leveraging those stats while previously it was only looking for corresponding scan slots and ignoring the available NDV. http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38 PS3, Line 38: shape of a plan (e.g whether the join order changed). : - Preliminary tests with a TPC-DS 10 GB scale factor on a single > Just out of curiosity I did a single node perf run on TPCDS with scale 10 Thanks Zoltan for doing the preliminary validation. It looks promising, so I feel comfortable moving ahead. I am in touch with the performance team to run a larger scale benchmark but it may take some time. I took the liberty of citing your perf measurement in the commit message. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447 PS3, Line 447: g joinCard = Math.round((l > I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/ Done http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507 PS3, Line 507: : > nit: Since we return in the 'true' branch, we don't need to nest the statem Done. Thanks for pointing out. I had a different control flow at one point and forgot to change it. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@550 PS3, Line 550: return (hasNdvStats(slotDesc) && hasNumRowsStats(slotDesc)); : } : > nit. May shorten to return (hasNdvStats() && hasNumRowsStats()). Yes. Thanks for pointing out. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591 PS3, Line 591:* cardinality estimations. The ndv values are upper bounded > nit: Could you please add a brief comment? Done http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@606 PS3, Line 606: s; > nit. May adjust the lhsNdv_ in the cstr to avoid repeated computation. Moved it. When I wrote this I was thinking there may be a setter also for these member variables but at this point I am not seeing a good reason for it. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@607 PS3, Line 607: s; > Same as above. Done -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 17:26:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 ) Change subject: IMPALA-10681: Improve inner join cardinality estimates .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8755/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 17:26:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 4: (18 comments) Looks good and I may go over it one more time this week. Overall, I like the idea of decoupling ScanRange and the new class and maybe we can achieve a good level of separation. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@252 PS4, Line 252: need nit. needs http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514 PS4, Line 514: lock_store_ > EnqueueReadyBuffer() and AddUnusedBuffer() acquire the lock at the beginnin +1 on "ensuring no new locks are taken in ScanBufferManager methods. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h File be/src/runtime/io/scan-buffer-manager.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@40 PS4, Line 40: /// management for the respective range nit. missing '.'. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@47 PS4, Line 47: NULL nit. nullptr. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@52 PS4, Line 52: range nit. should be 'scan_range_'. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@53 PS4, Line 53: the scan range nit. this scan range (i.e. scan_range_). http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@62 PS4, Line 62: range nit. scan_range_->len(). http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@106 PS4, Line 106: PopReadyBuffer nit. PopFirstReadyBuffer() describes the semantics better. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@109 PS4, Line 109: ExternalB nit. should be inline. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@169 PS4, Line 169: ExternalBufferTag nit: maybe name it BufferTag, since the buffer is managed by this new class and is not external. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc File be/src/runtime/io/scan-buffer-manager.cc: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@28 PS4, Line 28: lock_store_(&(range->lock_store_)), This probably should be moved inside the cstr body, after DCHECK(). http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@135 PS4, Line 135: goto error; nit. can we replace goto with the actual code below? http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@165 PS4, Line 165: if (scan_range_->all_buffers_returned(scan_range_lock) && : num_buffers_in_reader_.Load() == 0) { : // Close the scan range if there are no more buffers in the reader and no more buffers : // will be returned to readers in future. Close() is idempotent so it is ok to call : // multiple times during cleanup so long as the range is actually finished. : if (!scan_range_->use_local_buffer_) { : scan_range_->file_reader_->Close(); : } else { : scan_range_->local_buffer_reader_->Close(); : } It looks like this block of code is scan_range_ specific and probably should be moved back to the ScanRange as a method like CloseReader(). We can call it here. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@199 PS4, Line 199: DCHECK(!ready_buffers_.empty()); Should return false if the buffer is empty as DCHECK() only works in debug build. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@88 PS4, Line 88: external_buffer_tag_ nit. This data member does not exist anymore. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@427 PS4, Line 427: buffer_manager_->external_buffer_tag_ = : ScanBufferManager::ExternalBufferTag::CLIENT_BUFFER; nit. Call buffer_manager_->is_client_buffer() instead? http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@432 PS4, Line 432: buffer_manager_->external_buffer_tag_ = : ScanBufferManager::ExternalBufferTag::NO_BUFFER; Same comment as above. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@553 PS4, Line 553: buffer_manager_->external_buffer_tag_ = : ScanBufferManager::ExternalBufferTag::CACHED_BUFFER; nit. May create a
[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates
Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17387 to look at the new patch set (#5). Change subject: IMPALA-10681: Improve inner join cardinality estimates .. IMPALA-10681: Improve inner join cardinality estimates During cardinality estimation for inner joins, if the join conjunct involves a scan slot on left side and a function (e.g MAX) on the right, currently we determine that the NDV stats of either side is not useful and return the left side's cardinality even though it may be a significant over-estimate. In this patch, we handle join conjuncts of such types by keeping them in an 'other' eligible conjuncts list as long as the NDV for expressions on both sides of the join and the input row count is available. For example, in the following cases the NDV is available but was not being used for inner joins since the previous logic was only looking for scan slots: (a) int_col = MAX(int_col) and the right input does not have a group-by, so right NDV = 1 can be used. (b) if it has a group-by and the group-by columns already have associated NDV, the combined NDV is also available. Other such examples exist. An auxiliary struct is introduced to keep track of the ndv and row count. Once these 'other' eligible conjuncts are populated, we do the join cardinality estimation in a manner similar to the normal join conjuncts by fetching the stats from the auxiliary struct. Testing: - Added new planner tests for inner join cardinality - Modified expected plans for certains tests including TPC-DS queries and ran end-to-end TPC-DS queries - Since TPC-DS plans are complex, I did a check of the cardinality changes for some of the hash joins but not the changes in the shape of a plan (e.g whether the join order changed). - Preliminary tests with a TPC-DS 10 GB scale factor on a single node showed between 5-15% performance improvements for 4 of the 6 queries whose plans changed. Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test M testdata/workloads/functional-planner/queries/PlannerTest/views.test 15 files changed, 3,691 insertions(+), 3,331 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/5 -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10681: Improve inner join cardinality estimates
Hello Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17387 to look at the new patch set (#4). Change subject: IMPALA-10681: Improve inner join cardinality estimates .. IMPALA-10681: Improve inner join cardinality estimates During cardinality estimation for inner joins, if the join conjunct involves a scan slot on left side and a function (e.g MAX) on the right, currently we determine that the NDV stats of either side is not useful and return the left side's cardinality even though it may be a significant over-estimate. In this patch, we handle join conjuncts of such types by keeping them in an 'other' eligible conjuncts list as long as the NDV for expressions on both sides of the join and the input row count is available. For example, in the following cases the NDV is available but was not being used for inner joins since the previous logic was only looking for scan slots: (a) int_col = MAX(int_col) and the right input does not have a group-by, so right NDV = 1 can be used. (b) if it has a group-by and the group-by columns already have associated NDV, the combined NDV is also available. Other such examples exist. An auxiliary struct is introduced to keep track of the ndv and row count. Once these 'other' eligible conjuncts are populated, we do the join cardinality estimation in a manner similar to the normal join conjuncts by fetching the stats from the auxiliary struct. Testing: - Added new planner tests for inner join cardinality - Modified expected plans for certains tests including TPC-DS queries and ran end-to-end TPC-DS queries - Since TPC-DS plans are complex, I did a check of the cardinality changes for some of the hash joins but not the changes in the shape of a plan (e.g whether the join order changed). - Preliminary tests with a TPC-DS 10 GB scale factor on a single node showed between 8-15% performance improvements for 5 of the 6 queries whose plans changed. Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test M testdata/workloads/functional-planner/queries/PlannerTest/views.test 15 files changed, 3,691 insertions(+), 3,331 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17387/4 -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514 PS4, Line 514: lock_store_ > Actually the logic in ScanBufferManager and ScanRange are still coupled and EnqueueReadyBuffer() and AddUnusedBuffer() acquire the lock at the beginning, so it wouldn't make harm to pre-acquire the locks. But CleanUpBuffers() also acquires the lock, and it is invoked by AllocateBuffersForRange() at the error path. Taking a look again AddUnusedBuffer() is also invoked by AllocateBuffersForRange() in the middle of the method. One way to resolve this is to put AllocateBuffersForRange() back to ScanRange and acquiring the lock there, so AddUnusedBuffer() and CleanUpBuffers() and all other member functions in ScanBufferManager can expect a pre-acquired lock. I think I like this idea as it'll probably make the reasoning clearer about locking. -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 16:48:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17398 ) Change subject: IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8754/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d Gerrit-Change-Number: 17398 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 19 May 2021 15:56:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514 PS4, Line 514: lock_store_ > Now this lock protects the members of the scan range, and the members of th Actually the logic in ScanBufferManager and ScanRange are still coupled and both modify each others member variables currently. Additionally there are parts in ScanBufferManager which needs ScanRange lock to be freed before scheduling scan range due to existing ordering between locks of ScanRange and RequestContext. So with this coupling it would be cleaner to have only one lock (which we can keep in ScanRange as suggested instead of moving to new class). However, we can make semantics cleaner by ensuring no new locks are taken in ScanBufferManager methods. ScanRange's lock is pre-acquired (if needed) before invoking the buffer manager's methods. All existing method follows that except EnqueueReadyBuffer and AddUnusedBuffer which I can refactor and ensure it follows that. Would that make things better ? -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 4 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 15:46:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: (3 comments) Looks great! http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@493 PS5, Line 493: len - i + 2 When len is super long, c_str may overflow the stack. Maybe std::string or some other buffer class can be used instead. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: s + i (s+i) is repeatedly computed in this loop. May precompute it as pos=s+i in advance. http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@524 PS5, Line 524: *result = PARSE_OVERFLOW; Should we also test std::isnan(val) and set PARSE_UNDERFLOW flag here? -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 15:42:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17398 ) Change subject: IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner .. Patch Set 3: PS3 was a rebase. -- To view, visit http://gerrit.cloudera.org:8080/17398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d Gerrit-Change-Number: 17398 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 19 May 2021 15:37:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner
Hello Tamas Mate, Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17398 to look at the new patch set (#3). Change subject: IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner .. IMPALA-10485: Support Iceberg field-id based column resolution in the ORC scanner Currently the ORC scanner only supports position-based column resolution. This patch adds Iceberg field-id based column resolution which will be the default for Iceberg tables. It is needed to support schema evolution in the future, i.e. ALTER TABLE DROP/RENAME COLUMNS. (The Parquet scanner already supports Iceberg field-id based column resolution) Testing * added e2e test 'iceberg-orc-field-id.test' by copying the contents of nested-types-scanner-basic, nested-types-scanner-array-materialization, nested-types-scanner-position, nested-types-scanner-maps, and executing the queries on an Iceberg table with ORC data files Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d --- M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M be/src/exec/parquet/parquet-metadata-utils.cc M be/src/exec/parquet/parquet-metadata-utils.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/util/debug-util.cc M be/src/util/debug-util.h M common/thrift/Query.thrift M testdata/data/README A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/data/0-0-boroknagyz_20210331133358_b718b2ff-9f49-4056-a5ed-0d37ec144fff-job_16171873329050_0002-1.orc A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/data/1-0-boroknagyz_20210331133358_b718b2ff-9f49-4056-a5ed-0d37ec144fff-job_16171873329050_0002-1.orc A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/46b4a907-2ff3-4799-ba4a-074d04734265-m0.avro A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/snap-8747481058330439933-1-46b4a907-2ff3-4799-ba4a-074d04734265.avro A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/v1.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/v2.metadata.json A testdata/data/iceberg_test/hadoop_catalog/ice/complextypestbl_iceberg_orc/metadata/version-hint.text M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-query/queries/QueryTest/iceberg-orc-field-id.test M tests/query_test/test_iceberg.py 21 files changed, 2,886 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17398/3 -- To view, visit http://gerrit.cloudera.org:8080/17398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia2b1abcc25ad2268aa96dff032328e8951dbfb9d Gerrit-Change-Number: 17398 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-10681: Improve join cardinality estimates
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 ) Change subject: IMPALA-10681: Improve join cardinality estimates .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@38 PS3, Line 38: TODO: We would want to run a performance test to validate :the plan changes for TPC-DS at a sufficiently high scale factor. Just out of curiosity I did a single node perf run on TPCDS with scale 10 bin/single_node_perf_run.py --scale 10 --iterations 10 --table_formats parquet/snap --workloads tpcds And looked at the changed queries: Q4, Q11: no real change Q5: ~8% improvement Q54: ~5% improvement Q71: ~15% improvement Q74: ~10% improvement But of course, it'd be more interesting to see the results at higher scale on a real cluster. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@1859 PS3, Line 1859: public Expr getSlotDescFirstSourceExpr() { nit: could you please add a brief comment for this method? http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@306 PS3, Line 306: getOtherJoinCardinality nit: could you mention this calculation in the doc comment of this method? http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@447 PS3, Line 447: after IMPALA-7310 is fixed I think this part is fixed by https://gerrit.cloudera.org/#/c/16349/ http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@507 PS3, Line 507: return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, rhsScanSlot); : } else { nit: Since we return in the 'true' branch, we don't need to nest the statements in an 'else' block. I.e. to reduce nesting, we can just have: if (hasLhs && hasRhs) { return new EqJoinConjunctScanSlots(eqJoinConjunct, lhsScanSlot, rhsScanSlot); } Expr lhsExpr = eqJoinConjunct.getChild(0); ... http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@591 PS3, Line 591: public static final class NdvAndRowCountStats { nit: Could you please add a brief comment? -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 15:29:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17396 ) Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option .. Patch Set 2: (2 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/17396/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@237 PS2, Line 237: if (leadOnly && !replica.getRole().equals(Role.LEADER.toString())) continue; nit. If there is only one leader, maybe we can break after the only leader is processed. http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test: http://gerrit.cloudera.org:8080/#/c/17396/2/testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test@9 PS2, Line 9: 00:SCAN KUDU [functional_kudu.zipcode_incomes] : kudu predicates: id = '860US00601' : mem-estimate=3.75MB mem-reservation=0B thread-reservation=1 : tuple-ids=0 row-size=124B cardinality=1 : in pipelines: 00(GETNEXT) nit. Just wonder if leader only scan or replica scan can be observed in the explain output. -- To view, visit http://gerrit.cloudera.org:8080/17396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9 Gerrit-Change-Number: 17396 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Wed, 19 May 2021 15:22:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10681: Improve join cardinality estimates
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 ) Change subject: IMPALA-10681: Improve join cardinality estimates .. Patch Set 3: (9 comments) Hi Aman, Thanks a lot for the work. I think it should help with the cases nicely. http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@20 PS3, Line 20: and nit. extra http://gerrit.cloudera.org:8080/#/c/17387/3//COMMIT_MSG@21 PS3, Line 21: If it has a group-by and the group-by : columns alread have associated NDV, we can can still know the : combined NDV. Just wonder if this case is handled in the code. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@430 PS3, Line 430: getOtherJoinCardinality getNdvAdjustedJoinCardinality() may sound more close in semanticds of this function. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@431 PS3, Line 431: long lhsCard, long rhsCard May add a comment for these two input variables. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@441 PS3, Line 441: if (stats.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / stats.lhsNumRows(); I wonder if we should trust lhsCard more here. On paper, NDV of an aggregate function is always 1, while the row count at the table level is somewhat accurate, and the row count above the scan is much less. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@448 PS3, Line 448: long joinCard = Math.round((lhsCard / Math.max(1, Math.max(lhsAdjNdv, rhsAdjNdv))) * : rhsCard); For equi- inner joins and the max. cardinality is to be returned, it seems an alternative formula could be (CardL / NdvL) * (CardR / NdvR) which is more intuitive to read and takes care of the occurrence frequency for both sides. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@550 PS3, Line 550: if (!hasNdvStats(slotDesc)) return false; : if (!hasNumRowsStats(slotDesc)) return false; : return true; nit. May shorten to return (hasNdvStats() && hasNumRowsStats()). http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@606 PS3, Line 606: return Math.min(lhsNdv_, lhsNumRows_); nit. May adjust the lhsNdv_ in the cstr to avoid repeated computation. http://gerrit.cloudera.org:8080/#/c/17387/3/fe/src/main/java/org/apache/impala/planner/JoinNode.java@607 PS3, Line 607: return Math.min(rhsNdv_, rhsNumRows_); Same as above. -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 15:09:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 5: Code-Review+1 (2 comments) It's great to hear that the library has great perf. LGTM, only had smaller nits. http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17389/5//COMMIT_MSG@12 PS5, Line 12: doesn't : sacrifise the accuracy for speed. Could you please add your measurements here as well? http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h File be/src/util/string-parser.h: http://gerrit.cloudera.org:8080/#/c/17389/5/be/src/util/string-parser.h@501 PS5, Line 501: while(i + 1 < len && *(s + i) == '0' && : fast_double_parser::is_integer(*(s + i + 1))) { : i++; : } : : // Library doesn't handle strings starting with '.' like '.56' or '-.56'. : // Preprocessing such strings by prefixing it with '0' : if (*(s + i) == '.') { : c_len = len - i + 1; : c_str[0] = '0'; // Prefix '0' : memcpy(c_str + 1, s + i, c_len - 1); : c_str[c_len] = '\0'; : } else { : c_len = len - i; : memcpy(c_str, s + i, c_len); : c_str[c_len] = '\0'; : } Maybe we could put this into its own function, stg like 'NormalizeFloatString()' and add a few unit tests for it in string-parser-test.cc. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 14:14:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17427 ) Change subject: IMPALA-10702: Add warning logs for slow or large catalogd response .. IMPALA-10702: Add warning logs for slow or large catalogd response It'd be helpful to log the slow or large responses of catalogd in debugging scalability issues. This patch adds these warning logs in JniCatalog, where we serialize thrift responses. See some example outputs in the jira description. Responses that have size larger than 50MB or take more than 60s to finish will be logged with the request. Add flags for these two thredshold in case users found the warnings too verbose and want to increase the thresholds. Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Reviewed-on: http://gerrit.cloudera.org:8080/17427 Reviewed-by: Quanlong Huang Tested-by: Impala Public Jenkins --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java 5 files changed, 129 insertions(+), 23 deletions(-) Approvals: Quanlong Huang: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/17427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Gerrit-Change-Number: 17427 Gerrit-PatchSet: 6 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17427 ) Change subject: IMPALA-10702: Add warning logs for slow or large catalogd response .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Gerrit-Change-Number: 17427 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 19 May 2021 12:36:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 12:17:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10695: add dedicated thread pool for OSS/JindoFS.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17455 ) Change subject: IMPALA-10695: add dedicated thread pool for OSS/JindoFS. .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4643105628f3860e3145c85d9ed205fe20291add Gerrit-Change-Number: 17455 Gerrit-PatchSet: 7 Gerrit-Owner: Yong Yang Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yong Yang Gerrit-Comment-Date: Wed, 19 May 2021 08:16:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10695: add dedicated thread pool for OSS/JindoFS.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17455 ) Change subject: IMPALA-10695: add dedicated thread pool for OSS/JindoFS. .. IMPALA-10695: add dedicated thread pool for OSS/JindoFS. OSS is the object store in Alibaba cloud, just like s3a, and jindofs is a gateway based on Alibaba cloud object store. The following is about the JindoFS, for more information: https://www.alibabacloud.com/blog/introducing-jindofs-a-high-performance-data-lake-storage-solution_595600 If Alibaba object store would be treated as local disk without this change, the query performance is not good. This change would create a dedicated queue for this kind of target, and improved the OSS scan performance. I have tested it in our environment, and observed at least double the scan speed. New flags: - num_oss_io_threads: Number of OSS/JindoFS I/O threads. Defaults to 16. Change-Id: I4643105628f3860e3145c85d9ed205fe20291add Signed-off-by: Yong Yang Reviewed-on: http://gerrit.cloudera.org:8080/17455 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h 5 files changed, 25 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4643105628f3860e3145c85d9ed205fe20291add Gerrit-Change-Number: 17455 Gerrit-PatchSet: 8 Gerrit-Owner: Yong Yang Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yong Yang
[Impala-ASF-CR] IMPALA-10695: add dedicated thread pool for OSS/JindoFS.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17455 ) Change subject: IMPALA-10695: add dedicated thread pool for OSS/JindoFS. .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7162/ -- To view, visit http://gerrit.cloudera.org:8080/17455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4643105628f3860e3145c85d9ed205fe20291add Gerrit-Change-Number: 17455 Gerrit-PatchSet: 6 Gerrit-Owner: Yong Yang Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yong Yang Gerrit-Comment-Date: Wed, 19 May 2021 07:33:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17396 ) Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8753/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9 Gerrit-Change-Number: 17396 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 19 May 2021 06:49:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17427 ) Change subject: IMPALA-10702: Add warning logs for slow or large catalogd response .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7165/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Gerrit-Change-Number: 17427 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 19 May 2021 06:41:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17427 ) Change subject: IMPALA-10702: Add warning logs for slow or large catalogd response .. Patch Set 5: Code-Review+2 Resolved conflicts due to flags added by other patches. Carry on Vihang's +2. -- To view, visit http://gerrit.cloudera.org:8080/17427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Gerrit-Change-Number: 17427 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 19 May 2021 06:40:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17427 ) Change subject: IMPALA-10702: Add warning logs for slow or large catalogd response .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8752/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Gerrit-Change-Number: 17427 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 19 May 2021 06:35:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10197: Add KUDU REPLICA SELECTION query option
Wenzhe Zhou has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17396 ) Change subject: IMPALA-10197: Add KUDU_REPLICA_SELECTION query option .. IMPALA-10197: Add KUDU_REPLICA_SELECTION query option Sometimes it is useful to target queries at the leader only replica instead of the default closest replica. This patch added new query option KUDU_REPLICA_SELECTION with which to choose replicas for Kudu amongst multiple Kudu replicas. Removed variable FLAGS_pick_only_leaders_for_tests since its usage can be replaced by the new query option. Added new planner and end-to-end tests for the new query option. Testings: - Passed exhaustive tests. Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9 --- M be/src/exec/kudu-scanner.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/debug-util.cc M be/src/util/debug-util.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-closest-replica.test A testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test M tests/query_test/test_kudu.py 13 files changed, 359 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/17396/2 -- To view, visit http://gerrit.cloudera.org:8080/17396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I613e6d9be8680c05880f7cf962a31aa38931f3d9 Gerrit-Change-Number: 17396 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. Patch Set 4: Code-Review+2 Sure, I re-launched the verify job. Let's see what it says. -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 4 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 06:17:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7164/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 19 May 2021 06:17:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10681: Improve join cardinality estimates
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17387 ) Change subject: IMPALA-10681: Improve join cardinality estimates .. Patch Set 3: > Patch Set 1: > > Thanks for the reply Aman. I wonder if we assume uniform distribution of > values, and the RHS's cardinality is less than or equal to the LHS's NDV then > does it matter if there are duplications on the right side? > > E.g. lets assume the followings: > > LHS cardinality is 1000 > LHS NDV is 10 > > RHS cardinality is 5 > RHS NDV is unknown > > If RHS has 5 distinct values, then the selectivity of it is 50%, so the > JOIN's output cardinality should be 500. > If RHS has the same value 5 times, then the selectivity is 10%, but the > multiplication factor is 5x, so the JOIN's output cardinality should be again > 500. I was tied up with some higher priority tasks. I just uploaded a patchset with a reworked logic for the estimation and added tests accordingly. As noted in the commit message, give the plan changes, we should do a performance run. I will try to get that going. With regard to your example, under the assumptions you have stated, yes the inner join's output cardinality estimates would be the same in both cases because the max NDV is 10. In my previous comment I was comparing the inner join and semi join in the presence of duplicates. If there are duplicates in the RHS then in the worst case the inner join cardinality could be N x M whereas for left semi join it cannot exceed N. (N and M are left and right input cardinality). -- To view, visit http://gerrit.cloudera.org:8080/17387 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8aa9d3b8f3c4848b3e9414fe19ad7ad348d12ecc Gerrit-Change-Number: 17387 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 19 May 2021 06:13:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10702: Add warning logs for slow or large catalogd response
Hello Aman Sinha, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17427 to look at the new patch set (#5). Change subject: IMPALA-10702: Add warning logs for slow or large catalogd response .. IMPALA-10702: Add warning logs for slow or large catalogd response It'd be helpful to log the slow or large responses of catalogd in debugging scalability issues. This patch adds these warning logs in JniCatalog, where we serialize thrift responses. See some example outputs in the jira description. Responses that have size larger than 50MB or take more than 60s to finish will be logged with the request. Add flags for these two thredshold in case users found the warnings too verbose and want to increase the thresholds. Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 --- M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java 5 files changed, 129 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/17427/5 -- To view, visit http://gerrit.cloudera.org:8080/17427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icffcfcaad2a718aebf79e2331efb05ca7a9a7671 Gerrit-Change-Number: 17427 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar