[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has fewer nodes than the number of remote executor candidates, then the scheduler falls back to using the old SelectRemoteExecutor(). SelectRemoteExecutor() considers all backends and picks the backend with the least assigned bytes; to break ties, it uses randomness. This means that clusters with fewer backends than num_remote_executor_candidates do not have consistent placement. For the file handle cache (the original user of consistent placement), this is not a major problem. However, for data caching, it can result in slower warm-up of the data cache and greater duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for small clusters (num nodes <= num_remote_executor_candidates). To make this more predictable, it increases the maximum number of iterations.

This also changes GetRemoteExecutorCandidates() to return the candidates in the order that they were selected. While still using a set for detecting duplicate backends, the vector of distinct backends is constructed directly rather than by iterating over the set.

Testing:
- Modify the scheduler-test backend test to verify that small clusters use consistent scheduling.
Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Reviewed-on: http://gerrit.cloudera.org:8080/14026
Reviewed-by: Lars Volker
Tested-by: Impala Public Jenkins
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 85 insertions(+), 42 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Michael Ho
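The commit message describes the new selection behavior only in prose. The C++ sketch below illustrates it under stated assumptions: a simple consistent-hash ring with several points per backend and a SplitMix64-style mixing step. HashRing, Mix64, SelectConsistentCandidates and kMaxIterationsPerCandidate are hypothetical names, and this is not the actual scheduler.cc code. It shows the three described properties: the candidate count is clamped to the cluster size instead of falling back to SelectRemoteExecutor(), candidates are returned in the order they were first selected, and a set is used only to detect duplicates. The multiplier value 8 mirrors the literal quoted at line 802 in patch set 3, but its exact role there is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

using std::uint64_t;

// Hypothetical mixing step (SplitMix64 finalizer) used to derive successive
// ring positions from one per-file hash; not Impala's actual hash function.
inline uint64_t Mix64(uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Hypothetical minimal consistent-hash ring: each backend owns
// 'points_per_backend' positions on a 64-bit ring.
class HashRing {
 public:
  HashRing(const std::vector<std::string>& backends, int points_per_backend)
      : num_backends_(static_cast<int>(backends.size())) {
    for (const std::string& backend : backends) {
      uint64_t point = std::hash<std::string>{}(backend);
      for (int i = 0; i < points_per_backend; ++i) {
        point = Mix64(point);
        ring_[point] = backend;
      }
    }
  }

  int NumBackends() const { return num_backends_; }

  // Returns the backend owning the first ring position at or after 'hash',
  // wrapping around to the start of the ring.
  const std::string& GetNode(uint64_t hash) const {
    assert(!ring_.empty());
    auto it = ring_.lower_bound(hash);
    if (it == ring_.end()) it = ring_.begin();
    return it->second;
  }

 private:
  int num_backends_;
  std::map<uint64_t, std::string> ring_;
};

// Hypothetical bound on ring lookups per requested candidate.
static const int kMaxIterationsPerCandidate = 8;

// Returns up to 'num_candidates' distinct backends in the order they were
// first selected. Small clusters are handled by clamping the count rather
// than by a least-assigned-bytes choice with random tie-breaking.
std::vector<std::string> SelectConsistentCandidates(
    const HashRing& ring, uint64_t file_hash, int num_candidates) {
  num_candidates = std::min(num_candidates, ring.NumBackends());
  std::vector<std::string> candidates;  // Selection order is kept here.
  if (num_candidates <= 0) return candidates;
  candidates.reserve(num_candidates);
  std::unordered_set<std::string> seen;  // Used only to detect duplicates.
  seen.reserve(num_candidates);
  const int max_iterations = num_candidates * kMaxIterationsPerCandidate;
  uint64_t hash = file_hash;
  for (int i = 0; i < max_iterations &&
                  static_cast<int>(candidates.size()) < num_candidates;
       ++i) {
    hash = Mix64(hash);
    const std::string& node = ring.GetNode(hash);
    // insert().second is true only if 'node' was not already present.
    if (seen.insert(node).second) candidates.push_back(node);
  }
  return candidates;
}
```

Because the loop stops after num_candidates * kMaxIterationsPerCandidate lookups, it can occasionally return fewer than num_candidates backends; a larger iteration budget makes that less likely, which is one plausible reading of why the change raises the maximum number of iterations.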
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 8: Verified+1

Gerrit-Comment-Date: Wed, 14 Aug 2019 04:39:09 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4243/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Comment-Date: Wed, 14 Aug 2019 00:57:21 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4785/ DRY_RUN=true

Gerrit-Comment-Date: Wed, 14 Aug 2019 00:32:23 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 8: Code-Review+2

Gerrit-Comment-Date: Wed, 14 Aug 2019 00:30:11 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4242/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Comment-Date: Wed, 14 Aug 2019 00:22:50 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14026 to look at the new patch set (#8).

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 85 insertions(+), 42 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/8
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc@823
PS7, Line 823: if (distinct_backends.size() != num_distinct_before) {
> std::unordered_set actually returns whether an insertion took place: https:
Good point, that's a much better way to do this. Switched this over to use the return value from insert.

Gerrit-Comment-Date: Wed, 14 Aug 2019 00:16:25 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 7:

(1 comment)

Had one comment, otherwise LGTM

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc@823
PS7, Line 823: if (distinct_backends.size() != num_distinct_before) {
std::unordered_set actually returns whether an insertion took place: https://en.cppreference.com/w/cpp/container/unordered_set/insert

Gerrit-Comment-Date: Tue, 13 Aug 2019 23:58:59 +
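The point of this comment can be shown with two hypothetical helpers. std::unordered_set::insert returns a std::pair<iterator, bool> whose bool is true only when the element was actually inserted, so comparing the set size before and after the call (the num_distinct_before pattern quoted at line 823) is unnecessary. The helper names and the std::string element type are illustrative, not taken from scheduler.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Before: detect a new element by comparing set sizes around insert().
bool AddIfNewBySize(std::unordered_set<std::string>& seen,
                    std::vector<std::string>& ordered,
                    const std::string& node) {
  const std::size_t num_distinct_before = seen.size();
  seen.insert(node);
  if (seen.size() != num_distinct_before) {
    ordered.push_back(node);  // First time this node was seen.
    return true;
  }
  return false;
}

// After: insert() returns std::pair<iterator, bool>; '.second' is true only
// when the element was not already in the set.
bool AddIfNew(std::unordered_set<std::string>& seen,
              std::vector<std::string>& ordered, const std::string& node) {
  if (seen.insert(node).second) {
    ordered.push_back(node);  // First time this node was seen.
    return true;
  }
  return false;
}
```

Both helpers keep the vector in first-seen order; the second simply avoids the extra size bookkeeping.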
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 7: Code-Review+1

Carry +1

Gerrit-Comment-Date: Tue, 13 Aug 2019 23:48:02 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14026 to look at the new patch set (#6).

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 84 insertions(+), 41 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/6
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 6: Code-Review+1

Carry +1

Gerrit-Comment-Date: Tue, 13 Aug 2019 23:41:47 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc@781
PS5, Line 781: set
> Can this be an unordered_set ? It doesn't look like we rely on the order an
Yes, switched this over to an unordered_set and changed it to call reserve() with num_candidates.

Gerrit-Comment-Date: Tue, 13 Aug 2019 23:41:35 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 5: Code-Review+1

(1 comment)

I will let Lars finish off the review.

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc@781
PS5, Line 781: set
Can this be an unordered_set ? It doesn't look like we rely on the order anymore, right ?

Gerrit-Comment-Date: Tue, 13 Aug 2019 23:19:26 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 5: Verified+1

Gerrit-Comment-Date: Tue, 13 Aug 2019 10:22:31 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4230/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

Gerrit-Comment-Date: Tue, 13 Aug 2019 06:55:44 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4777/ DRY_RUN=true

Gerrit-Comment-Date: Tue, 13 Aug 2019 06:16:53 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14026 to look at the new patch set (#4).

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 82 insertions(+), 40 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/4
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@545
PS3, Line 545: int num_remote_executor_candidates = query_options.num_remote_executor_candidates;
             : if (executor_group.NumExecutors() < num_remote_executor_candidates) {
             : num_remote_executor_candidates = executor_group.NumExecutors();
             : }
> nit: use std::min
Done

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@802
PS3, Line 802: 8
> It seems clearer if we define this constant as a constant variable.
Turned this into a static const

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> My thinking was that when n is small, maintaining one structure (even thoug
Changed this to use a set (like before), but to put the IpAddrs directly in the vector rather than getting them from the set.

Gerrit-Comment-Date: Tue, 13 Aug 2019 06:15:20 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 3:

(3 comments)

Working on a new upload

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@796
PS3, Line 796: P((1/3)^(n-1))
> Is there a typo here ? Not sure what (P(1/3^(n-1)) means ?
Definitely a typo

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> This is now O(n^2), right? Is there a bound on num_executors and if so, sho
Yes, this is O(n^2). We limit the num_remote_executor_candidates to be at most 16 via the query option setting code. We also limit it to be the number of nodes if that is smaller. The default is 3 and some systems are going to use 2. I doubt we are going to set it higher than 3, so we could cut the maximum allowed value to 8 without any real problem. I haven't benchmarked this.

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> Or we can consider using an unordered_set to track the candidates found so
My thinking was that when n is small, maintaining one structure (even though it is O(n^2)) might still be better than maintaining two. It is easy to go back to using the set. I would just put the IpAddrs directly in the vector rather than iterating over the set at the end.

Gerrit-Comment-Date: Mon, 12 Aug 2019 21:27:42 +
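For comparison, here is a minimal sketch of the vector-only approach from patch set 3 that this thread discusses, using a hypothetical helper name and std::string in place of the scheduler's address type. Each duplicate check is a linear std::find over the candidates found so far, which is where the O(n^2) comes from. With num_remote_executor_candidates capped at 16 (and usually 2 or 3) the cost stays small, while the set-plus-vector version makes each check constant time on average at the price of maintaining a second structure.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical vector-only variant: the vector is both the duplicate check
// and the result, so selection order is simply the vector order.
// Returns true if 'node' was appended, false if it had already been selected.
bool AddIfAbsentLinear(std::vector<std::string>& candidates,
                       const std::string& node) {
  // Linear scan over the k candidates found so far. Repeated for a number of
  // iterations proportional to the candidate count, this totals O(n^2).
  if (std::find(candidates.begin(), candidates.end(), node) !=
      candidates.end()) {
    return false;
  }
  candidates.push_back(node);
  return true;
}
```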
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@796
PS3, Line 796: P((1/3)^(n-1))
Is there a typo here ? Not sure what (P(1/3^(n-1)) means ?

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@802
PS3, Line 802: 8
It seems clearer if we define this constant as a constant variable.

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> This is now O(n^2), right? Is there a bound on num_executors and if so, sho
Or we can consider using an unordered_set to track the candidates found so far. My understanding is that order matters so we still need to return them in a vector, right ?

Gerrit-Comment-Date: Mon, 12 Aug 2019 21:09:05 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@545
PS3, Line 545: int num_remote_executor_candidates = query_options.num_remote_executor_candidates;
             : if (executor_group.NumExecutors() < num_remote_executor_candidates) {
             : num_remote_executor_candidates = executor_group.NumExecutors();
             : }
nit: use std::min

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
This is now O(n^2), right? Is there a bound on num_executors and if so, should we add a DCHECK to make sure it's not large? Have you benchmarked this to see if it changes the runtime significantly?

Gerrit-Comment-Date: Mon, 12 Aug 2019 16:15:11 +
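A minimal sketch of the std::min nit, using a hypothetical helper in place of the if-statement quoted at line 545. One caveat: std::min deduces a single type from both arguments, so if the query option and NumExecutors() had different integer types the call would need an explicit template argument such as std::min<int> or a cast; the thread does not show those types.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: clamps the requested number of remote executor
// candidates to the number of executors actually available.
int EffectiveNumCandidates(int num_remote_executor_candidates,
                           int num_executors) {
  return std::min(num_remote_executor_candidates, num_executors);
}
```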
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 3: Verified+1

Gerrit-Comment-Date: Sun, 11 Aug 2019 07:59:42 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 )

Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4770/ DRY_RUN=true

Gerrit-Comment-Date: Sun, 11 Aug 2019 03:51:11 +
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4766/ -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 09 Aug 2019 20:41:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4209/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 09 Aug 2019 17:09:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4766/ DRY_RUN=true -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 09 Aug 2019 16:28:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Hello Michael Ho, Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14026 to look at the new patch set (#3). Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters In the original change for consistent scheduling, if a cluster has fewer nodes than the number of remote executor candidates, then the scheduler falls back to using the old SelectRemoteExecutor(). SelectRemoteExecutor() considers all backends and picks the backend with the least assigned bytes; to break ties, it uses randomness. This means that clusters with fewer backends than num_remote_executor_candidates do not have consistent placement. For the file handle cache (the original user of consistent placement), this is not a major problem. However, for data caching, it can result in slower warm up of the data cache and greater duplication of the same data across different nodes. This changes the algorithm to use consistent placement even for small clusters (num nodes <= num_remote_executor_candidates). To make this more predictable, it increases the maximum number of iterations. This also changes GetRemoteExecutorCandidates() to return the candidates in the order that they were selected. In code, this means looping over a vector to detect distinct backends rather than using a set. Testing: - Modify the scheduler-test backend test to verify that small clusters use consistent scheduling. 
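The commit message describes the selection only in prose. Below is a hedged, simplified sketch of hash-ring candidate selection in the spirit of the change; it is not Impala's scheduler.cc. The names `SeededHash`, `HashRing`, `BuildRing`, `GetCandidates`, `points_per_executor` and `max_iterations_per_candidate` are hypothetical, FNV-1a is a stand-in hash, and the iteration cap is assumed to scale with the candidate count. It illustrates three points from the discussion: the candidate count is clamped with `std::min` (the reviewer's nit), so a cluster smaller than num_remote_executor_candidates still takes the same consistent path; the loop is bounded by a maximum number of iterations; and candidates come back in the order they were selected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical 32-bit FNV-1a hash with a seed folded into the offset basis.
// Unlike std::hash, its values do not change between runs.
uint32_t SeededHash(const std::string& s, uint32_t seed) {
  uint32_t h = 2166136261u ^ seed;
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Hypothetical hash ring: each executor owns several points on a 32-bit ring.
using HashRing = std::map<uint32_t, std::string>;

HashRing BuildRing(const std::vector<std::string>& executors, int points_per_executor) {
  HashRing ring;
  for (const std::string& e : executors) {
    for (int i = 0; i < points_per_executor; ++i) {
      ring[SeededHash(e, static_cast<uint32_t>(i))] = e;
    }
  }
  return ring;
}

// Each iteration hashes the scan range key with a different seed and walks
// clockwise to the next ring point. Distinct executors are returned in the
// order they were first hit; the iteration cap bounds the total work.
std::vector<std::string> GetCandidates(const HashRing& ring, int num_executors,
                                       const std::string& key, int requested,
                                       int max_iterations_per_candidate) {
  assert(!ring.empty());
  // Small clusters: never look for more distinct candidates than executors exist.
  const int num_candidates = std::min(requested, num_executors);
  const int max_iterations = num_candidates * max_iterations_per_candidate;
  std::unordered_set<std::string> seen;
  std::vector<std::string> candidates;
  for (int i = 0; i < max_iterations; ++i) {
    if (static_cast<int>(candidates.size()) == num_candidates) break;
    auto it = ring.lower_bound(SeededHash(key, static_cast<uint32_t>(i)));
    if (it == ring.end()) it = ring.begin();  // Wrap around the ring.
    if (seen.insert(it->second).second) candidates.push_back(it->second);
  }
  return candidates;
}
```

Because both the ring and the hash are deterministic, the same key maps to the same ordered candidates on every call, which is the property the data cache benefits from. A larger iteration budget makes it more likely that every executor of a small cluster is reached before the loop gives up.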
Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 --- M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc 2 files changed, 75 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/3 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 2: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4747/ -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 07 Aug 2019 05:27:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4166/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 07 Aug 2019 01:46:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4165/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 07 Aug 2019 01:42:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 2: I'm still thinking through the testing, but I thought I'd put this up. -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 07 Aug 2019 01:04:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4747/ DRY_RUN=true -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 07 Aug 2019 01:04:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc@234 PS1, Line 234: for (int num_candidates = 1; num_candidates <= num_impala_nodes + 2; ++num_candidates) { > line too long (92 > 90) Done -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 07 Aug 2019 01:03:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14026 to look at the new patch set (#2). Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters In the original change for consistent scheduling, if a cluster has fewer nodes than the number of remote executor candidates, then the scheduler falls back to using the old SelectRemoteExecutor(). SelectRemoteExecutor() considers all backends and picks the backend with the least assigned bytes; to break ties, it uses randomness. This means that clusters with fewer backends than num_remote_executor_candidates do not have consistent placement. For the file handle cache (the original user of consistent placement), this is not a major problem. However, for data caching, it can result in slower warm up of the data cache and greater duplication of the same data across different nodes. This changes the algorithm to use consistent placement even for small clusters (num nodes <= num_remote_executor_candidates). This also changes GetRemoteExecutorCandidates() to return the candidates in the order that they were selected. In code, this means looping over a vector to detect distinct backends rather than using a set. Testing: - Modify the scheduler-test backend test to verify that small clusters use consistent scheduling. 
Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 --- M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc 2 files changed, 55 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/2 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters In the original change for consistent scheduling, if a cluster has fewer nodes than the number of remote executor candidates, then the scheduler falls back to using the old SelectRemoteExecutor(). SelectRemoteExecutor() considers all backends and picks the backend with the least assigned bytes; to break ties, it uses randomness. This means that clusters with fewer backends than num_remote_executor_candidates do not have consistent placement. For the file handle cache (the original user of consistent placement), this is not a major problem. However, for data caching, it can result in slower warm up of the data cache and greater duplication of the same data across different nodes. This changes the algorithm to use consistent placement even for small clusters (num nodes <= num_remote_executor_candidates). This also changes GetRemoteExecutorCandidates() to return the candidates in the order that they were selected. In code, this means looping over a vector to detect distinct backends rather than using a set. Testing: - Modify the scheduler-test backend test to verify that small clusters use consistent scheduling. Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 --- M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc 2 files changed, 54 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/1 -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14026 ) Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc@234 PS1, Line 234: for (int num_candidates = 1; num_candidates <= num_impala_nodes + 2; ++num_candidates) { line too long (92 > 90) -- Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741 Gerrit-Change-Number: 14026 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 07 Aug 2019 01:01:41 + Gerrit-HasComments: Yes
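The flagged PS1 test loop sweeps num_candidates from 1 to num_impala_nodes + 2, so it also covers candidate counts larger than the cluster. The sketch below is a hedged illustration of that test shape, with the loop header wrapped to stay within the 90-column limit. `SelectCandidates`, `Fnv1a64` and `SmallClusterIsConsistent` are hypothetical names, and rendezvous (highest-random-weight) hashing is swapped in as a compact deterministic selector. It is not the scheduler's algorithm and not the actual scheduler-test.cc code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical 64-bit FNV-1a hash used to rank executors for a key.
uint64_t Fnv1a64(const std::string& s) {
  uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Stand-in selector using rendezvous hashing (NOT Impala's hash ring): rank
// executors by hash(key, executor), then keep min(num_candidates, #executors).
std::vector<std::string> SelectCandidates(const std::vector<std::string>& executors,
                                          const std::string& key, int num_candidates) {
  std::vector<std::pair<uint64_t, std::string>> ranked;
  for (const std::string& e : executors) ranked.emplace_back(Fnv1a64(key + "|" + e), e);
  // Sorting the reversed range ascending leaves `ranked` in descending order.
  std::sort(ranked.rbegin(), ranked.rend());
  const std::size_t limit = static_cast<std::size_t>(std::max(num_candidates, 0));
  const std::size_t n = std::min(limit, ranked.size());
  std::vector<std::string> out;
  for (std::size_t i = 0; i < n; ++i) out.push_back(ranked[i].second);
  return out;
}

// Shape of the test: sweep the candidate count past the cluster size and
// require identical, bounded results on repeated calls for the same key.
bool SmallClusterIsConsistent(int num_impala_nodes) {
  std::vector<std::string> executors;
  for (int i = 0; i < num_impala_nodes; ++i) {
    executors.push_back("host" + std::to_string(i));
  }
  for (int num_candidates = 1; num_candidates <= num_impala_nodes + 2;
       ++num_candidates) {
    const auto first = SelectCandidates(executors, "foo.parq:0", num_candidates);
    const auto second = SelectCandidates(executors, "foo.parq:0", num_candidates);
    if (first != second) return false;
    if (first.size() > executors.size()) return false;
  }
  return true;
}
```

Splitting the `for` header after the condition is one conventional way to satisfy the 90-character lint rule without renaming variables.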