Hello Michael Ho, Lars Volker, Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12037

to look at the new patch set (#8).

Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................

IMPALA-7928: Consistent remote read scheduling

Currently, remote reads for a particular file are not scheduled to a
consistent set of nodes. This reduces the efficiency of the HDFS file
handle cache.

This modifies the scheduling of remote reads to limit the number of
executors considered when picking an executor for a remote scan range.
The remote executor candidates are generated by hashing the
filename+offset multiple times and finding the closest nodes in a hash
ring. This is a consistent hash that is designed to limit the number of
files remapped when cluster nodes come and go. The number of remote
executor candidates is controlled by a query option
'num_remote_executor_candidates', which defaults to 3. It is capped
at 16.

Once the remote executor candidates are chosen, the algorithm for
picking a specific replica uses the same algorithm as picking a local
replica. It picks the node with the minimum number of assigned bytes
and uses 'schedule_random_replica' to determine how to break ties.

This leaves the normal algorithms in place for local files, Kudu, and
HBase. If 'num_remote_executor_candidates' is set to 0, the existing
remote scheduling algorithm is used. The existing algorithm schedules
remote scan ranges on all available executors.

Testing:
 - There is a new hash-ring-test and related tests in scheduler-test.
 - There is a utility (hash-ring-util) in experiments for hand tuning
   the hash ring.
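
For readers who have not looked at the patch, the candidate selection described above can be sketched roughly as follows. This is an illustration only, not the code in hash-ring.cc or scheduler.cc. The names SketchHashRing, GetCandidates, PickLeastLoaded and Fnv1a are hypothetical, and so are the FNV-1a hash, the number of ring points per node, the clockwise walk used to skip duplicate nodes, and the first-candidate tie break (the patch itself defers tie breaking to 'schedule_random_replica').

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Cap on remote executor candidates, as stated in the commit message.
constexpr int kMaxCandidates = 16;

// 64-bit FNV-1a. Assumption: chosen only to keep the sketch deterministic.
inline uint64_t Fnv1a(const std::string& s) {
  uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical hash ring for illustration; not the HashRing class in the patch.
class SketchHashRing {
 public:
  explicit SketchHashRing(int points_per_node) : points_per_node_(points_per_node) {
    assert(points_per_node > 0);
  }

  // Each node owns several points on the ring to even out its share.
  void AddNode(const std::string& node) {
    nodes_.insert(node);
    for (int i = 0; i < points_per_node_; ++i) ring_.emplace(PointFor(node, i), node);
  }

  void RemoveNode(const std::string& node) {
    nodes_.erase(node);
    for (int i = 0; i < points_per_node_; ++i) {
      auto it = ring_.find(PointFor(node, i));
      if (it != ring_.end() && it->second == node) ring_.erase(it);
    }
  }

  // Hashes filename+offset once per seed and takes the closest node clockwise.
  // Returns an empty list for num_candidates <= 0; a caller would then fall
  // back to considering all executors.
  std::vector<std::string> GetCandidates(const std::string& file, int64_t offset,
                                         int num_candidates) const {
    std::vector<std::string> out;
    size_t want = std::min<size_t>(std::max(num_candidates, 0), kMaxCandidates);
    want = std::min(want, nodes_.size());
    for (int seed = 0; out.size() < want; ++seed) {
      uint64_t h = Fnv1a(file + ":" + std::to_string(offset) + ":" + std::to_string(seed));
      auto it = ring_.lower_bound(h);
      while (true) {
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        // Assumption: skip nodes already chosen by walking clockwise.
        if (std::find(out.begin(), out.end(), it->second) == out.end()) break;
        ++it;
      }
      out.push_back(it->second);
    }
    return out;
  }

 private:
  static uint64_t PointFor(const std::string& node, int i) {
    return Fnv1a(node + "#" + std::to_string(i));
  }

  int points_per_node_;
  std::set<std::string> nodes_;
  std::map<uint64_t, std::string> ring_;  // ring position -> node
};

// Picks the candidate with the fewest assigned bytes. Ties go to the earliest
// candidate here, which is a simplification of the patch's tie breaking.
inline std::string PickLeastLoaded(
    const std::vector<std::string>& candidates,
    const std::unordered_map<std::string, int64_t>& assigned_bytes) {
  assert(!candidates.empty());
  auto bytes = [&](const std::string& n) {
    auto it = assigned_bytes.find(n);
    return it == assigned_bytes.end() ? int64_t{0} : it->second;
  };
  return *std::min_element(
      candidates.begin(), candidates.end(),
      [&](const std::string& a, const std::string& b) { return bytes(a) < bytes(b); });
}
```

In this sketch, a caller would build the ring from the executor list, call GetCandidates(filename, offset, 3) for each remote scan range, and pass the result to PickLeastLoaded. Removing an executor that is not among a range's candidates leaves that range's candidate list unchanged, which is the property that should keep file handle cache entries useful as nodes come and go.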
Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
---
M be/src/experiments/CMakeLists.txt
A be/src/experiments/hash-ring-util.cc
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/backend-config.cc
M be/src/scheduling/backend-config.h
A be/src/scheduling/hash-ring-test.cc
A be/src/scheduling/hash-ring.cc
A be/src/scheduling/hash-ring.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
17 files changed, 824 insertions(+), 17 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/12037/8
--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>