Hello Michael Ho, Lars Volker, Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12037

to look at the new patch set (#8).

Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................

IMPALA-7928: Consistent remote read scheduling

Currently, remote reads for a particular file are not scheduled to
a consistent set of nodes. This reduces the efficiency of the HDFS
file handle cache.

This change modifies the scheduling of remote reads to limit the number
of executors considered when picking an executor for a remote scan
range. The remote executor candidates are generated by hashing the
filename+offset multiple times and finding the closest nodes in a
hash ring. This is a consistent hash that is designed to limit the
number of files remapped when cluster nodes come and go. The number
of remote executor candidates is controlled by a query option
'num_remote_executor_candidates', which defaults to 3. It is
capped at 16.
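
As a rough illustration of the candidate selection described above, here
is a minimal sketch. It is not the HashRing class added by this change:
the class name SketchHashRing, its methods, the use of std::hash, and the
clockwise walk used to skip already-chosen nodes are all assumptions made
for the example. Only the filename+offset key, the repeated hashing, and
the cap of 16 come from the description.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Cap on the number of candidates, as stated in the change description.
constexpr int kMaxCandidates = 16;

// Hypothetical sketch of a consistent hash ring (not Impala's HashRing).
class SketchHashRing {
 public:
  // Places 'points_per_node' points on the ring for 'node'.
  void AddNode(const std::string& node, int points_per_node) {
    for (int i = 0; i < points_per_node; ++i) {
      ring_[Hash(node + "#" + std::to_string(i))] = node;
    }
  }

  // Removes every point that belongs to 'node'.
  void RemoveNode(const std::string& node) {
    for (auto it = ring_.begin(); it != ring_.end();) {
      if (it->second == node) {
        it = ring_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Returns up to 'num_candidates' distinct nodes for a scan range. The
  // filename+offset key is hashed once per attempt, and each hash is mapped
  // to the next point on the ring whose node has not been chosen yet.
  std::vector<std::string> GetCandidates(const std::string& filename,
      int64_t offset, int num_candidates) const {
    std::set<std::string> all_nodes;
    for (const auto& entry : ring_) all_nodes.insert(entry.second);
    const int capped = std::max(0, std::min(num_candidates, kMaxCandidates));
    const size_t target = std::min<size_t>(capped, all_nodes.size());

    std::vector<std::string> result;
    std::set<std::string> chosen;
    const std::string key = filename + ":" + std::to_string(offset);
    for (size_t attempt = 0; result.size() < target; ++attempt) {
      const uint64_t h = Hash(key + "/" + std::to_string(attempt));
      auto it = ring_.lower_bound(h);
      // Walk clockwise, wrapping at the end, until an unchosen node is found.
      for (size_t steps = 0; steps < ring_.size(); ++steps, ++it) {
        if (it == ring_.end()) it = ring_.begin();
        if (chosen.insert(it->second).second) {
          result.push_back(it->second);
          break;
        }
      }
    }
    return result;
  }

 private:
  // std::hash is only a stand-in; a real ring would want a hash function
  // that is stable across processes and platforms.
  static uint64_t Hash(const std::string& s) {
    return std::hash<std::string>{}(s);
  }

  std::map<uint64_t, std::string> ring_;
};
```

In this sketch, removing a node that is not among a scan range's
candidates leaves that range's candidates unchanged, which is the
property that keeps remapping small when cluster nodes come and go.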

Once the remote executor candidates are chosen, a specific executor
is picked with the same algorithm that is used for picking a local
replica. It picks the candidate with the minimum number of assigned
bytes and uses 'schedule_random_replica' to determine how to break
ties.
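
The selection step can be sketched as follows. This is an illustration
only, not the scheduler code in this change: the function PickExecutor,
its parameters, and the choice of "first candidate in list order" as the
non-random tie-break are assumptions. The boolean stands in for the
'schedule_random_replica' query option.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns the candidate with the fewest assigned bytes.
// Candidates missing from 'assigned_bytes' are treated as having 0 bytes.
// Returns an empty string if 'candidates' is empty.
std::string PickExecutor(const std::vector<std::string>& candidates,
    const std::unordered_map<std::string, int64_t>& assigned_bytes,
    bool random_tie_break, std::mt19937* rng) {
  std::vector<std::string> best;
  int64_t min_bytes = std::numeric_limits<int64_t>::max();
  for (const std::string& candidate : candidates) {
    auto it = assigned_bytes.find(candidate);
    const int64_t bytes = (it == assigned_bytes.end()) ? 0 : it->second;
    if (bytes < min_bytes) {
      // Found a strictly less loaded candidate; restart the tie list.
      min_bytes = bytes;
      best.clear();
    }
    if (bytes == min_bytes) best.push_back(candidate);
  }
  if (best.empty()) return "";
  // Assumed tie-break: first in list order unless random choice is requested.
  if (!random_tie_break || best.size() == 1) return best.front();
  std::uniform_int_distribution<size_t> dist(0, best.size() - 1);
  return best[dist(*rng)];
}
```

A caller would add the scan range's length to the chosen executor's
entry in the map after each assignment, so later ranges drift toward
the less loaded candidates.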

This leaves the normal algorithms in place for local files, Kudu,
and HBase. If 'num_remote_executor_candidates' is set to 0, the
existing remote scheduling algorithm is used. The existing
algorithm schedules remote scan ranges on all available executors.

Testing:
 - There is a new hash-ring-test, plus related tests in scheduler-test.
 - There is a utility (hash-ring-util) in experiments for hand tuning
   the hash ring.

Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
---
M be/src/experiments/CMakeLists.txt
A be/src/experiments/hash-ring-util.cc
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/backend-config.cc
M be/src/scheduling/backend-config.h
A be/src/scheduling/hash-ring-test.cc
A be/src/scheduling/hash-ring.cc
A be/src/scheduling/hash-ring.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
17 files changed, 824 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/12037/8
--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
