Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12037 )

Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................


Patch Set 3:

(11 comments)

thanks! I think the consistent hashing makes sense.

Substantively, we should see if it makes sense to tie the life cycle of the 
hash-ring to when we add/remove nodes, because that's the only time it changes.

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc
File be/src/experiments/hash-ring-test.cc:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc@48
PS3, Line 48: IpAddr HostIdxToIpAddr(int host_idx) {
Can you add a comment about that this does with host_idx 0 or 1?


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc@101
PS3, Line 101: int main(int argc, char ** argv) {
This is in experiments/ rather than an actual test. Is something actually 
testing this code in a unit test fashion? Should we check that the expected 
numbers here are ok?


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h
File be/src/scheduling/hash-ring.h:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@41
PS3, Line 41: /// When nodes are added or removed, only the hash space in the 
immediate vacinity of the
vicinity


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@72
PS3, Line 72:   void GetDistributionMap(std::map<IpAddr, uint32_t>& 
distribution_map) {
This is public only for tests, right? Make it private and use a friend class?


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@115
PS3, Line 115:   // collisions and is arbitrary about how it resolves them.
Is this consistent, though? i.e., do we always get the same schedule for the 
same inputs?

Actually, I read:

The order of the key-value pairs whose keys compare equivalent is the order of 
insertion and does not change. (since C++11)

at https://en.cppreference.com/w/cpp/container/multimap


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@876
PS3, Line 876:   executor_ips_hashring_ = make_unique<HashRing>(&executor_ips, 
NUM_HASHRING_REPLICAS);
I'm not certain, but it seems like executor_ips_hashring_ needs to only be 
computed when "BackendConfig executor_config" changes, and it looks like it 
changes infrequently (because cluster membership changes infrequently). The 
work here is probably "not that much", but, nonetheless, it seems worthwhile to 
cache it.

It looks like UpdateMembership() in this class matches the lifecycle of when we 
want to do this.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@919
PS3, Line 919:   // Two different hashes of the filename can result in the same 
executor.
             :   // The function should return distinct executors, so it may 
need to do more hashes
             :   // than 'num_remote_replicas'.
We have some flexibility here. I think we're safer to limit ourselves to a 
limited number of hashes. There's no real harm if a block only has 1 replica.

I think you've taken care of the degenerate case (the number of replicas is 
more than executors), but we could add at least a DCHECK to avoid infinite 
iterations. (Or simply bail after "i > num_remote_replicas + 3")


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@923
PS3, Line 923:   // We need to generate multiple distinct hashes of the 
filename. This generates an
             :   // initial hash from the filename and separator. It uses this 
as a seed to a
             :   // linear congruential engine. It generates subsequent hash 
variants by drawing from
             :   // the linear congruential engine and incorporating that into 
the existing hash.
             :   // TODO: there are likely better ways to produce distinct 
hashes
             :   uint32_t hash = 
HashUtil::Hash(hdfs_file_split->file_name.data(),
             :       hdfs_file_split->file_name.length(), 0);
I've seen this comment twice now, right? Should we create

HashUtil::HashNTimes(data, bytes, seed, count) of some sort?

I'm confused by the following, btw, in hash-util.h:

  /// TODO: crc32 hashes with different seeds do not result in different hash 
functions.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@933
PS3, Line 933:   for (uint32_t i = 0; distinct_ipaddrs.size() != 
num_remote_replicas; i++) {
i is unused, I think. perhaps rename to "unused" and drop the "i++", to make 
that move obvious?


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc@731
PS3, Line 731:         if (result != StringParser::PARSE_SUCCESS || 
num_simulated_remote_replicas < 0) {
Let's create a maximum here too. I think with the current implementation, 
settings to 2^60 will probably hang the query forever and/or run us out of 
memory.


http://gerrit.cloudera.org:8080/#/c/12037/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/12037/3/common/thrift/ImpalaService.thrift@354
PS3, Line 354:
             :   // The number of simulated replicas to use when scheduling 
remote HDFS ranges.
             :   // Simulated replicas assign remote files to a consistent set 
of nodes, providing a
             :   // simulated type of locality. Restricting the number of nodes 
that will read a single
             :   // file increases the efficiency of file-related caches (e.g. 
the HDFS file handle
             :   // cache). It is designed to avoid changing file to node 
mappings when nodes come
             :   // or go. If set to 0, simulated remote replicas are disabled 
and remote ranges can be
             :   // scheduled on any node.
             :   NUM_SIMULATED_REMOTE_REPLICAS,
This is marked advanced and doesn't show up to the user, right? I think it's a 
bug if we end up needing to tweak this.



--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jan 2019 19:39:14 +0000
Gerrit-HasComments: Yes

Reply via email to