Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12158 )
Change subject: KUDU-2348: Pick a random replica in RemoteTablet.java
......................................................................

Patch Set 9:

> Thinking about this a bit, I wonder what behavior we really want.
>
> Today, every client inserts the servers into a hashmap, and then
> would return the last server in hashmap iteration order. In other
> words, we end up ranking the servers by something like
> hashcode%num_hashmap_buckets. Given that num_hashmap_buckets is
> likely constant across all tablets (these hashtables almost always
> have 3 elements so we likely have the default of 16 buckets), the
> ranking function is more or less consistent across all clients and
> all tablets.
>
> The major problem this causes is this: in a cluster without
> locality (eg fully remote), whichever servers have high hashcode%16
> are going to get significantly more read load than those with low
> hashcode%16. I wrote a little simulation here:
>
> https://gist.github.com/b3de552784da4afa29a2f1f66673b187
>
> Running this script results in a load distribution like:
>
> ts_idx   % of load
> ------------------
>      0   10.1
>      2    9.2
>     27    8.0
>     12    8.0
>     10    7.8
>     13    6.5
>     11    6.2
>     22    6.2
>     24    6.0
>     20    5.0
>     23    4.0
>     15    3.6
>     18    2.9
>      3    2.9
>     21    2.9
>     16    2.1
>      4    1.8
>     25    1.2
>      6    1.1
>      5    1.1
>      9    1.0
>      1    1.0
>      7    0.4
>     19    0.3
>     26    0.3
>     29    0.2
>     28    0.0
>      8    0.0
>     14    0.0
>     17    0.0
>
> It seems that this patch will change the behavior so that the
> server preference is randomized and dependent on the client, which
> solves the issue, but also means that, for a given tablet, load
> will be spread evenly across the replicas if there are multiple
> clients. Depending on the workload, that may be good or bad -- in
> many cases you would prefer _not_ to spread the load, so that you
> can make more efficient use of cache memory. The spreading of load
> is then accomplished by partitioning rather than replication.
>
> Anyone have thoughts on how we might express this preference
> through the API?
>
> A separate concern with the particular implementation is that pid
> may have a lot of correlation across machines, particularly if the
> client is running inside Docker containers or set to start at boot.
> AFAIK pids are sequentially assigned, so within Docker containers
> you would expect all clients to end up with identical pids. If we
> need a randomized id for a process I think it's better to use
> Java's random number generation to get one and assign it in a
> static initializer.

To make efficient use of cache memory, different clients should choose the same server for a given tablet. But that can also put heavy load on one server if all clients scan a particular tablet. So we need to trade off efficient use of cache memory against spreading load across all servers in the cluster.

Your concern about pid is right; it's better to use a random seed generated when a client is created.

--
To view, visit http://gerrit.cloudera.org:8080/12158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d70e45d4c9532bb32223c1dddd0936b4ff8fd99
Gerrit-Change-Number: 12158
Gerrit-PatchSet: 9
Gerrit-Owner: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Comment-Date: Thu, 30 May 2019 08:43:55 +0000
Gerrit-HasComments: No
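
For illustration, the per-client random seed idea discussed in this comment could be sketched in Java roughly as follows. This is not the code in the patch: the class name ReplicaChoiceSketch, the method pickReplica, the use of tablet server UUID strings, and the choice of a murmur3-style bit mixer are all assumptions made for the example. Each client draws one seed when it is created, ranks a tablet's replicas by a mix of that seed and the server's hash, and picks the top-ranked one, so a single client keeps a stable preference per tablet while different clients tend to prefer different servers.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; names and structure are not taken from the Kudu client. */
public class ReplicaChoiceSketch {
  // Drawn once per client instead of being derived from the pid, which
  // may be identical across Docker containers.
  private final int seed;

  public ReplicaChoiceSketch() {
    this(new SecureRandom().nextInt());
  }

  // Explicit seed, mainly so the behavior can be reproduced in tests.
  public ReplicaChoiceSketch(int seed) {
    this.seed = seed;
  }

  // Bit mixer (murmur3 32-bit finalizer) so that nearby inputs get unrelated ranks.
  private static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /** Returns the replica with the highest seed-dependent rank. */
  public String pickReplica(List<String> replicaUuids) {
    if (replicaUuids.isEmpty()) {
      throw new IllegalArgumentException("no replicas to choose from");
    }
    String best = null;
    int bestRank = 0;
    for (String uuid : replicaUuids) {
      int rank = mix(seed ^ uuid.hashCode());
      // Ties are broken by UUID so the result does not depend on list order.
      if (best == null || rank > bestRank
          || (rank == bestRank && uuid.compareTo(best) > 0)) {
        best = uuid;
        bestRank = rank;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> replicas = Arrays.asList("ts-a", "ts-b", "ts-c");
    ReplicaChoiceSketch client = new ReplicaChoiceSketch();
    System.out.println("this client prefers " + client.pickReplica(replicas));
  }
}
```

Under this sketch, sharing one seed across all clients would restore the cache-friendly behavior (every client reads a tablet from the same server), while per-client seeds spread reads across replicas; how to expose that choice through the API is left open, as in the discussion above.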