Hello all,

It is a known issue that clients can become unbalanced over time, leaving a few servers doing all the work while the others sit idle.
Long-term solutions for this have been discussed (e.g. https://issues.apache.org/jira/browse/ZOOKEEPER-856), and I can't wait to see some progress there. In the meantime, there is a specific instance of this problem that I'd like to get feedback on, and maybe try a patch for if the idea is well received.

The problem exists in clusters with a large number of clients (say, 10,000) where we want to perform a rolling bounce (i.e. restarting all servers one by one to avoid causing downtime).

If we start in a situation like this:

  1 : follower : 2000 clients
  2 : follower : 2000 clients
  3 : follower : 2000 clients
  4 : follower : 2000 clients
  5 : leader   : 2000 clients

and proceed to bounce all servers, leaving the leader for last (to minimize the number of leadership changes), we end up in the situation below right before the leader is bounced (complete list of steps below):

  1 : follower : 2381 clients (bounced)
  2 : follower : 1756 clients (bounced)
  3 : follower :  976 clients (bounced)
  4 : follower :    0 clients (bounced)
  5 : leader   : 4881 clients (not bounced yet)

Now we're going to bounce the leader, which by itself causes some commotion. Almost half of the clients are going to have to scramble, all at the same time, to find a new server to connect to.

In some cases, we've seen this go wrong. The leader bounce combined with a large number of clients migrating en masse has a ripple effect that can cause sessions to expire and followers to fall behind.

Motivated by this scenario, here's a proposal (again, just a stop-gap while waiting for a long-term solution to client imbalance).

Would it be reasonable to introduce a 4-letter word that forces a server to shed part of its clients? E.g. "sh10" tells a server to shed 10% of its clients, "sh50" tells a server to shed 50%, etc.

This command could be used in the scenario above to gradually migrate most of the clients away from #5 before bouncing it.
The same command would help in any other situation where we want to gently move clients away from a server before taking it down for maintenance. After a bounce is complete, it could also be used to restore some balance manually (e.g. by hitting the most loaded server with "sh10" a few times).

Is this something that users would find useful?
Is this something developers would accept into the system?

If so, I'd be happy to try and contribute this myself, with some guidance.

Cheers,
M.

---

Full sequence of steps for the numbers provided above.

After bouncing 1:
  1 : 0
  2 : 2500
  3 : 2500
  4 : 2500
  5 : 2500

After bouncing 2:
  1 : 625
  2 : 0
  3 : 3125
  4 : 3125
  5 : 3125

After bouncing 3:
  1 : 1406
  2 : 781
  3 : 0
  4 : 3906
  5 : 3906

After bouncing 4:
  1 : 2381
  2 : 1756
  3 : 976
  4 : 0
  5 : 4881
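
For anyone who wants to play with the numbers, here is a rough Python sketch of the model behind the steps above. It assumes that the clients of a restarting server reconnect evenly across the other servers, and it models the proposed "shNN" command the same way, only for a fraction of the clients. The function names (shed, bounce) are made up for illustration, and the "sh" command itself is only the proposal from this message, not something ZooKeeper provides. The sketch uses fractional client counts, so its results land within a couple of clients of the figures listed above; the small differences presumably come from rounding.

```python
def shed(counts, server, percent):
    """Move `percent` of `server`'s clients evenly onto the other servers.

    Hypothetical model of the proposed "shNN" command; it assumes the
    disconnected clients spread uniformly over the remaining servers.
    """
    others = [s for s in counts if s != server]
    moved = counts[server] * percent / 100
    new = dict(counts)
    new[server] = counts[server] - moved
    for s in others:
        new[s] = counts[s] + moved / len(others)
    return new


def bounce(counts, server):
    """A restart is modelled as shedding 100% of the clients at once."""
    return shed(counts, server, 100)


def rolling_bounce(counts, order):
    """Bounce servers in `order`; return a list of (server, counts) steps."""
    history = []
    for server in order:
        counts = bounce(counts, server)
        history.append((server, counts))
    return history


def fmt(counts):
    return "  ".join(f"{s}: {round(counts[s])}" for s in sorted(counts))


if __name__ == "__main__":
    # Five servers with 2000 clients each; server 5 is the leader.
    counts = {s: 2000.0 for s in range(1, 6)}

    # Bounce the followers first, leaving the leader for last.
    for server, counts in rolling_bounce(counts, [1, 2, 3, 4]):
        print(f"After bouncing {server}: {fmt(counts)}")

    # Drain the leader gradually with three hypothetical "sh50" calls.
    for _ in range(3):
        counts = shed(counts, 5, 50)
        print(f"After sh50 on 5:  {fmt(counts)}")
```

Under these assumptions, three "sh50" calls bring the leader down from roughly 4880 clients to roughly 610, so the final bounce only has to move about 6% of the clients at once instead of almost half.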
