Hi Folks,
I wanted to start a discussion on how we can support a large number of clients 
in zookeeper.  I am at facebook and we are using zookeeper for quite a few 
projects. There are a couple of projects where we are designing for a large 
number of clients. The projects are


1.       Building a directory service for holding configuration information 
(lookup table for which node to go to for a given key).

2.       For HDFS clients, where clients lookup zookeeper for the current 
namenode

This information changes infrequently and is small, so update rate or size of 
data is not an issue.

The key challenge is to support that large a number of clients (30K to start 
with, but eventually could be 100K).  A big chunk of the clients can try to 
connect/disconnect at the same time  - so herd effect can happen.

I was trying out a 3 node ensemble. I noticed that with about 20K clients, 
there we quite a few session expires and disconnects.
I looked through the code briefly and since all the pings are eventually 
handled by the leader, my guess is that the leader thread is not keeping up. I 
haven't yet do the instrumentation/tracing to validate this.

I have been thinking about how to improve this and thought of the following 
solution. I am trying to hit 2 goals with this.

1.       Make it possible to have a very large number of clients (each client 
has a watch) without losing connections too often.

2.       Improve how quickly a large number of clients can connect.

Solution

1.       The idea is to introduce a new type of session - "local" session. A 
"local" session doesn't have a full functionality of a normal session.

2.       Local sessions cannot create ephemeral nodes.

3.       Once a local session is lost, you cannot re-establish it using the 
session-id/password. The session and its watches are gone for good.

4.       When a local session connects, the session info is only maintained on 
the zookeeper server that it is connected to. The leader is not aware of the 
creation of such a session and there is no state written to disk.

5.       The pings and expiration is handled by the server that the session is 
connected to.

With the above changes, it should be easy to scale ZK by adding more learners, 
which manage the "local" sessions independently. Also, the rate at which you 
can establish "local" sessions, would be significantly higher than the normal 
sessions.

Would like to stir up a discussion on whether this is the best way to achieve 
these goals or if I am missing simpler ways of accomplishing this.

Thanks!
Vishal

.

Reply via email to