Thanks for the suggestion Dhruba. I will open a Jira and continue the discussion there. I also got a chance to discuss some of the ideas at the zookeeper community meet yesterday.
I have prototyped some of my ideas and I should soon be able to share the performance scenarios and measurements too.

Thanks!
Vishal

-----Original Message-----
From: Dhruba Borthakur [mailto:dhr...@gmail.com]
Sent: Friday, July 01, 2011 2:50 PM
To: dev@zookeeper.apache.org
Subject: Re: Discussion on supporting a large number of clients for a zk ensemble

Hi Ben/Camille: can you comment on Vishal's logs/config? The "local session" idea seems promising to me.

Vishal: it would be nice if you create a JIRA with your proposal and we can continue discussion in the JIRA?

thanks a bunch,
dhruba

On Mon, May 30, 2011 at 11:15 AM, Vishal Kathuria <vishal.kathu...@fb.com> wrote:

> Thanks for looking at this Camille and Benjamin,
>
> setup:
> There are 5 machines, 2 hosting clients and 3 hosting servers.
> There is one client process on each of the client machines. The client
> process has 20 threads, each thread with 500 sessions.
> So I have a total of 20K clients, so it isn't that high really.
>
> Hardware
> Two proc Intel(r) Xeon(r) Processor L5420 (total 8 cores) 8G RAM
>
> The workload is fairly simple:
> All sessions do is keep a watch on a node. Once the watch fires, the
> client reads the contents of the node and puts the watch again.
> There is one thread that is periodically updating the node being watched
> (once every 30s - so very infrequent).
>
> When the system starts off, things are fine, then a few timers start
> missing and eventually there are lots of expired connections.
>
> The logs are really long, but pretty much repetitive, so I am attaching
> the tail of the logs.
> The client timeout is 300s.
>
> JVM Parameters
> -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:MaxGCPauseMillis=50
> -Dzookeeper.globalOutstandingLimit=30000 -Xms6000m -Xmx6000m -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8180
> I have GC logging turned on. I am not seeing long GC pauses, so I don't
> think that's it.
>
> Next steps I am trying
> 1. Look at the CPU utilization on the server machines
> 2. If the CPU is pegged at 100%, add some additional tracing in the
>    server to validate my hypothesis that the session tracker is getting
>    overwhelmed
>
> If you folks have any other suggestions, that would greatly help. I
> started working with zookeeper a couple of weeks ago so it is very likely
> I might be missing something obvious.
>
> Thanks!
> Vishal
>
> -----Original Message-----
> From: Benjamin Reed [mailto:br...@apache.org]
> Sent: Sunday, May 29, 2011 8:42 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Discussion on supporting a large number of clients for a zk
> ensemble
>
> i second camille's suggestion. i also know there are other people looking
> into using zookeeper with a large number of clients, so it would be good
> to figure out what are the limits and then how to cross them. i like your
> proposed solutions, but i would rather start down that road after we have
> resolved the issues that we can for the normal clients.
>
> ben
>
> On Fri, May 27, 2011 at 4:23 PM, Fournier, Camille F. [Tech]
> <camille.fourn...@gs.com> wrote:
> > I would recommend that you spend some time making sure that your guess
> > about the cause is correct before trying to design solutions to the
> > problem. Can you provide us some hard numbers, logs, and configuration
> > information? It's always possible that some aspect of your
> > configuration that you hadn't considered important is in fact the
> > trigger here.
> >
> > Thanks,
> > Camille
> >
> > -----Original Message-----
> > From: Vishal Kathuria [mailto:vishal.kathu...@fb.com]
> > Sent: Friday, May 27, 2011 6:32 PM
> > To: dev@zookeeper.apache.org
> > Subject: Discussion on supporting a large number of clients for a zk
> > ensemble
> >
> > Hi Folks,
> > I wanted to start a discussion on how we can support a large number of
> > clients in zookeeper. I am at facebook and we are using zookeeper for
> > quite a few projects.
> > There are a couple of projects where we are designing for a large
> > number of clients. The projects are:
> >
> > 1. Building a directory service for holding configuration information
> >    (lookup table for which node to go to for a given key).
> >
> > 2. For HDFS clients, where clients look up zookeeper for the current
> >    namenode.
> >
> > This information changes infrequently and is small, so update rate or
> > size of data is not an issue.
> >
> > The key challenge is to support that large a number of clients (30K to
> > start with, but eventually could be 100K). A big chunk of the clients
> > can try to connect/disconnect at the same time - so herd effect can
> > happen.
> >
> > I was trying out a 3 node ensemble. I noticed that with about 20K
> > clients, there were quite a few session expirations and disconnects.
> > I looked through the code briefly and since all the pings are
> > eventually handled by the leader, my guess is that the leader thread is
> > not keeping up. I haven't yet done the instrumentation/tracing to
> > validate this.
> >
> > I have been thinking about how to improve this and thought of the
> > following solution. I am trying to hit 2 goals with this.
> >
> > 1. Make it possible to have a very large number of clients (each client
> >    has a watch) without losing connections too often.
> >
> > 2. Improve how quickly a large number of clients can connect.
> >
> > Solution
> >
> > 1. The idea is to introduce a new type of session - "local" session. A
> >    "local" session doesn't have the full functionality of a normal
> >    session.
> >
> > 2. Local sessions cannot create ephemeral nodes.
> >
> > 3. Once a local session is lost, you cannot re-establish it using the
> >    session-id/password. The session and its watches are gone for good.
> >
> > 4. When a local session connects, the session info is only maintained
> >    on the zookeeper server that it is connected to. The leader is not
> >    aware of the creation of such a session and there is no state
> >    written to disk.
> >
> > 5. The pings and expiration are handled by the server that the session
> >    is connected to.
> >
> > With the above changes, it should be easy to scale ZK by adding more
> > learners, which manage the "local" sessions independently. Also, the
> > rate at which you can establish "local" sessions would be significantly
> > higher than the normal sessions.
> >
> > Would like to stir up a discussion on whether this is the best way to
> > achieve these goals or if I am missing simpler ways of accomplishing
> > this.
> >
> > Thanks!
> > Vishal

--
Connect to me at http://www.facebook.com/dhruba
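
For readers following the thread, the five points of the "local session" proposal can be modelled roughly as below. This is only a minimal sketch under stated assumptions, written in Python as a language-neutral model; it is not ZooKeeper code and not the prototype mentioned above. The names `LocalSession`, `LocalSessionTracker`, `connect`, `ping`, `add_watch`, `create_ephemeral` and `expire_stale` are all hypothetical, and the injectable `clock` parameter exists only so the behaviour can be exercised without waiting for real timeouts.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class LocalSession:
    """Hypothetical in-memory record for one local session."""
    session_id: int
    timeout_s: float
    last_heard: float
    watches: Set[str] = field(default_factory=set)


class LocalSessionTracker:
    """Hypothetical per-server tracker for local sessions.

    Models the proposal only: state lives in memory on the server the
    client is connected to, nothing is sent to the leader, and nothing
    is written to disk.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._sessions: Dict[int, LocalSession] = {}
        self._next_id = 1

    def connect(self, timeout_s: float) -> int:
        # Point 4: the session is created locally, with no leader round trip.
        sid = self._next_id
        self._next_id += 1
        self._sessions[sid] = LocalSession(sid, timeout_s, self._clock())
        return sid

    def ping(self, sid: int) -> bool:
        # Point 5: pings are handled by the server the client is connected to.
        session = self._sessions.get(sid)
        if session is None:
            # Point 3: a lost local session cannot be re-established.
            return False
        session.last_heard = self._clock()
        return True

    def add_watch(self, sid: int, path: str) -> None:
        # Watches live and die with the local session (KeyError if unknown).
        self._sessions[sid].watches.add(path)

    def create_ephemeral(self, sid: int, path: str) -> None:
        # Point 2: local sessions cannot create ephemeral nodes.
        raise PermissionError("local sessions cannot create ephemeral nodes")

    def expire_stale(self) -> List[int]:
        # Point 5: expiration is also decided locally, from the last ping time.
        now = self._clock()
        expired = [
            sid
            for sid, session in self._sessions.items()
            if now - session.last_heard > session.timeout_s
        ]
        for sid in expired:
            del self._sessions[sid]  # the session and its watches are gone
        return expired
```

In this model a server would call `expire_stale()` periodically. With the 300s client timeout from the setup above, a session that stops pinging is dropped on the first sweep after 300s of silence, and a later `ping` for that id returns `False` rather than reviving it.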