In general, when determining the number of ZooKeeper serving nodes to
deploy (the size of an ensemble), you need to think in terms of
reliability, not performance.
Reliability:
A single ZooKeeper server (standalone) is essentially a coordinator with
no reliability (a single serving node failure brings down the ZK service).
A 3-server ensemble (you need to jump to 3, not 2, because ZK works on
simple majority voting) allows a single server to fail while the service
remains available.
So if you want reliability, go with at least 3. We typically recommend
having 5 servers in "online" production serving environments. This
allows you to take 1 server out of service (say, for planned maintenance)
and still sustain an unexpected outage of one of the remaining servers
without interrupting the service.
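The majority rule behind these numbers fits in a few lines of Python. This is a minimal sketch that assumes plain majority voting as described above; the helper names are illustrative only and not part of any ZooKeeper API. The quorum is always computed against the configured ensemble size, not the number of servers currently up.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, servers_down: int) -> bool:
    """True if the servers still running form a majority of the configured ensemble."""
    return ensemble_size - servers_down >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
    # 5 servers, 1 down for maintenance plus 1 unexpected outage: still available.
    print(has_quorum(5, 2))  # True
    # 3 servers with the same two servers down: service unavailable.
    print(has_quorum(3, 2))  # False
```

The table shows why even sizes don't help: 2 servers tolerate no failures (same as 1), and 4 tolerate only one (same as 3). Every additional tolerated failure costs two more servers.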
Performance:
Write performance actually _decreases_ as you add ZK servers, while read
performance increases modestly: http://bit.ly/9JEUju
See this page for a recent survey I did of operational latency with both
a standalone server and an ensemble of size 3:
http://bit.ly/4ekN8G
You'll notice that a single-core machine running a standalone ZK ensemble
(1 server) is still able to process 15k requests per second. This is
orders of magnitude more than what HBase currently uses ZK for (this may
change in the future). (background: http://bit.ly/csQLQ5)
Patrick
Michał Podsiadłowski wrote:
Hey all,
I've been asking about the minimum number of ZooKeepers, and usually
everybody says an odd number >= 3. Are there any reasons for this? Have
you encountered any problems with a single ZooKeeper? As far as I know,
HBase already does very, very few operations using ZooKeeper, so the load
on it is insignificant. If I have only one master and one namenode, I
already have 2 SPOFs, so another one is not a big deal. Currently we have
3 ZooKeepers running on Xen, with datanode/hregion on the physical machines.
Can anyone offer some advice?
Thanks,
Michal