Also, knowing the specific Zookeeper 3.4.X version where the loss of quorum occurred would help. 3.4.5 fixed some pretty serious issues around hanging.
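For anyone who wants to report the exact version, here is a minimal sketch (not something from the original thread) that asks a running server for it. It assumes the server answers the `srvr` four-letter command on its client port and that the first line of the reply has the form `Zookeeper version: 3.4.5-..., built on ...`. The function names are hypothetical, and the host you pass in is whatever your ensemble members are called.

```python
import re
import socket


def parse_zk_version(srvr_output):
    """Return the x.y.z version found in `srvr` output, or None if absent."""
    match = re.search(
        r"^Zookeeper version: (\d+\.\d+\.\d+)", srvr_output, re.MULTILINE
    )
    return match.group(1) if match else None


def query_zk_version(host, port=2181, timeout=5.0):
    """Send the `srvr` four-letter command and parse the reported version.

    Assumes the server accepts four-letter commands on its client port;
    the server closes the connection after replying, which ends the read loop.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    reply = b"".join(chunks).decode("utf-8", errors="replace")
    return parse_zk_version(reply)
```

Calling `query_zk_version` once per ensemble member and including the results in a report would make it easier to tell whether a quorum problem is tied to a release older than 3.4.5.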
Gwen

On Mon, Aug 4, 2014 at 9:29 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> Thanks for the heads-up, Joe.
>
> We've been shipping Zookeeper 3.4.X for over two years now (since
> CDH4.0) and have many production customers. I'll check if there are
> any known issues with breaking quorum. In any case I will take your
> comments into account and see if I can arrange for extra testing.
>
> Can you share more information about the 3.4.X issues you were seeing?
> Were there especially large clusters involved? A large number of
> consumers?
>
> Also, I'm curious to hear more about the reasons for a separate ZK
> cluster. I can see why you'll want it if you have thousands of
> consumers, but are there other reasons? Multiple zookeeper installs
> can be a pain to manage.
>
> Gwen
>
>
>
> On Mon, Aug 4, 2014 at 7:52 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>> I have heard of issues from installations running 3.4.X that I have not heard
>> from installations running 3.3.X (i.e. zk breaking quorum and cluster going
>> down).
>>
>> In none of these cases did I have an opportunity to isolate and reproduce
>> the issue and confirm that it was caused by 3.4.X. Moving to 3.3.x was
>> agreed to be a lower risk/cost solution to the problem. Once on 3.3.X
>> the issues didn't happen again.
>>
>> So I can't say for sure if there are issues with running 3.4.X but I would
>> suggest some due diligence in testing and production operation to validate
>> that every case that Kafka requires operates correctly (and over some
>> time). There is a cost to this so some company(s) will have to take that
>> investment and weigh the cost against the benefit of moving to 3.4.x.
>>
>> I currently recommend running a separate ZK cluster for Kafka production
>> and not chroot into an existing one except for test/qa/dev.
>>
>> I don't know what others' experience is with 3.4.X; as I said, the issues I
>> have seen could have been coincidence.
>>
>> /*******************************************
>>  Joe Stein
>>  Founder, Principal Consultant
>>  Big Data Open Source Security LLC
>>  http://www.stealth.ly
>>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> ********************************************/
>>
>>
>> On Mon, Aug 4, 2014 at 12:56 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>>
>>> Hi,
>>>
>>> Kafka currently builds against Zookeeper 3.3.4, which is quite old.
>>>
>>> Perhaps we should move to the more recent 3.4.x branch?
>>>
>>> I tested the change on my system and the only impact is to
>>> EmbeddedZookeeper used in tests (it uses NIOServerCnxn.factory, which
>>> was refactored into its own class in 3.4).
>>>
>>> Here's what the change looks like:
>>> https://gist.github.com/gwenshap/d95b36e0bced53cab5bb
>>>
>>> Gwen
>>>