If Kafka installations are missing features or stability fixes by not
running the latest Zookeeper, that would be good to understand. Maybe you
could help with that, Gwen?

I know one of the implementations used this Hadoop version
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.3-product.html
which appears to be using ZK 3.4.5. I will have to check on the other two
(after I sent the email, someone reminded me we saw this more than twice).
I think one of them was CDH, but I don't recall off the top of my head; it
was a while ago.

The reason to run a separate Zookeeper cluster for Kafka, rather than
sharing one with other software systems (Hadoop, Mesos, etc.), is to
separate the risk between dependent services. A shared Zookeeper cluster
takes down more systems when it goes down (for whatever reason: rogue
server/code, upgrade, whatever) and becomes one big single point of failure
for everything. If you aren't using Zookeeper for anything else that is
mission critical it might not matter; it is relative (and I have seen this
too, of course).
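
To make the two layouts concrete, here is a minimal sketch of how the
zookeeper.connect broker property differs between a dedicated ensemble and
a chroot inside a shared one. The host names and the zookeeper_connect
helper are hypothetical, made up for illustration; only the
"host:port,host:port/chroot" format comes from Kafka.

```python
def zookeeper_connect(hosts, port=2181, chroot=None):
    """Build a value for Kafka's zookeeper.connect broker property.

    hosts  -- Zookeeper server host names (hypothetical in this sketch)
    chroot -- optional znode path that scopes Kafka inside a shared ensemble
    """
    servers = ",".join(f"{host}:{port}" for host in hosts)
    if chroot is None:
        return servers
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    # The chroot suffix is appended once, after the last host:port pair.
    return servers + chroot


# Dedicated ensemble for Kafka: an outage here affects only Kafka.
dedicated = zookeeper_connect(["zk-kafka1", "zk-kafka2", "zk-kafka3"])

# Shared ensemble with a chroot: Kafka's znodes are isolated by path, but
# the ensemble is still a single point of failure for every system on it.
shared = zookeeper_connect(
    ["zk-shared1", "zk-shared2", "zk-shared3"], chroot="/kafka"
)

print("zookeeper.connect=" + dedicated)
print("zookeeper.connect=" + shared)
```

The chroot only separates the namespace, not the failure domain, which is
why I keep it to test/qa/dev.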

We have also found deploying Zookeeper on Mesos very (very (very))
fruitful for dealing with and managing multiple Zookeeper ensembles without
any headaches. Of course you can't do that with the Zookeeper ensemble that
Mesos itself depends on, but that goes back to my point about separation.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/


On Mon, Aug 4, 2014 at 12:36 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> Also, knowing the specific Zookeeper 3.4.X version where loss of quorum
> occurred will help.
> 3.4.5 fixed some pretty serious issues around hanging.
>
> Gwen
>
> On Mon, Aug 4, 2014 at 9:29 AM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
> > Thanks for the heads-up, Joe.
> >
> > We've been shipping Zookeeper 3.4.X for over two years now (since
> > CDH4.0) and have many production customers. I'll check if there are
> > any known issues with breaking quorum. In any case I will take your
> > comments into account and see if I can arrange for extra testing.
> >
> > Can you share more information about the 3.4.X issues you were seeing?
> > Were there especially large clusters involved? A large number of
> > consumers?
> >
> > Also, I'm curious to hear more about the reasons for separate ZK
> > cluster. I can see why you'll want it if you have thousands of
> > consumers, but are there other reasons? Multiple zookeeper installs
> > can be a pain to manage.
> >
> > Gwen
> >
> >
> >
> > On Mon, Aug 4, 2014 at 7:52 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> >> I have heard issues from installations running 3.4.X that I have not
> >> heard from installations running 3.3.X (i.e. zk breaking quorum and
> >> cluster going down).
> >>
> >> In none of these cases did I have an opportunity to isolate, reproduce,
> >> and confirm that the issue was caused by 3.4.X. Moving to 3.3.x was
> >> agreed to be a lower risk/cost solution to the problem. Once on 3.3.X
> >> the issues didn't happen again.
> >>
> >> So I can't say for sure if there are issues with running 3.4.X, but I
> >> would suggest some due diligence in testing and production operation to
> >> validate that every case that Kafka requires operates correctly (and
> >> over some time). There is a cost to this, so some company(s) will have
> >> to make that investment and weigh the cost against the benefit of
> >> moving to 3.4.x.
> >>
> >> I currently recommend running a separate ZK cluster for Kafka production
> >> and not chrooting into an existing one, except for test/qa/dev.
> >>
> >> I don't know what others' experience is with 3.4.X; as I said, the
> >> issues I have seen could have been coincidence.
> >>
> >> /*******************************************
> >>  Joe Stein
> >>  Founder, Principal Consultant
> >>  Big Data Open Source Security LLC
> >>  http://www.stealth.ly
> >>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >> ********************************************/
> >>
> >>
> >> On Mon, Aug 4, 2014 at 12:56 AM, Gwen Shapira <gshap...@cloudera.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Kafka currently builds against Zookeeper 3.3.4, which is quite old.
> >>>
> >>> Perhaps we should move to the more recent 3.4.x branch?
> >>>
> >>> I tested the change on my system and the only impact is to
> >>> EmbeddedZookeeper used in tests (it uses NIOServerCnxn.Factory, which
> >>> was refactored into its own class in 3.4).
> >>>
> >>> Here's what the change looks like:
> >>> https://gist.github.com/gwenshap/d95b36e0bced53cab5bb
> >>>
> >>> Gwen
> >>>
>
