Hi Ryanne,

Good idea.  I added some of this discussion to the KIP-- in particular, more 
about controller failover.

cheers,
Colin

On Fri, Aug 2, 2019, at 13:28, Ryanne Dolan wrote:
> Thanks Colin, that helps. Can we add some of this to the KIP?
> 
> Ryanne
> 
> On Fri, Aug 2, 2019 at 12:23 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > On Fri, Aug 2, 2019, at 07:50, Ryanne Dolan wrote:
> > > Thanks Colin, interesting KIP.
> > >
> > > I'm concerned that the KIP does not actually address its stated
> > > motivations. In particular, "Simpler Deployment and Configuration"
> > > are not really achieved, given that: 1) the proposal still requires
> > > quorums (now of controllers, instead of ZK nodes), with the same
> > > restrictions as ZK, e.g. at least three controllers and only an odd
> > > number of controllers, neither of which is easy to manage; 2) the
> > > proposal still requires separate processes with separate
> > > configuration (and indeed, no less configuration than ZK requires,
> > > namely a port to listen on); 3) configuration of brokers is not
> > > simplified, as they still require a list of servers to contact (now
> > > coordinators instead of ZK nodes). Is there any improvement to
> > > configuration and deployment I'm overlooking?
> >
> > Hi Ryanne,
> >
> > Thanks for taking a look.
> >
> > The difficulty in configuring and deploying ZooKeeper is not really in
> > configuring a port number, or even in running a second JVM.  If those
> > were the main difficulties, running ZK would be pretty simple.
> >
> > The difficulty is that ZooKeeper is an entirely separate distributed
> > system with entirely separate configuration for things like security,
> > network setup, data directories, etc.  You also have separate systems for
> > management, metrics, and so on.  Learning how to configure security or
> > metrics in Kafka doesn't really help you with setting up the corresponding
> > features in ZK.  You have to start from scratch.  That is what we are
> > trying to avoid here.
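> > 
> > To make that concrete, here is roughly what operators juggle today.
> > The property names below are the standard ZooKeeper and Kafka ones;
> > the hostnames and paths are made up for illustration:
> > 
> >     # zoo.cfg -- ZooKeeper's own config format, with its own
> >     # security, networking, and data directory settings
> >     dataDir=/var/lib/zookeeper
> >     clientPort=2181
> >     server.1=zk1.example.com:2888:3888
> >     server.2=zk2.example.com:2888:3888
> >     server.3=zk3.example.com:2888:3888
> > 
> >     # server.properties -- Kafka's config, with its own separate
> >     # security, metrics, and storage settings
> >     zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181
> >     listeners=PLAINTEXT://broker1.example.com:9092
> >     log.dirs=/var/lib/kafka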
> >
> > > Second, single-broker clusters are mentioned as a motivation, but it is
> > > unclear how this change would move in that direction. Seems Raft requires
> > > three nodes, so perhaps the minimum number of hosts would be three?
> >
> > Just like with ZooKeeper, you can run Raft on a single node.  Needless
> > to say, a single-node quorum has no tolerance for node failures.
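> > 
> > As a sketch (the KIP doesn't nail down config names, so these are
> > hypothetical), a single-node quorum could look like:
> > 
> >     # one process acting as both broker and controller
> >     process.roles=broker,controller
> >     node.id=1
> >     # a quorum with a single voter: itself
> >     controller.quorum.voters=1@localhost:9093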
> >
> > >
> > > Third, "discrepancies between the controller state and the zookeeper
> > > state" are mentioned as a problem, and I understand that controllers
> > > coordinate amongst themselves rather than via zookeeper, but I'm not
> > > sure there is a functional difference? It seems controllers can still
> > > disagree amongst themselves for periods of time, with the same
> > > consequences as disagreeing with ZK.
> >
> > Members of a Raft quorum cannot disagree about committed state.  This
> > is similar to how ZooKeeper's "ZAB" protocol works.  There's more
> > information in the Raft paper: https://raft.github.io/raft.pdf
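> > 
> > Roughly, the reason members can't diverge is Raft's commit rule: an
> > entry is only applied once a majority of the quorum has replicated it.
> > A simplified sketch (illustration only, not code from the KIP):
> > 
> >     import java.util.Arrays;
> > 
> >     class RaftCommitRule {
> >         // matchIndex[i] = highest log index known to be replicated
> >         // on voter i.
> >         static long commitIndex(long[] matchIndex) {
> >             long[] sorted = matchIndex.clone();
> >             Arrays.sort(sorted);
> >             // The median index is replicated on a majority of the
> >             // quorum, so everything up to it is committed; every
> >             // member applies the same committed entries at the same
> >             // indexes, in the same order.
> >             return sorted[(sorted.length - 1) / 2];
> >         }
> >     }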
> >
> > >
> > > Finally, you say "there is no generic way for the controller to
> > > follow the ZooKeeper event log." I'm unsure this is a problem. Having
> > > a log is certainly powerful for consumers, but how would a controller
> > > use this log to do anything it can't without it? It seems only the
> > > latest compacted state is ever used, and there is nothing to undo or
> > > replay from the log. What future capabilities are you envisioning we
> > > would gain from carrying around log history?
> >
> > There are many advantages to treating metadata as a log.  Because the
> > controllers will now all track the latest state, controller failover will
> > not require a lengthy reloading period where we transfer all the state to
> > the new controller.  Because we always send deltas over the wire and not
> > full states, brokers can catch up with the latest state faster, and use
> > less bandwidth to do so.  It will even be possible for the brokers to cache
> > this state locally in a file on disk, so that broker startup can be much
> > faster.  All of these are important to scaling Kafka in the future.
> > Treating metadata as a log avoids a lot of the complex failure corner
> > cases we have seen where a broker misses a single update sent from the
> > controller, but gets subsequent updates.
> >
> > best,
> > Colin
> >
> >
> > >
> > > Ryanne
> > >
> > >
> > > On Thu, Aug 1, 2019, 4:05 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've written a KIP about removing ZooKeeper from Kafka.  Please take a
> > > > look and let me know what you think:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
> > > >
> > > > cheers,
> > > > Colin
> > > >
> > >
> >
>
