Hey Joe,

Thanks for raising this. People really do want to get rid of the ZK
dependency; I agree it is among the most requested changes. Let me give a
quick critique and a more radical plan.

I don't think making ZK pluggable is the right thing to do. I have a lot of
experience with this dynamic of introducing plugins for core functionality
because I previously worked on a key-value store called Voldemort in which
we made both the protocol and storage engine totally pluggable. I
originally felt this was a good thing both philosophically and practically,
but in retrospect came to believe it was a huge mistake--what people really
wanted was one really excellent implementation with the kind of insane
levels of in-production usage and test coverage that infrastructure
demands. Pluggability is fundamentally at odds with this, and the attempt
to abstract over some really meaty dependency like a storage engine never
quite works.

People dislike the ZK dependency because it effectively doubles the
operational load of Kafka--it doubles the amount of configuration,
monitoring, and understanding needed. Replacing ZK with a similar system
won't fix this problem though--all the other consensus services are equally
complex (and often less mature)--and it will cause two new problems. First
there will be a layer of indirection that will make reasoning about and
improving the ZK implementation harder. For example, note that your plug-in
API doesn't seem to cover multi-get and multi-write; when we add those we
will end up breaking all the plugins. Each new feature will be like that.
Ops tools, config, documentation, etc. will no longer be able to include
any coverage of ZK, because we can no longer assume ZK, so all of that
becomes much harder. The second problem is that this introduces a
combinatorial testing problem. People say they want to swap out ZK, but
they are assuming whatever they swap in will work equally well. How will we
know that is true? The only way to know is to explode out the testing to
run with every possible plugin.
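To make the multi-get/multi-write point concrete, here is a hypothetical
sketch in Java (illustrative names only, not the actual KIP-30 interface)
of how a pluggable metadata-store API ossifies:

```java
import java.util.*;

// Hypothetical v1 of a pluggable metadata-store API. All names here are
// assumptions for illustration, not the real KIP-30 proposal.
interface MetadataStore {
    byte[] get(String path);
    void set(String path, byte[] value);
}

// A trivial in-memory implementation compiled against v1.
class InMemoryStore implements MetadataStore {
    private final Map<String, byte[]> data = new HashMap<>();
    public byte[] get(String path) { return data.get(path); }
    public void set(String path, byte[] value) { data.put(path, value); }
}

// The day the controller needs atomic batched operations, the interface
// must grow -- and every out-of-tree implementation compiled against v1
// either breaks or silently lacks the atomicity the caller now assumes.
interface MetadataStoreV2 extends MetadataStore {
    List<byte[]> multiGet(List<String> paths);
    void multiWrite(Map<String, byte[]> writes); // must be atomic; some backends can't do this at all
}
```

The point is that the batched-write case isn't an add-on: it changes the
contract every plugin has to honor, which is exactly the kind of evolution
a plugin boundary makes painful.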

If you want to see this in action take a look at ActiveMQ. ActiveMQ is less
a system than a family of co-operating plugins and a configuration language
for assembling them. Software engineers and open source communities are
really prone to this kind of thing because "we can just make it pluggable"
ends any argument. But the actual implementation is a mess, and later
improvements in their threading, I/O, and other core models simply couldn't
be made across all the plugins.

This blog post on configurability in UI is a really good summary of a
similar dynamic:
http://ometer.com/free-software-ui.html

Anyhow, not to go too far off on a rant. Clearly I have plugin PTSD :-)

I think instead we should explore the idea of getting rid of the ZooKeeper
dependency and replacing it with an internal facility. Let me explain what
I mean. In terms of API, what Kafka and ZK do is super different, but
internally it is actually quite similar: they are both trying to maintain a
CP log.

What would actually make the system significantly simpler would be to
reimplement the facilities you describe on top of Kafka's existing
infrastructure--using the same log implementation, network stack, config,
monitoring, etc. If done correctly this would dramatically lower the
operational load of the system versus the current Kafka+ZK or proposed
Kafka+X.

I don't have a proposal for how this would work, and it would take some
effort to scope it out. The obvious thing to do would be to keep the
existing ISR/controller setup and rebuild the controller etc. on a
Raft/Paxos implementation using the Kafka network/log/etc., with a
replicated config database (maybe RocksDB) that is fed off the log and
shared by all nodes.
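To sketch what "a replicated config database fed off the log" could look
like, here is a toy Java state machine; the class and method names are
assumptions, and the consensus layer (which decides the commit index) is
left out entirely:

```java
import java.util.*;

// Sketch only: a metadata table materialized from a replicated internal
// log. Each replica appends the same records and applies them in offset
// order once committed, so all replicas converge on the same table.
class MetadataStateMachine {
    private final NavigableMap<Long, Map.Entry<String, byte[]>> log = new TreeMap<>();
    private final Map<String, byte[]> table = new HashMap<>(); // could be RocksDB on disk
    private long lastApplied = -1;

    // Record arrives from the replicated log at a given offset.
    void append(long offset, String key, byte[] value) {
        log.put(offset, Map.entry(key, value));
    }

    // Called as the (assumed) consensus layer advances the commit index:
    // apply every not-yet-applied record up to and including that offset.
    void applyUpTo(long commitIndex) {
        if (commitIndex <= lastApplied) return;
        for (Map.Entry<Long, Map.Entry<String, byte[]>> e :
                 log.subMap(lastApplied + 1, true, commitIndex, true).entrySet()) {
            table.put(e.getValue().getKey(), e.getValue().getValue());
            lastApplied = e.getKey();
        }
    }

    byte[] read(String key) { return table.get(key); }
}
```

Because the table is rebuilt deterministically from the log, it need not
live entirely in memory -- which is where the partition-scaling advantage
below comes from.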

If done well this could have the advantage of potentially allowing us to
scale the number of partitions quite significantly (the k/v store would not
need to be all in memory), though you would likely still have limits on the
number of partitions per machine. This would make the minimum Kafka cluster
size be just your replication factor.

People tend to feel that implementing things like Raft or Paxos is too hard
for mere mortals. But I actually think it is within our capabilities, and
our testing capabilities as well as experience with this type of thing have
improved to the point where we should not be scared off if it is the right
path.

This approach is likely more work than plugins (though maybe not, once you
factor in all the docs, testing, etc) but if done correctly it would be an
unambiguous step forward--a simpler, more scalable implementation with no
operational dependencies.

Thoughts?

-Jay

On Tue, Dec 1, 2015 at 11:12 AM, Joe Stein <joe.st...@stealth.ly> wrote:

> I would like to start a discussion around the work that has started in
> regards to KIP-30
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
>
> The impetus for working on this came largely from the community. For the
> last year(+) it has been the most asked question at any talk I have given
> (personally speaking). It has also come up a bit on the mailing list in
> the discussion of zkclient vs. Curator. A lot of folks want to use Kafka,
> but introducing dependencies is hard in the enterprise, so the goal here
> is to make running Kafka as easy as possible for the operations teams. If
> they are already supporting ZooKeeper they can keep doing that, but if
> not, users want to plug in something else they already support to do the
> same things.
>
> For the core project I think we should leave upstream what we have. This
> gives a great baseline regression for folks and makes the work of "making
> what we have pluggable" a well-defined task (carve out, layer in the API
> impl, make sure the tests still pass). From there, when folks want their
> implementation to be something besides ZooKeeper, they can develop, test
> and support that if they choose.
>
> We would like to suggest that the plugin interface be Java based, to
> minimize dependencies for JVM implementations. This could live in another
> directory, something TBD /<name>.
>
> If you have a server you want to try to get working but you aren't on the
> JVM, don't be afraid: think about a REST impl, and if you can work inside
> of that you have some light RPC layers (this was the first-pass prototype
> we did to flesh out the public API presented on the KIP).
>
> There are a lot of parts to working on this, and the more implementations
> we have the better we can flesh out the public interface. I will leave the
> technical details and design to the JIRA tickets linked from the
> confluence page as these decisions come about and code starts to go up
> for review. We can target the specific modules; having the context
> separate is helpful, especially if multiple folks are working on it.
> https://issues.apache.org/jira/browse/KAFKA-2916
>
> Do other folks want to build implementations? Maybe we should start a
> confluence page for those (or use an existing one and add to it) so we
> can coordinate some there too.
>
> Thanks!
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - - - -
>   http://www.elodina.net
>     http://www.stealth.ly
> - - - - - - - - - - - - - - - - - - -
>
