[
https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958442#comment-14958442
]
Joel Koshy commented on KAFKA-2017:
-----------------------------------
Ashish, that is correct - consumers can still remain detached from ZK. I think
going with ZK should work fine, but the main point of this exercise is to
convince ourselves one way or the other: i.e., that persisting in ZK is better,
or that persisting in Kafka is better, or that even if persisting in Kafka is
better, it is not worth doing it in Kafka at this time.
I agree that the write rate will be low most of the time, so there is mostly no
concern there. I say mostly, because there may be some special cases to be
aware of. For example, if a heavily multi-tenant topic gets deleted (or a newly
subscribed topic gets created) you could have a ton of consumers rebalance all
at the same time. So in the worst case you will have one coordinator that ends
up having to write a ton of data to ZooKeeper in a short span of time; and if
the state size is large, that makes it worse. The actual write load on ZK is
probably not an issue (it should be no worse than what the old consumers
currently do - except that right now those writes are spread across many
consumers, as opposed to one or a few brokers with this proposal), but if you
write enough then request handling latencies can go up a bit on the brokers.
(Incidentally, we are reeling from a recent production incident that was caused
by high request local times, so it's fresh in my mind :) ) I'm not suggesting
that ZK will not work - it's just that writes are much cheaper in Kafka than in
ZK, and reads can be made cheap by caching. So if it is not difficult to
persist state in Kafka, maybe it is worth buying some insurance now.
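To make that concrete, here is a rough sketch of what persisting group state to Kafka could look like. The topic name {{__group_state}} is made up, and a real broker-side implementation would append to its local log the way the offset manager does rather than go through a producer; this is only meant to show the shape of it - a compacted topic keyed by group id, with the state as an opaque byte array:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class GroupStatePersister {
    // Hypothetical internal, log-compacted topic for group state.
    private static final String STATE_TOPIC = "__group_state";

    private final KafkaProducer<String, byte[]> producer;

    public GroupStatePersister(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all"); // state must survive coordinator failover
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    /**
     * Under log compaction the latest record per group id wins, so this is
     * effectively an upsert. The value is opaque bytes - each group-management
     * use case serializes its own state however it wants.
     */
    public void persist(String groupId, byte[] serializedGroupState) {
        producer.send(new ProducerRecord<>(STATE_TOPIC, groupId, serializedGroupState));
    }
}
{code}

Reads on the hot path would hit an in-memory cache on the coordinator; the topic is only replayed when a new coordinator takes over.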
Also, one small correction to what I mentioned above - I don't think you need
to support different value schemas for different group-management use cases,
because the value will just be an _opaque_ byte array.
I don't quite agree that doing it in Kafka is not so simple, mainly because
most of what we need to do is already done in the offset manager, so we could
refactor that to make it more general. I also don't think that tooling or ops
would suffer. Yes, we would need to look up the coordinator for the group or
consume the state topic, but that can be wrapped in a utility. We already do
that for tools like the offset checker and it works fine. Being able to consume
the state topic and quickly see what is going on globally is in fact a big plus
(as opposed to doing an expensive traversal of ZooKeeper) - as an example, the
[Burrow|https://github.com/linkedin/Burrow] monitoring tool benefits greatly
from that property. (cc [~toddpalino])
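To illustrate the tooling angle: a Burrow-style viewer would just be a plain consumer on the state topic, something along these lines (again assuming the hypothetical {{__group_state}} topic from the sketch above):

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupStateViewer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "group-state-viewer");
        props.put("auto.offset.reset", "earliest"); // replay the compacted history
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            // One consumer sees the latest state of every group without
            // walking the ZooKeeper tree.
            consumer.subscribe(Collections.singletonList("__group_state"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, byte[]> record : records) {
                    // A null value is a tombstone for a deleted group.
                    int size = record.value() == null ? 0 : record.value().length;
                    System.out.printf("group=%s stateBytes=%d%n", record.key(), size);
                }
            }
        }
    }
}
{code}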
Finally, I should add that if we do it in ZK and then decide we want to abandon
that and do it in Kafka, that is also feasible - we can do something similar to
what we did for dual-commit with consumer offsets. It would be much simpler to
orchestrate, since only the brokers need to turn dual-write on and then off (as
opposed to all of the consumers for offset management).
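Purely as an illustration of that migration path, the broker-side dual-write could look roughly like the following - the flag name and store interfaces are made up, not existing Kafka configs or APIs:

{code:java}
/**
 * Rough sketch of a broker-side dual-write toggle for group state, analogous
 * to dual-commit for consumer offsets. Everything here is hypothetical.
 */
public class DualWriteGroupStateStore {
    public interface StateStore {
        void persist(String groupId, byte[] state);
    }

    private final StateStore zkStore;       // existing ZooKeeper-backed store
    private final StateStore kafkaStore;    // new Kafka-topic-backed store
    private final boolean dualWriteEnabled; // e.g. a broker config like "group.state.dual.write"

    public DualWriteGroupStateStore(StateStore zkStore, StateStore kafkaStore,
                                    boolean dualWriteEnabled) {
        this.zkStore = zkStore;
        this.kafkaStore = kafkaStore;
        this.dualWriteEnabled = dualWriteEnabled;
    }

    public void persist(String groupId, byte[] state) {
        // During migration, brokers write to both stores; once everything reads
        // from Kafka, the flag is turned off and the ZK write goes away.
        kafkaStore.persist(groupId, state);
        if (dualWriteEnabled) {
            zkStore.persist(groupId, state);
        }
    }
}
{code}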
> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
> Key: KAFKA-2017
> URL: https://issues.apache.org/jira/browse/KAFKA-2017
> Project: Kafka
> Issue Type: Sub-task
> Components: consumer
> Affects Versions: 0.9.0.0
> Reporter: Onur Karaman
> Assignee: Guozhang Wang
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch,
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to fail over to
> a new coordinator without forcing all the consumers to rejoin their groups. This
> is possible if the coordinator persists its state so that the state can be
> transferred during coordinator failover. This state consists of most of the
> information in GroupRegistry and ConsumerRegistry.