[
https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960809#comment-14960809
]
Todd Palino commented on KAFKA-2017:
------------------------------------
Just to throw in my 2 cents here, I don't think that persisting this state in a
special topic in Kafka is a bad idea. My only concern is that we have seen
issues with the offsets already from time to time, and we'll want to make sure
we take those lessons learned and handle them from the start. The ones I am
aware of are:
1) Creation of the special topic at cluster initialization. If we specify an RF
of N for the special topic, then the brokers must make this happen. The first
broker that comes up can't create it with an RF of 1 and own all the
partitions. Either it must reject all operations that would use the special
topic until N brokers are members of the cluster and it can be created, or
it must create the topic in such a way that as soon as there are N brokers
available the RF is corrected to the configured number.
2) Load of the special topic into local cache. Whenever a coordinator loads the
special topic, there is a period of time while it is loading state where it
cannot service requests. We've seen problems with this related to log
compaction, where the partitions were excessively large, but I can see as we
move an increasing number of (group, partition) tuples over to Kafka-committed
offsets it could become a scale issue very easily. This should not be a big
deal for group state information, as that should always be smaller than the
offset information for the group, but we may want to create a longer term plan
for handling auto-scaling of the special topics (the ability to increase the
number of partitions and move group information from the partition it used to
hash to over to the one it hashes to after scaling).
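
To make the rescaling concern in (2) concrete, here is a minimal sketch of
working out which groups would need their state moved when the partition count
of the special topic changes. It assumes a group is assigned to a partition by
hashing its group id modulo the partition count. The names partition_for and
groups_to_move, and the use of CRC32 as the hash, are hypothetical and chosen
only for illustration; they are not taken from the Kafka code base.

```python
import zlib


def partition_for(group_id: str, num_partitions: int) -> int:
    """Map a group id to a partition of the special topic.

    Assumption: a stable hash of the group id modulo the partition count.
    CRC32 is used here only because it is deterministic across runs.
    """
    return zlib.crc32(group_id.encode("utf-8")) % num_partitions


def groups_to_move(group_ids, old_count: int, new_count: int) -> dict:
    """Return {group_id: (old_partition, new_partition)} for every group
    whose partition changes when the topic goes from old_count to new_count
    partitions. Groups that hash to the same partition are left out."""
    moves = {}
    for group_id in group_ids:
        old = partition_for(group_id, old_count)
        new = partition_for(group_id, new_count)
        if old != new:
            moves[group_id] = (old, new)
    return moves


if __name__ == "__main__":
    groups = [f"group-{i}" for i in range(1000)]
    moves = groups_to_move(groups, old_count=50, new_count=100)
    print(f"{len(moves)} of {len(groups)} groups would need to be moved")
```

Under this assumption, every group listed in the result would have to have its
state copied to the new partition and loaded by that partition's coordinator
before the coordinator could service requests for the group, which is the same
loading window described in (2).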
> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
> Key: KAFKA-2017
> URL: https://issues.apache.org/jira/browse/KAFKA-2017
> Project: Kafka
> Issue Type: Sub-task
> Components: consumer
> Affects Versions: 0.9.0.0
> Reporter: Onur Karaman
> Assignee: Guozhang Wang
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch,
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to failover to
> a new coordinator without forcing all the consumers to rejoin their groups. This
> is possible if the coordinator persists its state so that the state can be
> transferred during coordinator failover. This state consists of most of the
> information in GroupRegistry and ConsumerRegistry.