Onur Karaman created KAFKA-3810:
-----------------------------------
Summary: replication of internal topics should not be limited by
replica.fetch.max.bytes
Key: KAFKA-3810
URL: https://issues.apache.org/jira/browse/KAFKA-3810
Project: Kafka
Issue Type: Bug
Reporter: Onur Karaman
Assignee: Onur Karaman
From the kafka-dev mailing list discussion:
[\[DISCUSS\] scalability limits in the
coordinator|http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccamquqbzddtadhcgl6h4smtgo83uqt4s72gc03b3vfghnme3...@mail.gmail.com%3E]
There's a scalability limit on the new consumer / coordinator regarding the
amount of group metadata we can fit into one message. This effectively caps the
combination of consumer group size, per-member topic subscription size, topic
assignment size, and any remaining member metadata.
Under more strenuous use cases like mirroring clusters with thousands of
topics, this limitation can be reached even after applying gzip to the
__consumer_offsets topic.
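As a rough back-of-envelope illustration of why this limit is easy to hit (all numbers below are hypothetical, and the real wire format is more compact than this naive estimate), the per-group record scales roughly with members x subscribed topics x topic name length:
{code:java}
// Hypothetical back-of-envelope estimate of a single group metadata record.
// Numbers are illustrative only; the actual serialization is more compact.
public class GroupMetadataSizeEstimate {
    public static void main(String[] args) {
        int members = 100;                  // consumers in the group (e.g. a MirrorMaker fleet)
        int topics = 3000;                  // topics in each member's subscription
        int avgTopicNameBytes = 40;         // average topic name length in bytes
        int partitionsPerTopic = 8;         // partitions spread across the assignment
        int bytesPerAssignedPartition = 8;  // rough per-partition encoding cost

        // Every member's subscription repeats the full topic list.
        long subscriptionBytes = (long) members * topics * avgTopicNameBytes;
        // The assignment covers every partition once across the group.
        long assignmentBytes = (long) topics * partitionsPerTopic * bytesPerAssignedPartition;

        long totalBytes = subscriptionBytes + assignmentBytes;
        System.out.printf("~%d MB uncompressed, vs. the ~1 MB default message.max.bytes%n",
                totalBytes / (1024 * 1024));
    }
}
{code}
Even a generous compression ratio leaves a record like this well above the default limit, which matches the MirrorMaker observation above.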
Various options were proposed in the discussion:
# Config change: reduce the number of consumers in the group. This isn't always
a realistic answer in more strenuous use cases like MirrorMaker clusters or for
auditing.
# Config change: split the group into smaller groups which together will get
full coverage of the topics. This gives each group member a smaller
subscription (e.g. g1 gets topics starting with a-m while g2 gets topics
starting with n-z; see the pattern-subscription sketch after this list). This
would be operationally painful to manage.
# Config change: split the topics among members of the group. Again this gives
each group member a smaller subscription. This would also be operationally
painful to manage.
# Config change: bump up KafkaConfig.messageMaxBytes (a topic-level config) and
KafkaConfig.replicaFetchMaxBytes (a broker-level config). Applying
messageMaxBytes to just the __consumer_offsets topic seems relatively harmless
(see the topic-override sketch after this list), but bumping up the
broker-level replicaFetchMaxBytes would probably need more attention.
# Config change: try different compression codecs. Based on 2 minutes of
googling, it seems like lz4 and snappy are faster than gzip but have worse
compression, so this probably won't help.
# Implementation change: support sending the regex over the wire instead of the
fully expanded topic subscriptions. I think people said in the past that
different languages have subtle differences in regex, so this doesn't play
nicely with cross-language groups.
# Implementation change: maybe we can reverse the mapping? Instead of mapping
from member to subscriptions, we can map a subscription to a list of members (a
small illustration appears after this list).
# Implementation change: maybe we can try to break apart the subscription and
assignments from the same SyncGroupRequest into multiple records? They can
still go to the same message set and get appended together. This way the limit
becomes the segment size, which shouldn't be a problem. This can be tricky to
get right because we're currently keying these messages on the group, so I
think records from the same rebalance might accidentally compact one another,
but my understanding of compaction isn't that great.
# Implementation change: try to apply some tricks on the assignment
serialization to make it smaller.
# Config and Implementation change: bump up the __consumer_offsets topic
messageMaxBytes and (from [~junrao]) fix how we deal with the case when a
message is larger than the fetch size. Today, if the fetch size is smaller than
the message size, the consumer will get stuck. Instead, we can simply return the
full message if it's larger than the fetch size w/o requiring the consumer to
manually adjust the fetch size.
# Config and Implementation change: same as above but only apply the special
fetch logic when fetching from internal topics.
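For option 2, a minimal sketch of the prefix split using the Java consumer's pattern subscription. The group ids and regexes are hypothetical, and the pattern is still expanded client-side into an explicit topic list; it is just a smaller list per group:
{code:java}
import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PrefixSplitConsumers {
    // Build a consumer for one of the split groups; the group id and regex are illustrative.
    static KafkaConsumer<byte[], byte[]> consumerFor(String groupId, String topicRegex) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Pattern.compile(topicRegex), new ConsumerRebalanceListener() {
            @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
        });
        return consumer;
    }

    public static void main(String[] args) {
        // g1 covers topics starting with a-m, g2 covers n-z; poll loops omitted.
        KafkaConsumer<byte[], byte[]> g1 = consumerFor("mirror-group-1", "^[a-m].*");
        KafkaConsumer<byte[], byte[]> g2 = consumerFor("mirror-group-2", "^[n-z].*");
    }
}
{code}
Each group's __consumer_offsets record then only has to carry roughly half of the topic names per member, at the cost of operating twice as many groups.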
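For the topic-level half of option 4, a sketch of overriding max.message.bytes on just __consumer_offsets. This uses the Java AdminClient, which postdates this discussion, and the 10 MB value is hypothetical; the broker-level replica.fetch.max.bytes change is a separate broker config edit and is the part flagged as needing more attention:
{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class BumpOffsetsTopicMessageSize {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Override the message size limit for the offsets topic only.
            ConfigResource offsetsTopic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "__consumer_offsets");
            Config override = new Config(Collections.singletonList(
                    new ConfigEntry("max.message.bytes", Integer.toString(10 * 1024 * 1024))));
            admin.alterConfigs(Collections.singletonMap(offsetsTopic, override)).all().get();
        }
    }
}
{code}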
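For option 7, a small illustration of the reverse-mapping idea (class and method names are illustrative, not the coordinator's actual data structures): identical subscriptions are stored once and point back to their members, which collapses the common MirrorMaker case where every member subscribes to the same topic list.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReverseSubscriptionIndex {
    // Invert member -> topics into topics -> members so that duplicate
    // subscriptions are stored only once.
    static Map<Set<String>, List<String>> invert(Map<String, Set<String>> memberToTopics) {
        Map<Set<String>, List<String>> topicsToMembers = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : memberToTopics.entrySet()) {
            topicsToMembers
                    .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }
        return topicsToMembers;
    }
}
{code}
When all N members share one subscription, the topic list is serialized once instead of N times; when every member's subscription differs, it saves nothing.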