Joe Stein created KAFKA-1756:
--------------------------------
Summary: never allow the replica fetch size to be less than the
max message size
Key: KAFKA-1756
URL: https://issues.apache.org/jira/browse/KAFKA-1756
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1, 0.8.2
Reporter: Joe Stein
Priority: Blocker
Fix For: 0.8.2
There exists a very hazardous scenario where if the max.message.bytes is
greather than the replica.fetch.max.bytes the message will never replicate.
This will bring the ISR down to 1 (eventually/quickly once
replica.lag.max.messages is reached). If during this window the leader itself
goes out of the ISR then the new leader will commit the last offset it
replicated. This is also bad for sync producers with -1 ack because they will
all block (heard affect caused upstream) in this scenario too.
The fix here is two fold
1) when setting max.message.bytes using kafka-topics we must check first each
and every broker (which will need some thought about how todo this because of
the topiccommand zk notification) that max.message.bytes <=
replica.fetch.max.bytes and if it is NOT then DO NOT create the topic
2) if you change this in server.properties then the broker should not start if
max.message.bytes > replica.fetch.max.bytes
This does beg the question/issue some about centralizing certain/some/all
configurations so that inconsistencies do not occur (where broker 1 has
max.message.bytes > replica.fetch.max.bytes but broker 2 max.message.bytes <=
replica.fetch.max.bytes because of error in properties). I do not want to
conflate this ticket but I think it is worth mentioning/bringing up here as it
is a good example where it could make sense.
I set this as BLOCKER for 0.8.2-beta because we did so much work to enable
consistency vs availability and 0 data loss this corner case should be part of
0.8.2-final
Also, I could go one step further (though I would not consider this part as a
blocker for 0.8.2 but interested to what other folks think) about a consumer
replica fetch size so that if the message max is increased messages will no
longer be consumed (since the consumer fetch max would be < max.message.bytes
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)