Joe Stein created KAFKA-1756:
--------------------------------

             Summary: never allow the replica fetch size to be less than the 
max message size
                 Key: KAFKA-1756
                 URL: https://issues.apache.org/jira/browse/KAFKA-1756
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.8.1.1, 0.8.2
            Reporter: Joe Stein
            Priority: Blocker
             Fix For: 0.8.2


There exists a very hazardous scenario where, if max.message.bytes is 
greater than replica.fetch.max.bytes, the message will never replicate. 
This will bring the ISR down to 1 (eventually, or quickly once 
replica.lag.max.messages is reached). If during this window the leader itself 
goes out of the ISR, then the new leader will commit the last offset it 
replicated. This is also bad for sync producers with -1 ack, because they will 
all block in this scenario too (a herd effect upstream).
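To make the failure mode concrete, here is a deliberately simplified model in Java. It is not Kafka's actual replica fetcher code. It assumes a follower copies whole messages from the leader's log, with each fetch bounded by a byte limit standing in for replica.fetch.max.bytes. Once the next message is larger than that limit, a fetch returns nothing and the follower's offset stops advancing, which is the "never replicate" condition described above. The class and method names are hypothetical.

```java
import java.util.List;

// Simplified model (not Kafka's actual fetcher): a follower fetches whole
// messages from the leader's log, bounded by a per-request byte limit.
public class ReplicaFetchModel {

    // Returns how many messages, starting at fromOffset, fit in one fetch of
    // at most fetchMaxBytes. Zero means the follower makes no progress.
    static int messagesFetched(List<Integer> messageSizes, int fromOffset, int fetchMaxBytes) {
        int bytes = 0;
        int count = 0;
        for (int i = fromOffset; i < messageSizes.size(); i++) {
            int size = messageSizes.get(i);
            if (bytes + size > fetchMaxBytes) {
                break; // the next message does not fit in this fetch
            }
            bytes += size;
            count++;
        }
        return count;
    }

    // Replays fetches until the follower catches up or stops advancing,
    // and returns the follower's final offset.
    static int replicate(List<Integer> messageSizes, int fetchMaxBytes) {
        int offset = 0;
        while (offset < messageSizes.size()) {
            int n = messagesFetched(messageSizes, offset, fetchMaxBytes);
            if (n == 0) {
                break; // stuck: this message is larger than the fetch size
            }
            offset += n;
        }
        return offset;
    }

    public static void main(String[] args) {
        // The third message (1500 bytes) exceeds a 1000-byte fetch limit.
        List<Integer> sizes = List.of(200, 300, 1500, 100);
        System.out.println(replicate(sizes, 1000)); // prints 2
        System.out.println(replicate(sizes, 2000)); // prints 4
    }
}
```

With the smaller limit the follower stalls at offset 2 forever, while the leader keeps accepting writes; this is the point at which the follower eventually drops out of the ISR.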

The fix here is twofold:

1) When setting max.message.bytes using kafka-topics, we must first check on each 
and every broker (which will need some thought about how to do this because of 
the TopicCommand ZK notification) that max.message.bytes <= 
replica.fetch.max.bytes, and if it is NOT, then DO NOT create the topic.

2) If you change this in server.properties, then the broker should not start if 
max.message.bytes > replica.fetch.max.bytes.
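Both checks reduce to the same comparison, applied at different points. The Java sketch below is a hypothetical helper, not Kafka's code: `validateBrokerConfig` covers fix (2) by rejecting a broker's properties at startup, and `brokersTooSmall` covers fix (1) by listing the brokers that would be unable to replicate a topic with the requested max.message.bytes. It assumes the per-broker replica.fetch.max.bytes values have already been collected into a map; how to gather them given the TopicCommand ZK notification flow is the open question raised above and is not addressed here. Property names follow the ticket text (Kafka's documented configuration names the broker-wide limit `message.max.bytes` and uses `max.message.bytes` as the per-topic override). Treating a missing key as an error is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper illustrating the two checks proposed in this ticket.
// Names and error handling are illustrative assumptions, not Kafka's code.
public class FetchSizeCheck {

    static final String MAX_MESSAGE_BYTES = "max.message.bytes";
    static final String REPLICA_FETCH_MAX_BYTES = "replica.fetch.max.bytes";

    // Fix (2): refuse to start if the broker's properties allow messages
    // larger than its replica fetch size.
    static void validateBrokerConfig(Properties props) {
        long maxMessage = requireLong(props, MAX_MESSAGE_BYTES);
        long replicaFetch = requireLong(props, REPLICA_FETCH_MAX_BYTES);
        if (maxMessage > replicaFetch) {
            throw new IllegalStateException(MAX_MESSAGE_BYTES + " (" + maxMessage
                + ") > " + REPLICA_FETCH_MAX_BYTES + " (" + replicaFetch
                + "); refusing to start");
        }
    }

    // Fix (1): before creating a topic with the given max.message.bytes,
    // return every broker whose replica.fetch.max.bytes is too small.
    // An empty result means the topic may be created.
    static List<Integer> brokersTooSmall(long topicMaxMessageBytes,
                                         Map<Integer, Long> replicaFetchMaxBytesByBroker) {
        List<Integer> offenders = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : replicaFetchMaxBytesByBroker.entrySet()) {
            if (topicMaxMessageBytes > e.getValue()) {
                offenders.add(e.getKey());
            }
        }
        return offenders;
    }

    // Reads a required numeric property (assumption: missing keys are errors).
    private static long requireLong(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing " + key);
        }
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Properties ok = new Properties();
        ok.setProperty(MAX_MESSAGE_BYTES, "1000000");
        ok.setProperty(REPLICA_FETCH_MAX_BYTES, "1048576");
        validateBrokerConfig(ok); // passes: 1000000 <= 1048576

        Map<Integer, Long> brokers = new TreeMap<>(); // sorted by broker id
        brokers.put(1, 1_048_576L);
        brokers.put(2, 524_288L); // misconfigured broker
        System.out.println(brokersTooSmall(1_000_000L, brokers)); // prints [2]
    }
}
```

Returning the list of offending brokers, rather than a plain boolean, lets the topic command report which broker's properties need fixing, which also surfaces the per-broker inconsistency discussed below.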

This does raise the broader question of centralizing some or all 
configurations so that inconsistencies do not occur (e.g. broker 1 has 
max.message.bytes > replica.fetch.max.bytes but broker 2 has max.message.bytes <= 
replica.fetch.max.bytes because of an error in its properties). I do not want to 
conflate that with this ticket, but I think it is worth bringing up here, as it 
is a good example of where centralization could make sense.

I set this as BLOCKER for 0.8.2-beta because we did so much work to enable 
the consistency vs. availability trade-off and zero data loss; this corner case 
should be part of 0.8.2-final.

Also, I could go one step further (though I would not consider this part a 
blocker for 0.8.2, I am interested in what other folks think) with a similar 
check on the consumer fetch size, since if the message max is increased, 
messages will no longer be consumed (the consumer fetch max would be < 
max.message.bytes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
