[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150200#comment-16150200
 ] 

Spiros Ioannou commented on KAFKA-5060:
---------------------------------------

Well, it seems we found the issue: we had systemd stopping kafka, and the 
default stop timeout is 90 seconds. After 90 seconds systemd kills the process 
with SIGKILL. Raising the stop timeout to 400 seconds stopped these errors 
from occurring. It seems kafka takes 3 minutes to shut down after the initial 
SIGTERM, mostly removing fetchers from partitions. (We have 3 kafka nodes, 
replication 2, 1000 partitions * 4 topics.)
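
For illustration, a systemd drop-in along the following lines would raise the 
stop timeout. This is only a sketch: the unit name kafka.service and the 
drop-in path are assumptions, not taken from the setup described above. Only 
TimeoutStopSec is the setting being discussed.

```ini
# Hypothetical path: /etc/systemd/system/kafka.service.d/override.conf
# (assumes the broker runs as a unit named kafka.service)
[Service]
# Time systemd waits after the initial SIGTERM before sending SIGKILL.
# 400 seconds leaves headroom over the roughly 3 minute shutdown observed.
TimeoutStopSec=400
```

After adding or editing a drop-in, systemd needs to reload its configuration 
(systemctl daemon-reload) before the new timeout applies.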

> Offset not found while broker is rebuilding its index after an index 
> corruption
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-5060
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5060
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.0
>            Reporter: Romaric Parmentier
>            Priority: Critical
>
> After rebooting our kafka servers to change a configuration, one of my 
> consumers running the old consumer failed to find a new leader for a period 
> of 15 minutes. The topic has a replication factor of 2.
> When the spare server was finally found and elected leader, the previously 
> consumed offset could not be found because the broker was rebuilding its 
> index. 
> So my consumer fell back to the auto.offset.reset configuration, which is 
> pretty bad because the offset would exist 2 minutes later:
> [2017-04-12 14:59:08,568] WARN Found a corrupted index file due to requirement 
> failed: Corrupt index found, index file 
> (/var/lib/kafka/my_topic-6/00000000130248110337.index) has non-zero size but 
> the last offset is 130248110337 which is no larger than the base offset 
> 130248110337.}. deleting 
> /var/lib/kafka/my_topic-6/00000000130248110337.timeindex, 
> /var/lib/kafka/my_topic-6/00000000130248110337.index and rebuilding index... 
> (kafka.log.Log)
> [2017-04-12 15:01:41,490] INFO Completed load of log my_topic-6 with 6146 log 
> segments and log end offset 130251895436 in 169696 ms (kafka.log.Log)
> Maybe this is handled by the new consumer, or there is some configuration to 
> handle this case, but I didn't find anything.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
