[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-10-30 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224722#comment-16224722
 ] 

Ismael Juma commented on KAFKA-5060:


[~rparmentier], can you include consumer logs as well?

> Offset not found while broker is rebuilding its index after an index 
> corruption
> ---
>
> Key: KAFKA-5060
> URL: https://issues.apache.org/jira/browse/KAFKA-5060
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.10.1.0
> Reporter: Romaric Parmentier
> Priority: Critical
> Labels: reliability
>
> After rebooting our Kafka servers to change a configuration, one of my 
> consumers (running the old consumer) failed to find a new leader for 
> 15 minutes. The topic has a replication factor of 2.
> When the spare server was finally found and elected leader, the previously 
> consumed offset could not be found because the broker was rebuilding its 
> index. 
> So my consumer fell back to the auto.offset.reset policy, which is pretty 
> bad because the offset would have been available 2 minutes later:
> [2017-04-12 14:59:08,568] WARN Found a corrupted index file due to requirement 
> failed: Corrupt index found, index file 
> (/var/lib/kafka/my_topic-6/130248110337.index) has non-zero size but 
> the last offset is 130248110337 which is no larger than the base offset 
> 130248110337.}. deleting 
> /var/lib/kafka/my_topic-6/130248110337.timeindex, 
> /var/lib/kafka/my_topic-6/130248110337.index and rebuilding index... 
> (kafka.log.Log)
> [2017-04-12 15:01:41,490] INFO Completed load of log my_topic-6 with 6146 log 
> segments and log end offset 130251895436 in 169696 ms (kafka.log.Log)
> Maybe this is handled by the new consumer, or there is some configuration to 
> handle this case, but I didn't find anything.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-10-30 Thread Romaric Parmentier (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224570#comment-16224570
 ] 

Romaric Parmentier commented on KAFKA-5060:
---

Hi Ismael,

Yes, we are reporting the same issue.

Unfortunately, when I started my consumer, the offset was simply not found 
and the rule defined by the "auto.offset.reset" option was applied. The best 
way to avoid the problem is to set this option to "none" once the 
consumer has already been started once.
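
To illustrate why "none" is the safer setting here, below is a minimal, 
self-contained Python sketch of the reset decision. It is a toy model, not 
Kafka client code: resolve_position, OffsetNotAvailable, and the log_start / 
log_end arguments are hypothetical names made up for this example. Only the 
policy names come from Kafka ("smallest"/"largest" for the old consumer, 
"earliest"/"latest"/"none" for the new one).

```python
class OffsetNotAvailable(Exception):
    """Raised when the committed offset cannot be served and no reset is allowed."""


def resolve_position(committed, log_start, log_end, policy):
    """Return the offset to resume from, given the range the broker can serve.

    Toy model only: log_start/log_end stand for the offsets the broker can
    currently look up (hypothetical inputs, not a Kafka API).
    """
    if log_start <= committed <= log_end:
        return committed              # normal case: resume where we left off
    if policy in ("earliest", "smallest"):
        return log_start              # silently re-reads old data
    if policy in ("latest", "largest"):
        return log_end                # silently skips unread data
    # Any other policy (e.g. "none"): surface the problem so the application
    # can wait and retry instead of jumping to a different offset.
    raise OffsetNotAvailable(
        f"offset {committed} outside [{log_start}, {log_end}], policy {policy!r}"
    )
```

With a reset policy in place, a committed offset that is temporarily 
unavailable is replaced without any error; with "none", the application gets 
an exception and can decide to wait until the broker has finished loading.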



[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-10-28 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223661#comment-16223661
 ] 

Ismael Juma commented on KAFKA-5060:


Just to be clear, are both of you reporting the same issue where an offset is 
not found because the index is being rebuilt? What is the error received by the 
consumer? If it receives LEADER_NOT_AVAILABLE, then it should keep retrying.
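
As a rough illustration of what "keep retrying" means, here is a minimal 
Python sketch. It is not the consumer's actual code: fetch_with_retry, the 
fetch callable, and its (error_code, records) return shape are assumptions 
made for this example; only the error name LEADER_NOT_AVAILABLE comes from 
Kafka.

```python
import time

# Errors treated as transient in this sketch.
RETRIABLE = {"LEADER_NOT_AVAILABLE"}


def fetch_with_retry(fetch, max_attempts=5, backoff_s=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, fails hard, or attempts run out.

    fetch is a hypothetical callable returning (error_code, records),
    where error_code is None on success.
    """
    for attempt in range(1, max_attempts + 1):
        error, records = fetch()
        if error is None:
            return records
        if error not in RETRIABLE:
            raise RuntimeError(f"non-retriable error: {error}")
        if attempt < max_attempts:
            sleep(backoff_s * attempt)   # linear backoff between attempts
    raise TimeoutError(f"still {error} after {max_attempts} attempts")
```

The point is that a retriable error should never trigger an offset reset; 
only a definitive answer from the partition leader should.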



[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-10-28 Thread Spiros Ioannou (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223653#comment-16223653
 ] 

Spiros Ioannou commented on KAFKA-5060:
---

Hi Romaric, 
You're right, it happened to us again, so no, I have no idea.



[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-09-13 Thread Romaric Parmentier (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164275#comment-16164275
 ] 

Romaric Parmentier commented on KAFKA-5060:
---

Hi Spiros,

Thank you very much for your answer. I really thought it was the root cause 
of this index corruption, but after talking with our ops team, it appears 
that our systemd configuration is OK:

{noformat}
:~$ systemctl show kafka.service | grep -i timeout
TimeoutStartUSec=infinity
TimeoutStopUSec=infinity
JobTimeoutUSec=infinity
JobTimeoutAction=none
{noformat}

Any ideas?




[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-09-01 Thread Romaric Parmentier (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150201#comment-16150201
 ] 

Romaric Parmentier commented on KAFKA-5060:
---

Thank you for your email. I’m out of the office and will be back on September 
11. During this period I will have limited access to my email.
For immediate assistance please contact Yohan Sanchez (ysanc...@freewheel.tv) 
or Antoine Bonavita (abonav...@freewheel.tv)

Best Regards,

Romaric




[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-09-01 Thread Spiros Ioannou (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150200#comment-16150200
 ] 

Spiros Ioannou commented on KAFKA-5060:
---

Well, it seems we found the issue: we use systemd to stop Kafka, and the default 
stop timeout is 90 seconds. After 90 seconds systemd kills the process with 
SIGKILL. Raising the stop timeout to 400 seconds stopped these errors. 
It seems Kafka takes about 3 minutes to shut down after the initial SIGTERM, 
mostly removing fetchers from partitions. (We have 3 Kafka nodes, replication 
factor 2, 1000 partitions * 4 topics.)
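
For reference, a stop timeout like the one described above can be set with a 
systemd drop-in. This is only a sketch: the unit name kafka.service and the 
drop-in path are assumptions (adjust them to your installation); the 
400-second value is the one mentioned above.

```ini
# /etc/systemd/system/kafka.service.d/override.conf  (assumed path and unit name)
[Service]
# Allow up to 400 s for a clean shutdown before systemd escalates to SIGKILL
TimeoutStopSec=400
```

After adding the file, run "systemctl daemon-reload" so systemd picks up the 
change. "systemctl show kafka.service | grep -i timeout" should then report 
the new TimeoutStopUSec value.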



[jira] [Commented] (KAFKA-5060) Offset not found while broker is rebuilding its index after an index corruption

2017-08-30 Thread Spiros Ioannou (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147141#comment-16147141
 ] 

Spiros Ioannou commented on KAFKA-5060:
---

We get the same error on every restart with Kafka 0.11:

{noformat}

[2017-08-30 12:08:12,970] INFO Loading producer state from offset 1012 for 
partition i2SvarEvts-851 with message format version 2 (kafka.log.Log)
[2017-08-30 12:08:12,970] INFO Loading producer state from snapshot file 
1012.snapshot for partition i2SvarEvts-851 
(kafka.log.ProducerStateManager)
[2017-08-30 12:08:12,970] INFO Completed load of log i2SvarEvts-851 with 1 log 
segments, log start offset 0 and log end offset 1012 in 11 ms (kafka.log.Log)
[2017-08-30 12:08:12,973] WARN Found a corrupted index file due to requirement 
failed: Corrupt index found, index file 
(/disk1/kafka/kafka-logs/i2SvarEvts-5/.index) has non-zero 
size but the last offset is 0 which is no larger than the base offset 0.}. 
deleting /disk1/kafka/kafka-logs/i2SvarEvts-5/.timeindex, 
/disk1/kafka/kafka-logs/i2SvarEvts-5/.index,
and /disk1/kafka/kafka-logs/i2SvarEvts-5/.txnindex and 
rebuilding index... (kafka.log.Log)
[2017-08-30 12:08:12,973] INFO Recovering unflushed segment 0 in log 
i2SvarEvts-5. (kafka.log.Log)
[2017-08-30 12:08:12,973] INFO Loading producer state from offset 0 for 
partition i2SvarEvts-5 with message format version 2 (kafka.log.Log)
[2017-08-30 12:08:12,974] INFO Completed load of log i2SvarEvts-5 with 1 log 
segments, log start offset 0 and log end offset 0 in 3 ms (kafka.log.Log)
[2017-08-30 12:08:12,976] WARN Found a corrupted index file due to requirement 
failed: Corrupt index found, index file 
(/disk1/kafka/kafka-logs/i2SvarEvts-167/.index) has 
non-zero size but the last offset is 0 which is no larger than the base offset 
0.}. deleting 
/disk1/kafka/kafka-logs/i2SvarEvts-167/.timeindex, 
/disk1/kafka/kafka-logs/i2SvarEvts-167/.index, 
and /disk1/kafka/kafka-logs/i2SvarEvts-167/.txnindex and 
rebuilding index... (kafka.log.Log)
[2017-08-30 12:08:12,977] INFO Recovering unflushed segment 0 in log 
i2SvarEvts-167. (kafka.log.Log)
[2017-08-30 12:08:12,977] INFO Loading producer state from offset 0 for 
partition i2SvarEvts-167 with message format version 2 (kafka.log.Log)
[2017-08-30 12:08:12,977] INFO Completed load of log i2SvarEvts-167 with 1 log 
segments, log start offset 0 and log end offset 0 in 3 ms (kafka.log.Log)
[2017-08-30 12:08:12,980] WARN Found a corrupted index file due to requirement 
failed: Corrupt index found, index file 
(/disk1/kafka/kafka-logs/i2SvarEvts-300/.index) has 
non-zero size but the last offset is 0 which is no larger than the base offset 
0.}. deleting 
/disk1/kafka/kafka-logs/i2SvarEvts-300/.timeindex, 
/disk1/kafka/kafka-logs/i2SvarEvts-300/.index, 
and /disk1/kafka/kafka-logs/i2SvarEvts-300/.txnindex and 
rebuilding index... (kafka.log.Log)
[2017-08-30 12:08:12,999] INFO Recovering unflushed segment 0 in log 
i2SvarEvts-300. (kafka.log.Log)
[2017-08-30 12:08:13,021] INFO Loading producer state from offset 7167 for 
partition i2SvarEvts-300 with message format version 2 (kafka.log.Log)
[2017-08-30 12:08:13,021] INFO Loading producer state from snapshot file 
7167.snapshot for partition i2SvarEvts-300 
(kafka.log.ProducerStateManager)
[2017-08-30 12:08:13,021] INFO Completed load of log i2SvarEvts-300 with 1 log 
segments, log start offset 0 and log end offset 7167 in 43 ms (kafka.log.Log)
...

{noformat}

