[ https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955393#comment-16955393 ]

Raman Gupta edited comment on KAFKA-8803 at 10/20/19 5:34 AM:
--------------------------------------------------------------

[~bbejeck] Now the error is happening again, for two different streams than the 
one that was failing with this error before. Both of the streams now 
experiencing issues had also been running just fine until now, and changing 
`max.block.ms` for them does not help: I still get the same error message. 
After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask                : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
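For reference, here is a minimal sketch of how that override sits in the overall 
Streams configuration (the application id and bootstrap servers are placeholders, 
and exactly-once processing is assumed, since InitProducerId is only issued when 
transactions are in use):
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder
// transactions (and hence InitProducerId) are only used with exactly-once processing
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
// raise the producer's max.block.ms from the default 60000 ms to 20 minutes
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 1200000);
{code}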
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea that is looking better and better.

 


> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8803
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8803
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Raman Gupta
>            Priority: Major
>         Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask                : task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some running in the very same processes as the stream which 
> consistently throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and saw that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped; 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



