[
https://issues.apache.org/jira/browse/KAFKA-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax resolved KAFKA-9270.
------------------------------------
Resolution: Fixed
> KafkaStream crash on offset commit failure
> ------------------------------------------
>
> Key: KAFKA-9270
> URL: https://issues.apache.org/jira/browse/KAFKA-9270
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.0.1
> Reporter: Rohan Kulkarni
> Priority: Critical
>
> On our production server we intermittently observe Kafka Streams crashing
> with a TimeoutException while committing offsets. The only workaround seems to
> be restarting the application, which is not a desirable solution for a
> production environment.
>
> We have already implemented a ProductionExceptionHandler, but it does not
> seem to address this.
>
> Please provide a fix for this or a viable workaround.
>
> +Application side logs:+
> 2019-11-17 08:28:48.055 +0000
> [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] -
> org.apache.kafka.streams.processor.internals.AssignedStreamsTasks
> [org.apache.kafka.streams.processor.internals.AssignedTasks:applyToRunningTasks:373]
> - stream-thread
> [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] *Failed to
> commit stream task 0_1 due to the following error:*
> *org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired
> before successfully committing offsets*
> {AggregateJob-1=OffsetAndMetadata{offset=176729402, metadata=''}}
>
> 2019-11-17 08:29:00.891 +0000
> [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] -
> [:lambda$init$2:130] - Stream crashed!!! StreamsThread threadId:
> AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1
>
> 2019-11-17 08:29:00.891 +0000
> [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] -
> [:lambda$init$2:130] - Stream crashed!!! StreamsThread threadId:
> AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1
> TaskManager MetadataState: GlobalMetadata: [] GlobalStores: [] My HostInfo:
> HostInfo{host='unknown', port=-1} Cluster(id = null, nodes = [], partitions
> = [], controller = null) Active tasks: Running: Suspended: Restoring: New:
> Standby tasks: Running: Suspended: Restoring: New:
> org.apache.kafka.common.errors.*TimeoutException: Timeout of 60000ms expired
> before successfully committing offsets*
> {AggregateJob-0=OffsetAndMetadata{offset=189808059, metadata=''}}
>
> +Kafka broker logs:+
> [2019-11-17 13:53:22,774] WARN *Client session timed out, have not heard from
> server in 6669ms for sessionid 0x10068e4a2944c2f*
> (org.apache.zookeeper.ClientCnxn)
> [2019-11-17 13:53:22,809] INFO Client session timed out, have not heard from
> server in 6669ms for sessionid 0x10068e4a2944c2f, closing socket connection
> and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>
> Regards,
> Rohan
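
A note on the handler mentioned in the report: as far as I understand it,
ProductionExceptionHandler is invoked for errors raised while producing
records, so it would not be expected to intercept a timeout on a consumer
offset commit. The 60000ms in the log matches the default of the consumer
setting default.api.timeout.ms, so one workaround sometimes tried is raising
that value through the Streams consumer prefix. The following is a minimal
sketch under assumptions, not the fix that resolved this ticket: the class
name, the application.id, the bootstrap address, and the 120000ms value are
all hypothetical, and plain string keys are used so that the snippet compiles
without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper; names and values are illustrative assumptions.
class CommitTimeoutConfigSketch {

    // Builds Streams properties that raise the embedded consumer's API timeout.
    static Properties buildProps() {
        Properties props = new Properties();
        // Assumed from the thread name prefix seen in the application logs.
        props.put("application.id", "AggregateJob");
        // Placeholder broker address.
        props.put("bootstrap.servers", "localhost:9092");
        // "consumer." is the Kafka Streams prefix for consumer-only settings;
        // default.api.timeout.ms bounds blocking consumer calls such as
        // commitSync(). 120000 is an assumed value; the default is 60000.
        props.put("consumer.default.api.timeout.ms", "120000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
            buildProps().getProperty("consumer.default.api.timeout.ms"));
    }
}
```

These properties would be merged into the existing configuration passed to
the KafkaStreams constructor. A longer timeout only gives a slow broker more
time to answer; it would not address any underlying broker-side slowness.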