[ https://issues.apache.org/jira/browse/KAFKA-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783114#comment-17783114 ]
Matthias J. Sax edited comment on KAFKA-15792 at 11/6/23 5:04 PM (original author: JIRAUSER300456):
------------------------------------------------------------------
Can it be related to https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390? However, we can't see any metrics that prove this.

> Kafka Streams stuck partition fixed after restarting the process
> ----------------------------------------------------------------
>
> Key: KAFKA-15792
> URL: https://issues.apache.org/jira/browse/KAFKA-15792
> Project: Kafka
> Issue Type: New Feature
> Components: streams
> Affects Versions: 3.1.2
> Reporter: Patrick Pang
> Priority: Major
>
> Our Kafka Streams process often processes one particular partition slowly on a specific instance. No data skew is detected, i.e. records are mostly uniformly distributed across partitions. The symptom is a huge lag on that one partition. We also notice that network out on the problematic process is higher than on a normal process, often by 3x.
> After restarting the process, the lag drains within 5 minutes of startup. This points to an internal processing issue in our Streams application rather than a cluster problem or a poison message.
> Are there any metrics you suggest we look at, or is this a known issue? Regularly bouncing the application doesn't look like a proper fix for production systems.
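One way to catch the symptom described above (lag growing on a single partition while its peers stay flat) before a restart hides it is to sample per-partition lag over time and compare partitions against each other. The sketch below assumes the lag readings have already been collected, for example from the embedded consumer's `records-lag` metric (group `consumer-fetch-manager-metrics`, tagged by `topic` and `partition`), which Kafka Streams exposes through `KafkaStreams#metrics()` and JMX. The function `find_stuck_partitions` and its `growth_factor` threshold are hypothetical illustrations, not part of Kafka; this only flags a stuck partition and does not diagnose why it is stuck.

```python
from statistics import median
from typing import Dict, List


def find_stuck_partitions(
    lag_samples: Dict[int, List[float]],
    growth_factor: float = 3.0,
) -> List[int]:
    """Return partitions whose lag grew far more than the median partition's.

    lag_samples maps a partition number to lag readings taken at the same
    points in time (oldest first), e.g. periodic reads of the consumer's
    per-partition `records-lag` metric. growth_factor is a hypothetical
    threshold chosen for illustration, not a Kafka setting.
    """
    # Lag growth over the observation window; partitions with fewer than
    # two readings cannot show a trend and are skipped.
    growth = {p: s[-1] - s[0] for p, s in lag_samples.items() if len(s) >= 2}
    if not growth:
        return []
    # Use the median partition as the "healthy" baseline; clamp it to 1.0 so
    # a flat or draining baseline does not make every small increase suspect.
    baseline = max(median(growth.values()), 1.0)
    return sorted(p for p, g in growth.items() if g > growth_factor * baseline)


if __name__ == "__main__":
    samples = {
        0: [10, 12, 11],
        1: [8, 9, 10],
        2: [50, 4000, 9000],  # lag keeps climbing on this partition only
        3: [5, 6, 7],
    }
    print(find_stuck_partitions(samples))  # → [2]
```

Comparing each partition to the median of its peers, rather than to a fixed lag limit, keeps the check meaningful when overall input volume changes, which matches the report's observation that the data itself is not skewed.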