[ 
https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877136#comment-16877136
 ] 

Di Campo edited comment on KAFKA-5998 at 7/2/19 4:52 PM:
---------------------------------------------------------

In case it helps: I hit this again today on 2.1.1 (I also commented here some 
months ago). 
Setup: 5-broker cluster, 3 Kafka Streams instances (`num.stream.threads` = 2 
each), Amazon Linux, Docker on ECS.

Before the task dies, it prints the following WARNs from one task.

Note that of the 64 partitions, only a few fail, starting at 13:17. The same 
batch of partitions starts failing again at 13:42. 
Why are the same partitions failing? Does this match your findings?

 

{{[2019-07-02 13:17:01,101] WARN task [2_31] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_31/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:17:01,118] WARN task [2_47] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_47/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:17:01,156] WARN task [2_27] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_27/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:20:12,360] WARN task [2_63] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_63/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:20:12,579] WARN task [2_35] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_35/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:20:13,001] WARN task [2_23] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_23/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:23:18,421] WARN task [2_39] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_39/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:23:18,613] WARN task [2_55] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_55/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}

{{[2019-07-02 13:42:46,366] WARN task [2_31] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_31/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:42:46,473] WARN task [2_47] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_47/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:42:46,639] WARN task [2_27] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_27/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:46:19,888] WARN task [2_63] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_63/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:46:20,042] WARN task [2_35] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_35/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:46:20,380] WARN task [2_55] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_55/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:46:20,384] WARN task [2_23] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_23/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
{{[2019-07-02 13:48:07,011] WARN task [2_39] Failed to write offset checkpoint 
file to /data/kafka-streams/stream-processor-0.0.1/2_39/.checkpoint: {} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)}}
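
To check that both batches really cover the same tasks, here is a minimal 
sketch that extracts the task ids from WARN lines like the ones above and 
compares the two sets. The class name, the `warn` helper and the regex are 
only illustrative (they are not part of Kafka), and it assumes Java 9+ for 
`List.of`.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckpointWarnBatches {
    // Matches the task id in "... WARN task [2_31] Failed to write offset checkpoint ..."
    private static final Pattern TASK_ID =
            Pattern.compile("WARN task \\[(\\d+_\\d+)\\] Failed to write offset checkpoint");

    /** Collects the distinct task ids found in the given log lines. */
    public static Set<String> failingTasks(List<String> logLines) {
        Set<String> tasks = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = TASK_ID.matcher(line);
            if (m.find()) {
                tasks.add(m.group(1));
            }
        }
        return tasks;
    }

    /** Hypothetical helper: rebuilds one WARN line in the format shown in the logs. */
    public static String warn(String timestamp, String task) {
        return "[" + timestamp + "] WARN task [" + task + "] Failed to write offset checkpoint"
                + " file to /data/kafka-streams/stream-processor-0.0.1/" + task + "/.checkpoint: {}";
    }

    public static void main(String[] args) {
        // First batch (13:17 to 13:23), in the order the WARNs were logged.
        List<String> first = List.of(
                warn("2019-07-02 13:17:01,101", "2_31"), warn("2019-07-02 13:17:01,118", "2_47"),
                warn("2019-07-02 13:17:01,156", "2_27"), warn("2019-07-02 13:20:12,360", "2_63"),
                warn("2019-07-02 13:20:12,579", "2_35"), warn("2019-07-02 13:20:13,001", "2_23"),
                warn("2019-07-02 13:23:18,421", "2_39"), warn("2019-07-02 13:23:18,613", "2_55"));
        // Second batch (13:42 to 13:48).
        List<String> second = List.of(
                warn("2019-07-02 13:42:46,366", "2_31"), warn("2019-07-02 13:42:46,473", "2_47"),
                warn("2019-07-02 13:42:46,639", "2_27"), warn("2019-07-02 13:46:19,888", "2_63"),
                warn("2019-07-02 13:46:20,042", "2_35"), warn("2019-07-02 13:46:20,380", "2_55"),
                warn("2019-07-02 13:46:20,384", "2_23"), warn("2019-07-02 13:48:07,011", "2_39"));

        Set<String> a = failingTasks(first);
        Set<String> b = failingTasks(second);
        System.out.println("first batch:  " + a);
        System.out.println("second batch: " + b);
        System.out.println("same tasks in both batches: " + a.equals(b));
    }
}
```

For the lines above, both batches contain the same eight task ids, only in a 
slightly different order.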

 

The application died some minutes later, at 13:59:13. In case it is related, 
it was killed due to OOM.

 



> /.checkpoint.tmp Not found exception
> ------------------------------------
>
>                 Key: KAFKA-5998
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5998
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.11.0.0, 0.11.0.1, 2.1.1
>            Reporter: Yogesh BG
>            Assignee: Bill Bejeck
>            Priority: Critical
>         Attachments: 5998.v1.txt, 5998.v2.txt, Topology.txt, exc.txt, 
> props.txt, streams.txt
>
>
> I have one Kafka broker and one Kafka Streams instance running. It has been 
> running for two days under a load of around 2500 msgs per second. On the 
> third day I started getting the exception below for some of the partitions: 
> of 16 partitions, only 0_0 and 0_1 give this error.
> {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or 
> directory)
>         at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
>         at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> 09:43:25.974 [ks_0_inst-StreamThread-15] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_0/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_0/.checkpoint.tmp (No such file or 
> directory)
>         at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
>         at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> }}
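
The trace above shows `FileOutputStream` failing to open `.checkpoint.tmp` 
with "No such file or directory". In plain Java, one way to get exactly this 
exception is to open a file for writing when its parent directory does not 
exist. Whether that is what happens in this issue is an assumption, not 
something the trace confirms. A minimal sketch, with a hypothetical class name 
and a throwaway temp directory standing in for the state directory:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingParentDirDemo {
    /** Returns true if opening the file for writing fails with FileNotFoundException. */
    public static boolean openFails(File target) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(0);
            return false;
        } catch (FileNotFoundException e) {
            // Thrown by the constructor when the file cannot be created,
            // for example because the parent directory is missing.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path stateDir = Files.createTempDirectory("state-dir-demo");
        Path taskDir = stateDir.resolve("0_1"); // stand-in for a task directory, not created yet
        File tmp = taskDir.resolve(".checkpoint.tmp").toFile();

        System.out.println("task dir missing, open fails: " + openFails(tmp));

        Files.createDirectories(taskDir);
        System.out.println("task dir present, open fails: " + openFails(tmp));
    }
}
```

If this assumption holds, the thing to look for would be whatever removes the 
task directory while the task is still running.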



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
