Shanthoosh Venkataraman created SAMZA-1572:
----------------------------------------------
Summary: Add fixed retries on failure in KafkaCheckpointManager
Key: SAMZA-1572
URL: https://issues.apache.org/jira/browse/SAMZA-1572
Project: Samza
Issue Type: Bug
Reporter: Shanthoosh Venkataraman
Assignee: Shanthoosh Venkataraman
KafkaCheckpointManager.writeCheckpoint currently goes into an infinite loop when
an irrecoverable failure happens. This indefinitely blocks the commit phase
(thereby preventing processing). The exception is revealed only during the
shutdown of the job, which makes shutdown block indefinitely, since the shutdown
markers are ignored by the run loop, which is itself blocked in the commit phase.
{code:java}
2018/01/22 19:18:10.503 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Flush failed. One or more batches of messages were not sent. Retrying.
2018/01/22 19:18:10.604 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:10.804 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:11.204 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:12.005 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:13.605 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:16.805 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:23.205 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:33.206 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:43.206 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:53.206 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:19:03.207 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:19:13.207 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:19:23.207 WARN [KafkaCheckpointManager] [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exception.. Retrying.
{code}
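The timestamps in the log show the retry delay roughly doubling from about 100 ms up to a cap of about 10 s, with no upper bound on the number of attempts. As a rough illustration of what "fixed retries" could look like, here is a minimal sketch of a bounded retry loop with capped exponential backoff. It is not the actual KafkaCheckpointManager code: the class BoundedRetry, the exception RetriesExhaustedException, and all parameter names are hypothetical, and the real fix may be structured differently.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Samza: retries an operation a fixed number of times. */
final class BoundedRetry {

    /** Hypothetical exception raised once every attempt has failed. */
    public static class RetriesExhaustedException extends RuntimeException {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private BoundedRetry() {
    }

    /**
     * Runs the operation up to maxAttempts times. The delay between attempts
     * starts at initialDelayMs and doubles after each failure, capped at maxDelayMs.
     * Instead of looping forever, the last failure is rethrown wrapped in
     * RetriesExhaustedException so the caller (e.g. a commit phase) can fail fast.
     */
    public static <T> T callWithRetries(Callable<T> operation, int maxAttempts,
                                        long initialDelayMs, long maxDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (InterruptedException e) {
                // Honor interruption (e.g. shutdown) immediately instead of retrying.
                Thread.currentThread().interrupt();
                throw new RetriesExhaustedException("Interrupted during attempt " + attempt, e);
            } catch (Exception e) {
                lastFailure = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RetriesExhaustedException("Interrupted while backing off", ie);
                }
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw new RetriesExhaustedException(
            "Giving up after " + maxAttempts + " attempts", lastFailure);
    }
}
```

A caller would wrap the checkpoint write in a lambda, for example BoundedRetry.callWithRetries(() -> doWrite(), 10, 100L, 10000L), where doWrite is a placeholder for the actual write. Propagating the final exception lets the run loop leave the commit phase and observe shutdown markers rather than blocking indefinitely.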