Shanthoosh Venkataraman created SAMZA-1572:
----------------------------------------------

             Summary: Add fixed retries on failure in KafkaCheckpointManager
                 Key: SAMZA-1572
                 URL: https://issues.apache.org/jira/browse/SAMZA-1572
             Project: Samza
          Issue Type: Bug
            Reporter: Shanthoosh Venkataraman
            Assignee: Shanthoosh Venkataraman


KafkaCheckpointManager.writeCheckpoint currently goes into an infinite loop when 
an irrecoverable failure happens. This indefinitely blocks the commit phase 
(thereby preventing processing). The exception is revealed only during the 
shutdown of the job, and it makes shutdown block indefinitely because the run 
loop, which is blocked in the commit phase, ignores the shutdown markers.
{code:java}
2018/01/22 19:18:10.503 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Flush failed. One or more batches of messages were not sent. Retrying.
2018/01/22 19:18:10.604 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:10.804 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:11.204 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:12.005 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:13.605 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:16.805 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:23.205 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:33.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:43.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:18:53.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:19:03.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:19:13.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio
2018/01/22 19:19:23.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exception.. Retrying.
{code}
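
The timestamps in the log above suggest a retry delay that doubles from roughly 
100 ms and is capped near 10 s, with no limit on the number of attempts. A 
minimal sketch of the proposed alternative, a fixed number of retries that then 
surfaces the failure to the caller, could look like the following. The names 
BoundedRetry, callWithRetries and RetriesExhaustedException and all parameter 
values are hypothetical illustrations, not Samza's actual API or the eventual 
patch.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Samza: retries an operation a fixed number
// of times with capped exponential backoff, then gives up.
final class BoundedRetry {

    /** Thrown once the retry budget is exhausted; wraps the last failure. */
    static final class RetriesExhaustedException extends RuntimeException {
        RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private BoundedRetry() {
    }

    static <T> T callWithRetries(Callable<T> operation, int maxAttempts,
                                 long initialBackoffMs, long maxBackoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt == maxAttempts) {
                    break; // budget spent: do not sleep again, just fail
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag so a shutdown request is
                    // still visible to the caller instead of being swallowed.
                    Thread.currentThread().interrupt();
                    throw new RetriesExhaustedException(
                        "Interrupted while backing off", ie);
                }
                // Double the delay, but never exceed the configured cap.
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
        throw new RetriesExhaustedException(
            "Giving up after " + maxAttempts + " attempts", lastFailure);
    }
}
```

With a wrapper of this shape, a checkpoint write that keeps failing would 
raise an exception out of the commit phase after the last attempt, so the run 
loop could fail the container or observe the shutdown markers instead of 
blocking forever.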



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
