Hi All,

We are using Samza (0.10.0) in our system and recently ran into a problem:
because our Kafka broker was unstable for a few moments, our Samza tasks got
exceptions while trying to write messages to Kafka. After that, they went
into a very long retry loop (Integer.MAX_VALUE retries).

The warning line that keeps repeating in our container logs is:

...
WARN [2016-05-23 06:41:36,645] [U:260,F:293,T:552,M:2,267] producer.internals.Sender:[Sender:completeBatch:257] - [kafka-producer-network-thread | samza_producer-job4-1-1463686278936-2] - Got error produce response with correlation id 5888322 on topic-partition Topic3-0, retrying (2144537752 attempts left). Error: CORRUPT_MESSAGE
...
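As a rough sanity check on how long the loop had already been running:
assuming the producer's 'retries' was Integer.MAX_VALUE (2147483647), the
"attempts left" counter in the warning implies that about 2.9 million
retries had already been spent. The small Java sketch below just does that
subtraction; RetryBudget and retriesUsed are hypothetical names, and Kafka's
own counter may differ from this estimate by one.

```java
/** Hypothetical helper: estimates retries already used from "attempts left". */
public class RetryBudget {
    // Assumption: the producer's retries setting was Integer.MAX_VALUE.
    static final long CONFIGURED_RETRIES = Integer.MAX_VALUE;

    /** Retries consumed so far, given the "attempts left" value from the log. */
    static long retriesUsed(long attemptsLeft) {
        return CONFIGURED_RETRIES - attemptsLeft;
    }

    public static void main(String[] args) {
        long attemptsLeft = 2_144_537_752L; // value taken from the WARN line
        System.out.println(retriesUsed(attemptsLeft)); // prints 2945895
    }
}
```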

We experimented with setting the Kafka producer 'retries' configuration to
a smaller number, but it appears that Samza does not permit overriding this
parameter. On top of that, there is additional Samza-level retry logic that
re-sends the message if Kafka fails with a 'RetriableException'.

Could you explain the reason for disallowing this override? Additionally,
what is the recommended way to handle such situations?

I would have thought that one possible policy would be: if samza-kafka is
still unable to send the message after K Kafka retries (K configured by the
user), it throws an exception out to user land and lets the user decide what
to do. In our case, we would have chosen to kill the container and have the
Samza YARN application master request a new one from YARN.
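To make the proposal concrete, here is a minimal Java sketch of the policy
we have in mind. It is not Samza or Kafka code: BoundedRetry,
sendWithRetries, RetriableSendException and RetriesExhaustedException are
hypothetical names, and RetriableSendException merely stands in for Kafka's
RetriableException. The send is attempted once and then retried up to K
(maxRetries) times on a retriable failure; after that, an exception is
thrown to the caller, which could then decide, for example, to shut down the
container.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a bounded-retry send policy (not a Samza/Kafka API). */
public class BoundedRetry {

    /** Stand-in for a transient failure such as Kafka's RetriableException. */
    static class RetriableSendException extends RuntimeException {
        RetriableSendException(String message) { super(message); }
    }

    /** Thrown to user code once the retry budget is exhausted. */
    static class RetriesExhaustedException extends RuntimeException {
        RetriesExhaustedException(String message, Throwable cause) { super(message, cause); }
    }

    /**
     * Calls send once, then retries up to maxRetries more times while it throws
     * RetriableSendException. Any other exception propagates immediately.
     */
    static <T> T sendWithRetries(Supplier<T> send, int maxRetries) {
        RetriableSendException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return send.get();
            } catch (RetriableSendException e) {
                last = e; // transient failure: try again if budget remains
            }
        }
        throw new RetriesExhaustedException(
                "send failed after " + (maxRetries + 1) + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds: fits within a budget of 3 retries.
        String result = sendWithRetries(() -> {
            if (++calls[0] < 3) throw new RetriableSendException("CORRUPT_MESSAGE");
            return "sent";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");

        // Always fails: the caller gets an exception instead of looping forever.
        try {
            sendWithRetries(() -> { throw new RetriableSendException("broker down"); }, 3);
        } catch (RetriesExhaustedException e) {
            // Here the user could choose to kill the container.
            System.out.println(e.getMessage());
        }
    }
}
```

Non-retriable exceptions propagate immediately, so only transient errors
consume the retry budget.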

There seem to be at least a couple of related bugs already open:

   1. https://issues.apache.org/jira/browse/SAMZA-610
   2. https://issues.apache.org/jira/browse/SAMZA-911
cheers,
gaurav
