We are trying to understand the order of commits when processing each message in a Samza job.
T1: input offset commit T2: changelog commit T3: output commit By looking at the code snippet in https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala#L155-L171, my understanding is that for each input message, Samza always send update message on changelog, send the output message and then commit the input offset. It makes sense to me at the high level in terms of at least once processing. Specifically, we have two dumb questions: 1. When implementing our Samza task, does each call of process method triggers a call to TaskInstance.commit? 2. Is there a way to buffer these commit activities in memory and flush periodically? Our job is joining >1mm messages per second using a KV store and we have a lot of concern for the changelog size, as in the worst case, the change log will grow as fast as the input log. Chen -- Chen Song