[ https://issues.apache.org/jira/browse/BEAM-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767411#comment-16767411 ]
Raghu Angadi commented on BEAM-2185:
------------------------------------

Yes. The whole batch use case should document and/or log a big caveat listing all these concerns. When a user sets `enable.auto.commit = true`, the user is essentially introducing parallel checkpoint-like functionality outside of Apache Beam's control. I think, as with 'commitOffsetsInFinalize()', it can help with resuming from a reasonable point on restart, but it does not guarantee exactly-once processing (in fact, only 'update' guarantees exactly-once in Beam; no restart of a pipeline does).

> KafkaIO bounded source
> ----------------------
>
>                 Key: BEAM-2185
>                 URL: https://issues.apache.org/jira/browse/BEAM-2185
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka
>            Reporter: Raghu Angadi
>            Priority: Major
>
> KafkaIO could be a useful source for batch applications as well. It could
> implement a bounded source. The primary question is how the bounds are
> specified.
> One option: the source specifies a time period (say 9am-10am), and KafkaIO
> fetches appropriate start and end offsets based on the time index in Kafka.
> This would suit many batch applications that are launched on a schedule.
> Another option is to always read till the end and commit the offsets to
> Kafka. Handling failures and multiple runs of a task might be complicated.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)