[ https://issues.apache.org/jira/browse/BEAM-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014439#comment-16014439 ]
Raghu Angadi commented on BEAM-2185: ------------------------------------ Something like {{withEndTime()}} is required, which sets the upper limit. We need a good way to set starting point. https://github.com/apache/beam/pull/3044 adds {{withStartTime()}. We have all the pieces for someone to work on this. {{withMaxTime()}} : This might be same as {{withEndTime()}}, If it is walltime, it would be problematic with task retries. {{withMaxRecords()}} : This could be max records per partition, we also need {{withStartOffset()}}, also requires roughly uniform partitions. > KafkaIO bounded source > ---------------------- > > Key: BEAM-2185 > URL: https://issues.apache.org/jira/browse/BEAM-2185 > Project: Beam > Issue Type: New Feature > Components: sdk-java-extensions > Reporter: Raghu Angadi > > KafkaIO could be a useful source for batch applications as well. It could > implement a bounded source. The primary question is how the bounds are > specified. > One option : Source specifies a time period (say 9am-10am), and KafkaIO > fetches appropriate start and end offsets based on time-index in Kafka. This > would suite many batch applications that are launched on a scheduled. > Another option is to always read till the end and commit the offsets to > Kafka. Handling failures and multiple runs of a task might be complicated. -- This message was sent by Atlassian JIRA (v6.3.15#6346)