[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128687#comment-17128687 ]

josson paul kalapparambath commented on BEAM-9977:
--------------------------------------------------

[~boyuanz] It looks like I don't need BEAM-10123, but I need a few more clarifications. Suppose the pipeline fails and is restarted.

1) WatchGrowthFn will fetch *ALL* the Kafka partitions again and send them further down. Does ReadFromKafkaViaSDF then receive each partition and start from where it last checkpointed?

> Build Kafka Read on top of Java SplittableDoFn
> ----------------------------------------------
>
>                 Key: BEAM-9977
>                 URL: https://issues.apache.org/jira/browse/BEAM-9977
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka
>            Reporter: Boyuan Zhang
>            Assignee: Boyuan Zhang
>            Priority: P2
>          Time Spent: 10h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128533#comment-17128533 ]

Boyuan Zhang commented on BEAM-9977:
------------------------------------

Hi Josson,

BEAM-10123 is not required for checkpointing. In my implementation, a checkpoint happens when the Kafka consumer polls an empty result, or when the runner issues a checkpoint request (which is supported by Dataflow runner_v2). Would you like to share more details about why you need BEAM-10123 in your application?

Let's also discuss your questions here :)

From BEAM-727, it seems like you are going to create a growthFn, which takes topics as input and emits new TopicPartitions. So the pipeline looks like:

```java
PCollection> output = pipeline
    .apply(Create.of(topic1, topic2))
    .apply(your grow transform)
    .apply(ReadFromKafkaViaSDF);
```

> 1) How do I make sure that the 'growthFn' emits only the new Kafka partitions? It happens in the KafkaNewPartitonPolFn class.

The WatchGrowthFn will help you do this:
https://github.com/apache/beam/blob/4743e131edadad42555e605be803e26cb37b7ce6/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L864

> 2) In the KafkaPartitionConsumer class, do I have to take care of checkpointing the Kafka offset (so that, in case of pipeline failure, the Kafka data will be read from where it stopped)?

BEAM-10123 should take care of this case.

> 3) If the number of Kafka partitions matches the parallelism of the pipeline, how do I make sure that KafkaPartitionConsumer instances are distributed across the parallel instances? (Is this taken care of automatically by the runner? We are using the Flink Runner.)

The distribution will be done automatically by the runner.
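To make the answer to question 1 concrete, here is a minimal plain-Java sketch of the idea of emitting only newly discovered partitions: each poll returns the full partition listing, and only entries not seen before are passed on. This is an illustration only, not Beam's Watch implementation; the class name NewPartitionWatcher, the method onPoll, and the "topic-partition" string format are all hypothetical. The state here lives in memory, and whether such state survives a pipeline restart depends on the runner and is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical plain-Java model (not Beam's Watch API) of emitting only new partitions. */
class NewPartitionWatcher {
    // Partitions that have already been emitted downstream, e.g. "topic1-0".
    private final Set<String> seen = new LinkedHashSet<>();

    /** Takes the full partition listing from one poll and returns only the unseen entries. */
    List<String> onPoll(List<String> allPartitions) {
        List<String> fresh = new ArrayList<>();
        for (String partition : allPartitions) {
            // Set.add returns true only the first time a value is inserted.
            if (seen.add(partition)) {
                fresh.add(partition);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        NewPartitionWatcher watcher = new NewPartitionWatcher();
        // First poll: both partitions are new.
        System.out.println(watcher.onPoll(Arrays.asList("topic1-0", "topic1-1")));
        // Second poll: the listing repeats the old partitions, only "topic1-2" is new.
        System.out.println(watcher.onPoll(Arrays.asList("topic1-0", "topic1-1", "topic1-2")));
    }
}
```

With this shape, a downstream reader would receive each partition once, even though every poll lists all partitions again.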
[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128479#comment-17128479 ]

josson paul kalapparambath commented on BEAM-9977:
--------------------------------------------------

[~boyuanz] To try this out, do I need https://issues.apache.org/jira/browse/BEAM-10123 to be completed? I need checkpoint functionality.
[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116864#comment-17116864 ]

Boyuan Zhang commented on BEAM-9977:
------------------------------------

Hi Alexey,

Yes, the SDF Kafka read will use the GrowableOffsetRangeTracker as its RestrictionTracker.

> Build Kafka Read on top of Java SplittableDoFn
> ----------------------------------------------
>
>                 Key: BEAM-9977
>                 URL: https://issues.apache.org/jira/browse/BEAM-9977
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka
>            Reporter: Boyuan Zhang
>            Assignee: Boyuan Zhang
>            Priority: P2
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
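For readers unfamiliar with offset-range tracking, the following is a simplified plain-Java model of the general idea: a range that starts at a known offset and has no fixed end, offsets that are claimed one by one, and a checkpoint that closes the current range just after the last claimed offset and returns where a later read should resume. The read loop checkpoints when a poll comes back empty, which is one of the two checkpoint triggers described in this thread. This is a sketch under assumptions, not Beam's GrowableOffsetRangeTracker; the class OffsetTrackerSketch and the helper readUntilEmpty are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of an open-ended offset-range tracker; not Beam's class. */
class OffsetTrackerSketch {
    private long to = Long.MAX_VALUE; // exclusive end; MAX_VALUE means the range is still growing
    private long lastClaimed;         // starts one below the range start, so nothing is claimed yet

    OffsetTrackerSketch(long from) {
        this.lastClaimed = from - 1;
    }

    /** Claims an offset for processing; fails once the range has been cut at or below it. */
    boolean tryClaim(long offset) {
        if (offset >= to) {
            return false;
        }
        lastClaimed = offset;
        return true;
    }

    /** Closes the current range after the last claimed offset and returns the residual start. */
    long checkpoint() {
        to = lastClaimed + 1;
        return to;
    }

    /** Processes polled batches until a poll is empty, then checkpoints and returns the resume offset. */
    static long readUntilEmpty(OffsetTrackerSketch tracker, Deque<List<Long>> polls) {
        while (!polls.isEmpty()) {
            List<Long> batch = polls.poll();
            if (batch.isEmpty()) {
                break; // empty poll: stop here and hand the rest back as a residual
            }
            for (long offset : batch) {
                if (!tracker.tryClaim(offset)) {
                    return tracker.checkpoint();
                }
            }
        }
        return tracker.checkpoint();
    }

    public static void main(String[] args) {
        Deque<List<Long>> polls = new ArrayDeque<>();
        polls.add(Arrays.asList(5L, 6L, 7L));
        polls.add(Collections.<Long>emptyList());
        OffsetTrackerSketch tracker = new OffsetTrackerSketch(5);
        long resumeFrom = readUntilEmpty(tracker, polls);
        System.out.println("resume from offset " + resumeFrom);
    }
}
```

In this model, a second tracker created with the returned offset picks up exactly where the first one stopped, while the first tracker refuses any further claims.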
[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn
[ https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116859#comment-17116859 ]

Alexey Romanenko commented on BEAM-9977:
----------------------------------------

[~boyuanz] I see that PR #11715 was merged recently, but as I understand it, it doesn't affect KafkaIO directly. Do you expect other PRs for this Jira?