[jira] [Issue Comment Deleted] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread josson paul kalapparambath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

josson paul kalapparambath updated BEAM-9977:
-
Comment: was deleted

(was: [~boyuanz] . Looks like I don't need BEAM-10123.   But I need few more 
clarifications.

Let us say the pipeline failed and restarted.

1) WatchGrothFn will fetch *ALL* the Kafka partitions again and sends it 
further down. Now,  does ReadFromKafkaViaSDF get each partition and start from 
where it check pointed last?.

 )

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128687#comment-17128687
 ] 

josson paul kalapparambath edited comment on BEAM-9977 at 6/8/20, 10:21 PM:


[~boyuanz] . Looks like I don't need BEAM-10123.   But I need few more 
clarifications.

Let us say the pipeline failed and restarted.

1) WatchGrothFn will fetch *ALL* the Kafka partitions again and sends it 
further down. Now,  does ReadFromKafkaViaSDF get each partition and start from 
where it check pointed last?.

 


was (Author: josson):
[~boyuanz] . Looks like I don't need BEAM-10123.   But I need few more 
clarifiations.

Let us say the pipeline failed and restarted.

1) WatchGrothFn will fetch *ALL* the Kafka partitions again and sends it 
further down. Now,  does ReadFromKafkaViaSDF get each partition and start from 
where it check pointed last?.

 

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128687#comment-17128687
 ] 

josson paul kalapparambath commented on BEAM-9977:
--

[~boyuanz] . Looks like I don't need BEAM-10123.   But I need few more 
clarifiations.

Let us say the pipeline failed and restarted.

1) WatchGrothFn will fetch *ALL* the Kafka partitions again and sends it 
further down. Now,  does ReadFromKafkaViaSDF get each partition and start from 
where it check pointed last?.

 

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-08 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128479#comment-17128479
 ] 

josson paul kalapparambath commented on BEAM-9977:
--

[~boyuanz] 

To try this out, do I require https://issues.apache.org/jira/browse/BEAM-10123 
to be completed?. I need check point functionality. 

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-727) KafkaIO should support dynamic addition of Kafka partitions to assigned topics.

2020-06-07 Thread josson paul kalapparambath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127859#comment-17127859
 ] 

josson paul kalapparambath commented on BEAM-727:
-

[~iemejia] [~jkff]

I am planning to come up with a KafakIO to handle dynamic addition of Kafka 
partitions. 

Below is the pseudo code
{code:java}
class ReadAll {

public PCollection> expand(PBegin input) {
Watch.Growth growthFn = 
Watch.growthOf(Contextful.of(new KafkaNewPartitonPolFn(), Requirements.empty()),
new 
ExtractOnlyPartiton())).withPollInterval(getPollingInterval());

PCollection> = input
.apply("convert to read request", this)
.apply(" WatchForNewPartitions", growthFn)
.apply(Values.create())
.apply("Kafka Partiton consumer", new KafkaPartitionConsumer())

}

private static class KafkaPartitionConsumer extends DoFn {
//Kafka reader

}

}{code}
 

I have few questions here.
 1) How do I make sure that the 'growthFn' emits only the new Kafka partitions. 
It happens in KafkaNewPartitonPolFn class
 2) In the KafkaPartitionConsumer class, do I have to take care of 
Checkpointing the Kafka offset (so that, in case of pipeline faliure, the Kafka 
data will be read from where it stoped)
 3) If Kafka partitions matches with parallelism of the Pipeline, how do I make 
sure that KafkaPartitionConsumer instance is distributed across parallel 
instances (Does it taken care automatically by the Runner. We are using Flink 
Runner.)

 

 

 

> KafkaIO should support dynamic addition of Kafka partitions to assigned 
> topics. 
> 
>
> Key: BEAM-727
> URL: https://issues.apache.org/jira/browse/BEAM-727
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Amit Sela
>Priority: P3
>
> Kafka topics may add partitions dynamically (doesn't require Kafka to 
> restart, or halt a topic), and the KafkaIO should probably support this.
> *Note:* 
> Consistently assigning partitions should be taken into account, specifically 
> for the case of reading from multiple topics, where one (or more) of the 
> topics added partitions while the pipeline is running.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)