[ 
https://issues.apache.org/jira/browse/FLINK-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544170#comment-15544170
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-4722:
--------------------------------------------

One last comment:
Iterating over all data shouldn't be required. Say you run the FlinkKafkaConsumer with a parallelism of 3 (3 subtasks). Each subtask will be assigned a single partition and read only that partition. Using a {{KeyedDeserializationSchema}}, you can key each data record with the partition id "within the source". That means you won't need another map function to do this keying. The data that comes out of the FlinkKafkaConsumer will be (Tuple(partition, matrixData)).
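The keying idea above can be sketched roughly as follows. This is a minimal, self-contained stand-in (plain Java, no Flink dependency): {{PartitionKeyingSchema}} is a hypothetical name, and a plain {{Map.Entry}} stands in for the {{Tuple2(partition, matrixData)}} that a real {{KeyedDeserializationSchema}} implementation would return, but the method signature mirrors the one Flink's interface hands to the source:

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

// Sketch: key each Kafka record by its partition id "within the source".
// In a real Flink job this logic would live in an implementation of
// KeyedDeserializationSchema and return a Tuple2<Integer, String>;
// Map.Entry stands in here so the sketch stays self-contained.
public class PartitionKeyingSchema {

    // Mirrors KeyedDeserializationSchema#deserialize(messageKey, message,
    // topic, partition, offset): the partition id Kafka reports for the
    // record becomes the key of the emitted element, so no separate map
    // function is needed downstream to do the keying.
    public Map.Entry<Integer, String> deserialize(
            byte[] messageKey, byte[] message,
            String topic, int partition, long offset) {
        String value = new String(message, StandardCharsets.UTF_8);
        return new AbstractMap.SimpleImmutableEntry<>(partition, value);
    }

    public static void main(String[] args) {
        PartitionKeyingSchema schema = new PartitionKeyingSchema();
        Map.Entry<Integer, String> record = schema.deserialize(
                null, "matrixData".getBytes(StandardCharsets.UTF_8),
                "myTopic", 2, 0L);
        System.out.println(record.getKey() + " -> " + record.getValue());
    }
}
```

In the actual job you would pass the schema into the consumer constructor and then {{keyBy}} on the partition field of the emitted tuples.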

The only "iteration" would probably be a {{keyBy}} operation after the source. I think, in use cases like yours, you would benefit if the stream returned from the source were already a {{KeyedStream}}, to take advantage of data that is pre-partitioned outside of Flink. I'll think a bit about this idea!

In any case, writing your own custom consumer also works for now :)

> Consumer group concept not working properly with FlinkKafkaConsumer09  
> -----------------------------------------------------------------------
>
>                 Key: FLINK-4722
>                 URL: https://issues.apache.org/jira/browse/FLINK-4722
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.1.2
>            Reporter: Sudhanshu Sekhar Lenka
>
> When a Kafka topic has 3 partitions and 3 FlinkKafkaConsumer09 instances are 
> connected to that same topic using the "group.id" property set to "myGroup", 
> each Flink consumer still receives all data pushed to all 3 partitions. With 
> a normal Java consumer this works properly: each consumer gets only the data 
> for its assigned partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)