[ https://issues.apache.org/jira/browse/FLINK-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328840#comment-15328840 ]
Tzu-Li (Gordon) Tai edited comment on FLINK-4069 at 6/14/16 2:50 AM: --------------------------------------------------------------------- Thanks for creating this JIRA Shannon. However, there's already a previously opened JIRA on this problem: https://issues.apache.org/jira/browse/FLINK-4023. Would you be ok with tracking this issue on FLINK-4023 and close this as a duplicate issue? I've referenced a link to this JIRA on FLINK-4023. was (Author: tzulitai): Thanks for creating this JIRA Shannon. However, there's already a previously opened JIRA on this problem: https://issues.apache.org/jira/browse/FLINK-4023. Let's track this issue on FLINK-4023 and close this as a duplicate issue :) I've referenced a link to this JIRA on FLINK-4023. > Kafka Consumer should not initialize on construction > ---------------------------------------------------- > > Key: FLINK-4069 > URL: https://issues.apache.org/jira/browse/FLINK-4069 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector > Affects Versions: 1.0.3 > Reporter: Shannon Carey > > The Kafka Consumer connector currently interacts over the network with Kafka > in order to get partition metadata when the class is constructed. Instead, it > should do that work when the job actually begins to run (for example, in > AbstractRichFunction#open() of FlinkKafkaConsumer0?). > The main weakness of broker querying in the constructor is that if there are > network problems, Flink might take a long time (eg. ~1hr) inside the > user-supplied main() method while it attempts to contact each broker and > perform retries. In general, setting up the Kafka partitions does not seem > strictly necessary as part of execution of main() in order to set up the job > plan/topology. > However, as Robert Metzger mentions, there are important concerns with how > Kafka partitions are handled: > "The main reason why we do the querying centrally is: > a) avoid overloading the brokers > b) send the same list of partitions (in the same order) to all parallel > consumers to do a fixed partition assignments (also across restarts). When we > do the querying in the open() method, we need to make sure that all > partitions are assigned, without duplicates (also after restarts in case of > failures)." > See also the mailing list discussion: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/API-request-to-submit-job-takes-over-1hr-td7319.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)