[ https://issues.apache.org/jira/browse/FLINK-32019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hongshun Wang closed FLINK-32019. --------------------------------- > EARLIEST offset strategy for partitions discoveried later based on FLIP-288 > --------------------------------------------------------------------------- > > Key: FLINK-32019 > URL: https://issues.apache.org/jira/browse/FLINK-32019 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Kafka > Affects Versions: kafka-3.0.0 > Reporter: Hongshun Wang > Assignee: Hongshun Wang > Priority: Major > Labels: pull-request-available > Fix For: kafka-4.0.0 > > > As described in > [FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source], > the strategy used for new partitions is the same as the initial offset > strategy, which is not reasonable. > According to the semantics, if the startup strategy is latest, the consumed > data should include all data from the moment of startup, which also includes > all messages from new created partitions. However, the latest strategy > currently maybe used for new partitions, leading to the loss of some data > (thinking a new partition is created and might be discovered by Kafka source > several minutes later, and the message produced into the partition within the > gap might be dropped if we use for example "latest" as the initial offset > strategy).if the data from all new partitions is not read, it does not meet > the user's expectations. > Other ploblems see final Section of > [FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source]: > {{User specifies OffsetsInitializer for new partition}} . > Therefore, it’s better to provide an *EARLIEST* strategy for later discovered > partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)