[
https://issues.apache.org/jira/browse/FLINK-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542348#comment-15542348
]
ASF GitHub Bot commented on FLINK-4723:
---------------------------------------
GitHub user tzulitai opened a pull request:
https://github.com/apache/flink/pull/2580
[FLINK-4723] [kafka-connector] Unify committed offsets to Kafka to be the
next record to process
The description within the JIRA ticket
([FLINK-4723](https://issues.apache.org/jira/browse/FLINK-4723)) explains the
reasoning for this change.
With this change, offsets committed to Kafka are one greater than the
internally checkpointed offsets. The increment is applied at the
`FlinkKafkaConsumerBase` level, so that the offsets passed through the abstract
`commitSpecificOffsetsToKafka()` method to the version-specific implementations
are already incremented and represent the next record to process. This way, the
version-specific implementations simply commit the given offsets without
needing to manipulate them.
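
As a rough illustration of the increment described above, here is a minimal sketch. It is not the code from this PR: the class name, the helper `toCommittableOffsets`, and the use of plain `String` keys for partitions are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual FlinkKafkaConsumerBase code.
class CommittedOffsetSketch {

    /**
     * Converts internally checkpointed offsets ("last processed record")
     * into the offsets to commit ("next record to process") by adding 1.
     * Partitions are keyed by a plain String here for simplicity (assumption).
     */
    static Map<String, Long> toCommittableOffsets(Map<String, Long> lastProcessedOffsets) {
        Map<String, Long> committable = new HashMap<>();
        for (Map.Entry<String, Long> entry : lastProcessedOffsets.entrySet()) {
            committable.put(entry.getKey(), entry.getValue() + 1);
        }
        return committable;
    }

    public static void main(String[] args) {
        Map<String, Long> checkpointed = new HashMap<>();
        checkpointed.put("topic-0", 41L); // record 41 was the last one processed

        // The version-specific committer would receive these values as-is.
        System.out.println(toCommittableOffsets(checkpointed));
    }
}
```

Doing the conversion once, before the offsets reach the version-specific code, means the 0.8 and 0.9 committers cannot drift apart in how they interpret the values.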
This PR also includes a major refactoring of the integration tests: the
offset-commit tests are moved into `FlinkKafkaConsumerTestBase`, so that both
the 0.8 and 0.9 consumers run the offset committing / initial offset startup
tests (previously only the 0.8 consumer had these tests).
R: @rmetzger what's your take on this?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tzulitai/flink FLINK-4723
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2580.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2580
----
commit cc782ffd4c174f23c45349771b318a08a2be75a3
Author: Tzu-Li (Gordon) Tai <[email protected]>
Date: 2016-10-02T08:54:57Z
[FLINK-4723] [kafka-connector] Unify committed offsets to Kafka to be next
record to process
----
> Unify behaviour of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9
> consumer
> ---------------------------------------------------------------------------------
>
> Key: FLINK-4723
> URL: https://issues.apache.org/jira/browse/FLINK-4723
> Project: Flink
> Issue Type: Improvement
> Components: Kafka Connector
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.2.0, 1.1.3
>
>
> The proper "behaviour" of the offsets committed back to Kafka / ZK should be
> "the next offset that consumers should read (in Kafka terms, the 'position')".
> This is already fixed for the 0.9 consumer by FLINK-4618, which increments by
> 1 the offsets that the 0.9 consumer commits back to Kafka, so that the internal
> {{KafkaConsumer}} picks up the correct start position when committed offsets
> are present. This fix was required because, with the Kafka 0.9 APIs, the start
> position is determined implicitly from the committed offsets.
> However, since the 0.8 consumer handles offset committing and start position
> using Flink's own {{ZookeeperOffsetHandler}} and not Kafka's high-level APIs,
> the 0.8 consumer did not require a fix.
> I propose to still unify the behaviour of committed offsets across 0.8 and
> 0.9 to the definition above.
> Otherwise, if a user first uses the 0.8 consumer to read data, leaving
> Flink-committed offsets in ZK, and then uses a high-level 0.8 Kafka
> consumer to read the same topic in a non-Flink application, the first record
> will be a duplicate (because, as described above, Kafka high-level consumers
> expect the committed offsets to be "the next record to process" and not "the
> last processed record").
> This requires incrementing the committed ZK offsets in 0.8 by 1 as well,
> and changing how Flink's internal offsets are initialized from the
> acquired ZK offsets.
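
The duplicate-record scenario and the restore step described in the issue can be sketched as follows. This is only an illustration under stated assumptions: every method name below is hypothetical, and the "high-level consumer" is modelled as nothing more than "start reading at the committed offset".

```java
// Hypothetical model of the two offset conventions; not Flink or Kafka code.
class StartPositionSketch {

    /** Old 0.8 behaviour: the committed value is the last processed record. */
    static long commitLastProcessed(long lastProcessed) {
        return lastProcessed;
    }

    /** Proposed behaviour: the committed value is the next record to process. */
    static long commitNextToProcess(long lastProcessed) {
        return lastProcessed + 1;
    }

    /** A high-level consumer treats the committed offset as its start position. */
    static long highLevelConsumerStart(long committedOffset) {
        return committedOffset;
    }

    /** Flink's internal "last processed" offset, rebuilt from a committed offset. */
    static long restoreInternalOffset(long committedOffset) {
        return committedOffset - 1;
    }

    public static void main(String[] args) {
        long lastProcessed = 41L;

        // Old convention: the other consumer starts at 41 and re-reads that record.
        System.out.println(highLevelConsumerStart(commitLastProcessed(lastProcessed)));

        // Proposed convention: the other consumer starts at 42, no duplicate.
        System.out.println(highLevelConsumerStart(commitNextToProcess(lastProcessed)));

        // Flink itself subtracts 1 again when initializing its internal state.
        System.out.println(restoreInternalOffset(commitNextToProcess(lastProcessed)));
    }
}
```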