[
https://issues.apache.org/jira/browse/KAFKA-20276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lianet Magrans updated KAFKA-20276:
-----------------------------------
Component/s: clients
> API to read data between offsets
> --------------------------------
>
> Key: KAFKA-20276
> URL: https://issues.apache.org/jira/browse/KAFKA-20276
> Project: Kafka
> Issue Type: Improvement
> Components: clients, consumer
> Affects Versions: 4.1.1
> Reporter: Werner Daehn
> Priority: Major
>
> For introspecting a topic/partition it is quite common to
> * Read the last 100 messages
> * Read the first 100 messages
> * Read all messages between offset 100 and 200
> Today this can be done by reading the watermarks for the first two use cases,
> or by positioning the consumer at the start offset, polling repeatedly, and
> stopping once the end offset has been overshot.
> But besides being complicated, this approach also comes with downsides.
> It is complicated because an asynchronous poll loop is needed, whereas all the
> user wants is to read a "file" from line 100 to 200 via an API like `msg[]
> consumer.read(topicpartition, startoffset, endoffset)`.
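For the first two use cases, the offset arithmetic against the watermarks can be sketched as follows. This is a broker-free sketch with illustrative names: `beginningOffset` and `endOffset` stand for the values `KafkaConsumer#beginningOffsets`/`endOffsets` would return, and the ranges are half-open `[start, end)`.

```java
// Broker-free sketch: compute the [start, end) offset range for
// "first n" / "last n" requests from a partition's watermarks.
public class Watermarks {

    // First n messages: start at the low watermark, but never run past the end.
    static long[] firstN(long beginningOffset, long endOffset, long n) {
        long end = Math.min(beginningOffset + n, endOffset);
        return new long[] {beginningOffset, end};
    }

    // Last n messages: count back from the high watermark, but never before
    // the beginning of the log.
    static long[] lastN(long beginningOffset, long endOffset, long n) {
        long start = Math.max(endOffset - n, beginningOffset);
        return new long[] {start, endOffset};
    }
}
```

Note that on a compacted topic such a range may contain fewer than n records, since the offset sequence can have holes.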
>
> And it is full of side effects.
> What are the last 100 messages? A compaction might have happened, leaving
> holes in the offset sequence, so you might need to read offsets 50-200 to get
> the last 100 messages.
> Should tombstone messages be skipped or included?
> If the last poll returned offset 199, the next poll(10) waits the full 10
> seconds for more data and only then returns the message at offset 200.
> Setting the poll size to a single record has downsides as well.
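The compaction side effect can be made concrete with a small, self-contained simulation. No broker is involved and the names are illustrative: `Rec` stands in for a consumed record and the list stands in for a compacted partition, while `readBetween` mimics the seek-and-poll-until-overshoot workaround.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates reading a compacted topic partition between two offsets:
// seek to startOffset, keep consuming, and stop once the end offset
// is reached. Record offsets need not be contiguous after compaction.
public class ReadBetween {
    // Stand-in for a ConsumerRecord: just an offset and a value.
    record Rec(long offset, String value) {}

    // Returns all records with offsets in [startOffset, endOffset).
    static List<Rec> readBetween(List<Rec> log, long startOffset, long endOffset) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : log) {
            if (r.offset() < startOffset) continue; // before seek position
            if (r.offset() >= endOffset) break;     // overshot: stop polling
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Offsets 0, 1, 3, 5, 6: offsets 2 and 4 were removed by compaction.
        List<Rec> log = List.of(new Rec(0, "a"), new Rec(1, "b"),
                new Rec(3, "c"), new Rec(5, "d"), new Rec(6, "e"));
        // Asking for offsets [1, 6) yields 3 records, not 5: with holes,
        // an offset range and a message count are different things.
        System.out.println(readBetween(log, 1, 6).size()); // prints 3
    }
}
```

This also illustrates why the proposed API takes offsets rather than a message count: a fixed offset range stays well-defined even when compaction has removed records, whereas "the last 100 messages" does not map to a fixed offset span.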
--
This message was sent by Atlassian Jira
(v8.20.10#820010)