[ 
https://issues.apache.org/jira/browse/KAFKA-20276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lianet Magrans updated KAFKA-20276:
-----------------------------------
    Component/s: clients

> API to read data between offsets
> --------------------------------
>
>                 Key: KAFKA-20276
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20276
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, consumer
>    Affects Versions: 4.1.1
>            Reporter: Werner Daehn
>            Priority: Major
>
> For introspecting a topic/partition it is quite common to
>  * Read the last 100 messages
>  * Read the first 100 messages
>  * Read all messages between offset 100 and 200
> This can be done by reading the watermarks for the first two use cases, or 
> by positioning the consumer at the start offset, polling repeatedly, and 
> stopping once the end offset has been overshot.
> But besides being complicated, this approach also comes with downsides.
> It is complicated because asynchronous polling is needed, whereas all the 
> user wants to do is read a "file" from line 100 to 200. An API like `msg[] 
> consumer.read(topicpartition, startoffset, endoffset)` would be much simpler.
>  
> And it is full of side effects.
> What are the last 100 messages? A compaction might have happened, and hence 
> the offset list has holes in it; you might need to read offsets 50-200 to 
> get the last 100 messages.
> Do you want to skip or include tombstone messages?
> If the last poll returned offset 199, the next poll(10) waits the full 10 
> seconds for more data and then returns just the message at offset 200. 
> Setting the poll size to 1 record also comes with downsides.
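The poll-and-stop workaround described above can be sketched as follows. This is a minimal illustration of the stopping and filtering logic only: `SimpleRecord` and the list of polled batches are stand-ins (assumptions) for `ConsumerRecord` and successive `KafkaConsumer.poll()` results, since the real client needs a running broker.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for ConsumerRecord: just an offset and a value.
record SimpleRecord(long offset, String value) {}

public class OffsetRangeRead {
    // Collect records with startOffset <= offset < endOffset from a sequence
    // of polled batches, stopping as soon as the end offset is overshot.
    // With compaction, offsets may have holes, so fewer than
    // (endOffset - startOffset) records can come back.
    static List<SimpleRecord> readRange(List<List<SimpleRecord>> polls,
                                        long startOffset, long endOffset) {
        List<SimpleRecord> out = new ArrayList<>();
        for (List<SimpleRecord> batch : polls) {   // each poll() result
            for (SimpleRecord r : batch) {
                if (r.offset() >= endOffset) {
                    return out;                    // overshot: stop polling
                }
                if (r.offset() >= startOffset) {
                    out.add(r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulated log with a compaction hole at offsets 102 and 104.
        List<List<SimpleRecord>> polls = List.of(
            List.of(new SimpleRecord(100, "a"), new SimpleRecord(101, "b")),
            List.of(new SimpleRecord(103, "c"), new SimpleRecord(105, "d")));
        System.out.println(readRange(polls, 100, 105));
    }
}
```

Note that the stop condition only fires once a record at or past the end offset arrives; if no such record exists yet, a real consumer keeps blocking in `poll()`, which is exactly the timeout downside the report describes.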



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
