[ https://issues.apache.org/jira/browse/KAFKA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699134#comment-15699134 ]

Ewen Cheslack-Postava commented on KAFKA-4007:
----------------------------------------------

[~enothereska] prefetching is driven by fetch requests, not by the 
max.poll.records setting. A new fetch request is only sent once the data from 
the previous response has been exhausted. So if a user sets max.poll.records = 
1, a new request is only sent after the last response has been completely 
drained. Since processing a single record is probably very fast, this isn't 
efficient -- we'd prefer to fetch data earlier, because the network round trip 
may be relatively expensive (especially given that fetch.min.bytes can leave 
you waiting on the broker/producers for some time).
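To make the cost concrete, here is a toy model (not the actual Kafka consumer code; the class, sizes, and stall counter are all hypothetical) of the behavior described above: a new fetch is sent only after the previous batch is exhausted, so the poll that empties the buffer stalls for a full network round trip before fresh data arrives.

```python
# Toy model of the current prefetch rule: fetch only when the buffer
# from the previous response is completely drained.
class NoPipelineConsumer:
    def __init__(self, batch_size, max_poll_records):
        self.batch_size = batch_size            # records per fetch response
        self.max_poll_records = max_poll_records
        self.buffer = []                        # records from the last fetch
        self.stalls = 0                         # polls that waited on the network

    def poll(self):
        if not self.buffer:                     # fetch only when exhausted
            self.stalls += 1                    # caller blocks on the round trip
            self.buffer = list(range(self.batch_size))
        out = self.buffer[:self.max_poll_records]
        del self.buffer[:self.max_poll_records]
        return out

c = NoPipelineConsumer(batch_size=100, max_poll_records=1)
records = sum(len(c.poll()) for _ in range(500))
print(records, c.stalls)   # prints: 500 5
```

With max.poll.records = 1 every 100th poll blocks on a round trip, even though the consumer knew 100 polls in advance that more data would be needed.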

The idea here is to send another fetch but delay reading the response until the 
previous fetch response has been fully processed. This pipelines data such that 
we could have up to 2x the response data queued, but it adds no other overhead, 
still gives a chance to pipeline future data (thanks to the extra buffered 
second batch of results), and doesn't defer fetching until the last minute -- 
which a low max.poll.records would otherwise allow, since a single unprocessed 
record could block requesting the additional data needed to satisfy subsequent 
poll() calls.
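The pipelined policy can be sketched the same way (again a toy model, not Kafka internals; the names and the single-prefetch-slot design are assumptions): issue the next fetch as soon as the current batch starts draining, but leave the response queued unread until that batch is fully consumed. At most two batches of data are held, and only the very first poll has to wait on the network.

```python
# Toy model of the proposed pipelining: keep one prefetched response
# queued while the current batch is still being handed out.
class PipelinedConsumer:
    def __init__(self, batch_size, max_poll_records):
        self.batch_size = batch_size
        self.max_poll_records = max_poll_records
        self.buffer = []        # batch currently being handed out
        self.inflight = None    # prefetched response, not yet read
        self.stalls = 0         # polls that blocked on a round trip

    def poll(self):
        if not self.buffer:
            if self.inflight is not None:
                self.buffer = self.inflight    # already arrived: no stall
                self.inflight = None
            else:
                self.stalls += 1               # cold start: must wait
                self.buffer = list(range(self.batch_size))
        if self.inflight is None:              # pipeline the next fetch now
            self.inflight = list(range(self.batch_size))
        out = self.buffer[:self.max_poll_records]
        del self.buffer[:self.max_poll_records]
        return out

c = PipelinedConsumer(batch_size=100, max_poll_records=1)
records = sum(len(c.poll()) for _ in range(500))
print(records, c.stalls)   # prints: 500 1
```

Compared with the non-pipelined model, memory usage is bounded at two batches while every network stall after the first disappears, provided processing a batch takes longer than the round trip.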

> Improve fetch pipelining for low values of max.poll.records
> -----------------------------------------------------------
>
>                 Key: KAFKA-4007
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4007
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>            Assignee: Mickael Maison
>
> Currently the consumer will only send a prefetch for a partition after all 
> the records from the previous fetch have been consumed. This can lead to 
> suboptimal pipelining when max.poll.records is set very low since the 
> processing latency for a small set of records may be small compared to the 
> latency of a fetch. An improvement suggested by [~junrao] is to send the 
> fetch anyway even if we have unprocessed data buffered, but delay reading it 
> from the socket until that data has been consumed. Potentially the consumer 
> can delay reading _any_ pending fetch until it is ready to be returned to the 
> user, which may help control memory better. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)