Hi Sriram,

A short answer: the interval of polling is adjusted “dynamically” (by blocking 
the KafkaConsumer#poll call) according to the traffic of data.

I think this line [1] is what you are looking for.

Basically KafkaSource fires KafkaPartitionSplitReader.fetch calls repeatedly in 
a loop, and each call invokes KafkaConsumer.poll(). If the data traffic is 
quite high on Kafka, the poll request will be returned immediately with new 
data, so the interval of polls approximately equals to the latency of poll 
request. If the traffic is relatively low, the poll request will be blocked 
until new data is available on Kafka or the request times out (which is set to 
10 seconds in KafkaSource).

Hope this could be helpful!

[1] 
https://github.com/apache/flink/blob/d1cb177c91b41a5387814ad60d1799c08caf3ad9/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L101

Best,
Qingsheng
On Oct 8, 2022, 02:44 +0800, Sriram Ganesh <[email protected]>, wrote:
> Hi Everyone,
>
> I am trying to understand how Flink works in realtime with Kafka. Since
> Kafka works on polling, what will be the minimal time for Flink to poll
> Kafka?.
>
> Any explanation or documentation will be helpful.
>
> Thanks,
> Sriram G

Reply via email to