[ https://issues.apache.org/jira/browse/KAFKA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Roesler resolved KAFKA-10091.
----------------------------------
    Resolution: Fixed

> Improve task idling
> -------------------
>
>                 Key: KAFKA-10091
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10091
>             Project: Kafka
>          Issue Type: Task
>          Components: streams
>            Reporter: John Roesler
>            Assignee: John Roesler
>            Priority: Blocker
>              Labels: needs-kip
>             Fix For: 3.0.0
>
>
> When Streams is processing a task with multiple inputs, each time it is ready
> to process a record, it has to choose which input to process next. It always
> takes from the input for which the next record has the least timestamp. The
> result of this is that Streams processes data in timestamp order. However, if
> the buffer for one of the inputs is empty, Streams doesn't know what
> timestamp the next record for that input will be.
> Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address
> this issue.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]
> The config allows Streams to wait some amount of time for data to arrive on
> the empty input, so that it can make a timestamp-ordered decision about which
> input to pull from next.
> However, this config can be hard to use reliably and efficiently, since what
> we're really waiting for is the next poll that _would_ return data from the
> empty input's partition, and this guarantee is a function of the poll
> interval, the max poll interval, and the internal logic that governs when
> Streams will poll again.
> The ideal case is you'd be able to guarantee at a minimum that _any_ amount
> of idling would guarantee you poll data from the empty partition if there's
> data to fetch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
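
For readers unfamiliar with the behavior the ticket describes, the selection rule and the idling decision can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Kafka Streams implementation: the function `choose_next_input`, its parameters, and the in-memory `deque` buffers are all hypothetical stand-ins. Only the name `max.task.idle.ms` comes from the ticket. The sketch also shows the weakness the ticket points out: idling is measured in elapsed time, so nothing here confirms that a poll of the empty input actually happened during the wait.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Record = Tuple[int, str]  # (timestamp_ms, value); hypothetical record shape


def choose_next_input(
    buffers: Dict[str, Deque[Record]],
    idled_ms: int,
    max_task_idle_ms: int,
) -> Optional[str]:
    """Return the name of the input to process next, or None to keep idling.

    Assumption: each buffer holds records in timestamp order, so the head
    of a buffer is the next record for that input.
    """
    non_empty = {name: buf for name, buf in buffers.items() if buf}
    if not non_empty:
        return None  # nothing buffered at all

    has_empty_input = len(non_empty) < len(buffers)
    if has_empty_input and idled_ms < max_task_idle_ms:
        # The next timestamp of the empty input is unknown, so wait
        # (up to the idle limit) before committing to an ordering.
        return None

    # Pick the input whose head record has the smallest timestamp.
    return min(non_empty, key=lambda name: non_empty[name][0][0])


# Example: two inputs; "right" runs dry after its first record.
buffers = {
    "left": deque([(10, "a"), (30, "c")]),
    "right": deque([(20, "b")]),
}
first = choose_next_input(buffers, idled_ms=0, max_task_idle_ms=100)
```

With both buffers populated the choice is purely by timestamp. Once one buffer is empty, the function returns `None` until `idled_ms` reaches the limit, after which it proceeds with whatever data it has, possibly out of timestamp order.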