[ https://issues.apache.org/jira/browse/BEAM-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-11366:
-----------------------------------
    Status: Open  (was: Triage Needed)

> Dataflow and KafkaIO has erratic watermark progress due to ReaderCache timeouts
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-11366
>                 URL: https://issues.apache.org/jira/browse/BEAM-11366
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>
> The ReaderCache has a default timeout of 1 minute. If the source is not
> polled within that time, the UnboundedReader is destroyed and will be
> reinitialized from the checkpoint when next polled. This is too short for
> runs where the source is not polled for a while because other keys are
> being processed. In particular, this seems to affect Streaming Engine
> pipelines with backlog.
>
> From an initial look, recovery from the Kafka UnboundedReader checkpoint
> does not seem to use the previous watermark to initialize the timestamp
> policy, and the timestamp policy may return an unknown watermark if it has
> not yet observed a record.
>
> So possible fixes would be:
> - make the cache timeout configurable, or maybe dynamic, and increase it
> - ensure that KafkaIO recovery from checkpoints initializes the watermark
>   properly, so that cache eviction does not matter for watermark smoothness

--
This message was sent by Atlassian Jira
(v8.3.4#803005)