[ https://issues.apache.org/jira/browse/BEAM-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549002#comment-17549002 ]
Danny McCormick commented on BEAM-11366: ---------------------------------------- This issue has been migrated to https://github.com/apache/beam/issues/20612 > Dataflow and KafkaIO or KinesisIO has erratic watermark progress due to > ReaderCache timeouts > -------------------------------------------------------------------------------------------- > > Key: BEAM-11366 > URL: https://issues.apache.org/jira/browse/BEAM-11366 > Project: Beam > Issue Type: Bug > Components: runner-dataflow > Reporter: Sam Whittle > Priority: P3 > > The ReaderCache has a default timeout of 1 minute. If the source is not > polled within that time, the UnboundedReader is destroyed and will be > reinitilized from the checkpoint when next polled. This is too short for runs > where the source is not polled for a while due to other keys to process. In > particular this seems to affect Streaming engine pipelines with backlog. > From an initial look, recovery from the Kafka UnboundedReader checkpoint does > not seem to use the previous watermark for initializing the timestamp policy > and perhaps the timestamp policy will return unknown watermark if it has not > yet observed a record. > So possible fixes would be: > - make cache timeout configurable, or maybe dynamic, and increase it > - ensure that KafkaIO and KinesisIO recovery from checkpoints initializes the > watermark properly so that cache eviction does not matter for watermark > smoothness > - ensure that cache eviction happens in the background instead of blocking > threads acquiring readers which might not expect blocking -- This message was sent by Atlassian Jira (v8.20.7#820007)