[ https://issues.apache.org/jira/browse/BEAM-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anonymous updated BEAM-12406:
-----------------------------
Status: Triage Needed (was: Open)
> Progressing watermark for not available Kinesis stream
> ------------------------------------------------------
>
> Key: BEAM-12406
> URL: https://issues.apache.org/jira/browse/BEAM-12406
> Project: Beam
> Issue Type: Bug
> Components: io-java-kinesis
> Affects Versions: 2.27.0
> Reporter: Mateusz Rataj
> Priority: P3
>
> We use Dataflow with Apache Beam to read events from Kinesis streams.
> Recently, we noticed that when one of the streams became unavailable in the
> middle of event processing (because it was removed or because of a problem
> with the credentials), the data watermark for that stream kept advancing.
>
> Consider the following scenario:
> # Permissions allow reading from stream A
> # Data is read from stream A
> # Permissions are changed and no longer allow reading from stream A
> # The watermark for stream A keeps progressing (although no stream data is
> read because of the permissions issue)
> # Permissions are fixed so that stream A can be read again
> # Data is read from stream A, but starting from the updated watermark
> As a result, stream data from between steps 3 and 5 is lost, and the client
> is not aware of it.
> Additionally, this is confusing in the Dataflow console, which suggests
> that events are still being read from the stream. It also makes the
> watermark hard to rely on as a source metric for alerting.
> A brief investigation suggests that the _KinesisReader.getWatermark()_
> logic may not consider the state of the stream (whether it is available or
> not) and treats a removed stream as a stream without traffic. The watermark
> calculation should be adjusted to take that information into account.
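
To illustrate the suggested adjustment, here is a minimal sketch of a watermark tracker that distinguishes an idle stream from an unreachable one. It is not Beam's actual KinesisReader code: the class AvailabilityAwareWatermark, its methods, and the idle threshold are all hypothetical names chosen for illustration, and the idle rule (advance to "now minus threshold" after an empty read) is an assumed simplification.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch; this is not Beam's KinesisReader or its watermark policy. */
final class AvailabilityAwareWatermark {

    /** Outcome of the most recent read attempt against the stream. */
    private enum State { ACTIVE, IDLE, UNAVAILABLE }

    private final Duration idleThreshold; // assumed lag kept behind wall-clock time when idle
    private Instant watermark;
    private State state = State.ACTIVE;

    AvailabilityAwareWatermark(Instant initialWatermark, Duration idleThreshold) {
        this.watermark = initialWatermark;
        this.idleThreshold = idleThreshold;
    }

    /** A read succeeded and returned a record with the given arrival time. */
    void onRecord(Instant arrivalTime) {
        state = State.ACTIVE;
        if (arrivalTime.isAfter(watermark)) {
            watermark = arrivalTime;
        }
    }

    /** A read succeeded but returned no records: the stream is reachable and idle. */
    void onEmptyRead() {
        state = State.IDLE;
    }

    /** A read failed, e.g. the stream was removed or the credentials were rejected. */
    void onReadFailure() {
        state = State.UNAVAILABLE;
    }

    /** Returns the watermark; it advances on idleness only if the stream is reachable. */
    Instant current(Instant now) {
        if (state == State.IDLE) {
            Instant candidate = now.minus(idleThreshold);
            if (candidate.isAfter(watermark)) {
                watermark = candidate;
            }
        }
        // ACTIVE: driven by record arrival times. UNAVAILABLE: held where it was.
        return watermark;
    }
}
```

With this shape, the scenario from the description would leave the watermark at the last record read in step 2 for as long as reads fail, so a stalled watermark becomes visible in the console and usable for alerting. Whether a held watermark is acceptable for downstream windowing is a separate design question that the sketch does not address.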
--
This message was sent by Atlassian Jira
(v8.20.10#820010)