[ 
https://issues.apache.org/jira/browse/BEAM-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated BEAM-12406:
-----------------------------
    Status: Triage Needed  (was: Open)

> Progressing watermark for not available Kinesis stream
> ------------------------------------------------------
>
>                 Key: BEAM-12406
>                 URL: https://issues.apache.org/jira/browse/BEAM-12406
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kinesis
>    Affects Versions: 2.27.0
>            Reporter: Mateusz Rataj
>            Priority: P3
>
> We use Dataflow with Apache Beam to read events from Kinesis streams. 
> Recently, we've spotted that in a case when one of the streams was not 
> available in the middle of events processing (due to removal or problem with 
> the credentials), the data watermark for this stream was still being updated. 
>  
> Imagine scenario:
>  # Permissions allow to read from stream A
>  # Data is read from stream A
>  # Permissions are changed and don’t allow to read from stream A
>  # Watermark for stream A is progressing (but stream data is not read due to 
> permissions issue)
>  # Permissions are fixed to read stream A
>  # Data is read from stream A but from the updated watermark
> As a result, stream data between steps 3-5 is lost and the client doesn’t 
> know that.
> Additionally, it may be confusing from the Dataflow console perspective, as 
> it suggests that events are still being read from the stream. It is hard to 
> rely on the watermark as a source metric for alerting purposes as well.
> Brief investigation suggests that maybe the _KinesisReader.getWatermark()_ 
> logic doesn’t consider the state of the stream i.e. is it available or not, 
> and it treats the removed stream as a stream without traffic. Watermark 
> calculation should be adjusted to take that information into account.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to