[
https://issues.apache.org/jira/browse/FLINK-37417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019372#comment-18019372
]
Lorenzo Nicora commented on FLINK-37417:
----------------------------------------
[~hong] this should have component = `Connectors/AWS`
> Implement better exception message to explain why KinesisSource doesn't
> support partial recovery
> ------------------------------------------------------------------------------------------------
>
> Key: FLINK-37417
> URL: https://issues.apache.org/jira/browse/FLINK-37417
> Project: Flink
> Issue Type: Improvement
> Reporter: Hong Liang Teoh
> Priority: Major
>
> The KinesisSource doesn't currently support partial recovery because we want
> to prevent duplicate records being read into KinesisSource when restart +
> partial failure happens.
>
> Partial recovery is when Flink restarts a subset of subtasks to minimize
> downtime during a job failure. Flink determines which subtasks need to be
> restarted by checking all connected subtasks to the failed subtasks.
> However, this algorithm doesn't work for a KDS source, because the
> KinesisSource:
> # Has parent-child shard ordering within the KDS stream.
> # Does not ensure that all child shards are assigned to the same subtask (to
> prevent skew)
> These two mean that when we restart "selected" subtasks, we cannot ensure
> that the child shards have not been assigned to other subtasks "not
> connected" to the selected subtask.
>
> This is very much an edge case, where users will have to be doing the
> following:
> # NOT have a keyBy immediately after the KDS source.
> # Partial failure happens on an operator BEFORE the first keyBy in the job
> graph.
>
> This JIRA is to enrich the Exception message thrown
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)