[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task

Zhilong Hong (Jira) Fri, 13 May 2022 11:56:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536840#comment-17536840
 ]


Zhilong Hong edited comment on FLINK-27608 at 5/13/22 6:55 PM:
---------------------------------------------------------------

When a PartitionNotFoundException is thrown in the scenario you mentioned 
above, it is handled by the logic located at 
{{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}.

The task will try to {{requestPartitionProducerState}} from the JobManager. If 
the upstream task is not ready (for example, in the DEPLOYING or INITIALIZING 
state), the SingleInputGate will try to retrigger another partition request 
until the partition is consumable.
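
The retrigger behaviour described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not Flink's actual code: {{PartitionRetriggerSketch}}, {{ProducerStateLookup}}, {{awaitPartition}} and the {{maxAttempts}} limit are hypothetical names, the enum is a simplified stand-in for Flink's execution states, and treating RUNNING as "partition consumable" is a simplification.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class PartitionRetriggerSketch {

    /** Simplified stand-in for the producer's execution state (not Flink's own enum). */
    enum ProducerState { DEPLOYING, INITIALIZING, RUNNING, FAILED }

    /** Hypothetical stand-in for asking the JobManager about the producer's state. */
    interface ProducerStateLookup {
        ProducerState requestPartitionProducerState();
    }

    /**
     * Models the consumer side after a PartitionNotFoundException: ask for the
     * producer state and retrigger the partition request while the producer is
     * still starting up. Returns how many retriggers were needed.
     */
    static int awaitPartition(ProducerStateLookup lookup, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            ProducerState state = lookup.requestPartitionProducerState();
            switch (state) {
                case RUNNING:
                    // Assumption: a running producer means the partition is consumable.
                    return attempt;
                case DEPLOYING:
                case INITIALIZING:
                    // Producer not ready yet: loop again, i.e. retrigger the request.
                    break;
                default:
                    // The producer cannot provide the partition any more.
                    throw new IllegalStateException("Producer is in state " + state);
            }
        }
        throw new IllegalStateException(
                "Partition not consumable after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated sequence of answers from the JobManager.
        Deque<ProducerState> answers = new ArrayDeque<>(Arrays.asList(
                ProducerState.DEPLOYING, ProducerState.INITIALIZING, ProducerState.RUNNING));
        int retriggers = awaitPartition(answers::poll, 10);
        System.out.println("retriggers = " + retriggers); // prints "retriggers = 2"
    }
}
```

In this model the downstream task does not fail on the first missing partition. It fails only if the producer reaches a state from which the partition cannot appear, or if the (hypothetical) attempt limit is exhausted.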


was (Author: thesharing):
As a PartitionNotFoundException is thrown, it will be handled in by the logic 
located at 
{{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}.
 The task will try to {{requestPartitionProducerState}} from the JobManager. If 
the upstream task is not ready (i.e. in the DEPLOYING or INITIALIZING state), 
the SingleInputGate will try to retrigger another partition request until the 
partition is consumable.

> Flink may throw PartitionNotFound Exception if the downstream task reached 
> Running state earlier than it's upstream task
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-27608
>                 URL: https://issues.apache.org/jira/browse/FLINK-27608
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.14.2
>            Reporter: zlzhang0122
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Flink streaming job deployment may throw a PartitionNotFoundException if the 
> downstream task reaches the RUNNING state earlier than its upstream task and 
> the maximum backoff for partition requests has passed. However, a suitable 
> value for taskmanager.network.request-backoff.max is not easy to choose. Can 
> we use a loop that waits until the upstream task's partition is ready?
>  
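
To make the concern about taskmanager.network.request-backoff.max in the description more concrete, here is a rough sketch of how a doubling backoff bounded by a maximum adds up. It assumes that the wait starts at an initial value and doubles until it is capped at the maximum, after which the request gives up. {{RequestBackoffSketch}}, {{backoffSchedule}} and {{totalWaitMs}} are hypothetical names, and the values 100 ms and 10000 ms are used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RequestBackoffSketch {

    /**
     * Lists the waits (in ms) of a doubling backoff that starts at initialMs
     * and is capped at maxMs. Assumption: no further retry after the cap.
     */
    static List<Integer> backoffSchedule(int initialMs, int maxMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("require 0 < initialMs <= maxMs");
        }
        List<Integer> waits = new ArrayList<>();
        int current = initialMs;
        while (true) {
            waits.add(current);
            if (current >= maxMs) {
                break; // maximum backoff reached, no further retry
            }
            // Double the wait, but never exceed the configured maximum.
            current = (int) Math.min((long) current * 2, maxMs);
        }
        return waits;
    }

    /** Sums the waits, i.e. how long the upstream task has to become ready. */
    static long totalWaitMs(List<Integer> waits) {
        long total = 0;
        for (int wait : waits) {
            total += wait;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> waits = backoffSchedule(100, 10000);
        System.out.println(waits);
        System.out.println("total wait ms = " + totalWaitMs(waits)); // prints "total wait ms = 22700"
    }
}
```

Under these assumptions the upstream task has roughly 22.7 seconds to become ready. Raising the maximum extends that window but also delays failure detection when the partition is truly gone, which is why a single good value is hard to choose.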



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
