[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
[ https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537500#comment-17537500 ]

zlzhang0122 edited comment on FLINK-27608 at 5/16/22 12:01 PM:
---

[~Thesharing] Thanks for your detailed reply! I think the scenario you mentioned is very useful and is one of the possible scenarios. The case I ran into is a different one: there, the Akka message may be lost or time out, and I have uploaded [^exception.txt] to describe it. Correct me if I'm wrong. Thanks!

was (Author: zlzhang0122):
[~Thesharing] Thanks for your detailed reply! I think the scenario you mentioned is very useful and is one of the possible scenarios. The case I ran into is a different one: there, the Akka message may be lost or time out, and I have uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong. Thanks!

> Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
>
> Key: FLINK-27608
> URL: https://issues.apache.org/jira/browse/FLINK-27608
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.14.2
> Reporter: zlzhang0122
> Priority: Major
> Attachments: exception.txt
>
> Flink streaming job deployment may throw a PartitionNotFoundException if the
> downstream task reaches the RUNNING state earlier than its upstream task and
> the maximum backoff for partition requests has passed. But a suitable value for
> taskmanager.network.request-backoff.max is not easy to decide. Can we use a
> loop that waits for the upstream task's partition to be ready?

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
[ https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537500#comment-17537500 ]

zlzhang0122 edited comment on FLINK-27608 at 5/16/22 11:59 AM:
---

[~Thesharing] Thanks for your detailed reply! I think the scenario you mentioned is very useful and is one of the possible scenarios. The case I ran into is a different one: there, the Akka message may be lost or time out, and I have uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong. Thanks!

was (Author: zlzhang0122):
[~Thesharing] Thanks for your detailed reply. I think the scenario you mentioned is very useful and is one of the possible scenarios. The case I ran into is a different one: there, the Akka message may be lost or time out, and I have uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong?
[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
[ https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537500#comment-17537500 ]

zlzhang0122 edited comment on FLINK-27608 at 5/16/22 11:58 AM:
---

[~Thesharing] Thanks for your detailed reply. I think the scenario you mentioned is very useful and is one of the possible scenarios. The case I ran into is a different one: there, the Akka message may be lost or time out, and I have uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong?

was (Author: zlzhang0122):
[~Thesharing] Thanks for your detailed reply. I think the scenario you mentioned is very useful and is one of the possible scenarios. The case I ran into is a different one: there, the Akka message may be lost or time out, and I have uploaded a log about that to describe it. Correct me if I'm wrong?
[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
[ https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17536840#comment-17536840 ]

Zhilong Hong edited comment on FLINK-27608 at 5/13/22 6:55 PM:
---

When a PartitionNotFoundException is thrown in the scenario you mentioned above, it will be handled by the logic located at {{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}. The task will try to {{requestPartitionProducerState}} from the JobManager. If the upstream task is not ready (for example, in the DEPLOYING or INITIALIZING state), the SingleInputGate will retrigger the partition request until the partition is consumable.

was (Author: thesharing):
As a PartitionNotFoundException is thrown, it will be handled by the logic located at {{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}. The task will try to {{requestPartitionProducerState}} from the JobManager. If the upstream task is not ready (i.e. in the DEPLOYING or INITIALIZING state), the SingleInputGate will retrigger the partition request until the partition is consumable.

> Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
>
> Key: FLINK-27608
> URL: https://issues.apache.org/jira/browse/FLINK-27608
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.14.2
> Reporter: zlzhang0122
> Priority: Major
> Fix For: 1.16.0
>
> Flink streaming job deployment may throw a PartitionNotFoundException if the
> downstream task reaches the RUNNING state earlier than its upstream task and
> the maximum backoff for partition requests has passed. But a suitable value for
> taskmanager.network.request-backoff.max is not easy to decide. Can we use a
> loop that waits for the upstream task's partition to be ready?

--
This message was sent by Atlassian Jira (v8.20.7#820007)
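
Editor's illustration (not part of the Jira thread): to make the role of taskmanager.network.request-backoff.max more concrete, here is a minimal sketch of a doubling backoff that gives up once the maximum has been reached. The class name PartitionRequestBackoff, the method names, and the values 100 ms and 10000 ms are hypothetical choices for this example; the doubling rule is an assumption about how the retry delay grows, not a copy of Flink's code, so check the Flink source for your version before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a capped, doubling backoff for partition requests.
 * Not Flink's actual class; names and the doubling rule are assumptions.
 */
public class PartitionRequestBackoff {
    private final int initialBackoffMs;
    private final int maxBackoffMs;
    private int currentBackoffMs = 0; // 0 means "no retry scheduled yet"

    public PartitionRequestBackoff(int initialBackoffMs, int maxBackoffMs) {
        if (initialBackoffMs <= 0 || maxBackoffMs < initialBackoffMs) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /**
     * Advances to the next retry delay.
     * Returns false once the maximum has already been used, i.e. when the
     * caller should stop retrying and report the partition as not found.
     */
    public boolean increaseBackoff() {
        if (currentBackoffMs == 0) {
            currentBackoffMs = initialBackoffMs;
            return true;
        }
        if (currentBackoffMs < maxBackoffMs) {
            currentBackoffMs = Math.min(currentBackoffMs * 2, maxBackoffMs);
            return true;
        }
        return false;
    }

    public int currentBackoffMs() {
        return currentBackoffMs;
    }

    /** Lists every retry delay that would be used before giving up. */
    public static List<Integer> retrySchedule(int initialMs, int maxMs) {
        PartitionRequestBackoff backoff = new PartitionRequestBackoff(initialMs, maxMs);
        List<Integer> delays = new ArrayList<>();
        while (backoff.increaseBackoff()) {
            delays.add(backoff.currentBackoffMs());
        }
        return delays;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only.
        List<Integer> delays = retrySchedule(100, 10_000);
        int totalMs = delays.stream().mapToInt(Integer::intValue).sum();
        System.out.println("retry delays (ms): " + delays);
        System.out.println("total wait before giving up (ms): " + totalMs);
    }
}
```

Under these assumed values the upstream task has a bounded window to register its partition before the downstream task gives up, which is why picking a maximum that fits every deployment is hard, as the reporter notes.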
[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than it's upstream task
[ https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17536840#comment-17536840 ]

Zhilong Hong edited comment on FLINK-27608 at 5/13/22 6:52 PM:
---

As a PartitionNotFoundException is thrown, it will be handled by the logic located at {{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}. The task will try to {{requestPartitionProducerState}} from the JobManager. If the upstream task is not ready (i.e. in the DEPLOYING or INITIALIZING state), the SingleInputGate will retrigger the partition request until the partition is consumable.

was (Author: thesharing):
As a {{PartitionNotFoundException}} is thrown, it will be handled by the logic located at {{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}. The {{Task}} will try to {{requestPartitionProducerState}} from the JobManager. If the upstream task is not ready (i.e. in the DEPLOYING or INITIALIZING state), the {{SingleInputGate}} will retrigger the partition request until the partition is consumable.