[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than its upstream task

2022-05-16 Thread zlzhang0122 (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537500#comment-17537500
 ] 

zlzhang0122 edited comment on FLINK-27608 at 5/16/22 12:01 PM:
---

[~Thesharing] Thanks for your detailed reply! The scenario you mentioned is 
very useful, but it is only one of the possible scenarios. The case I ran into 
is a different one: there, the akka message may be lost or time out. I have 
uploaded [^exception.txt] to describe it. Correct me if I'm wrong. Thanks!


was (Author: zlzhang0122):
[~Thesharing] Thanks for your detailed reply! The scenario you mentioned is 
very useful, but it is only one of the possible scenarios. The case I ran into 
is a different one: there, the akka message may be lost or time out. I have 
uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong. 
Thanks!

> Flink may throw PartitionNotFound Exception if the downstream task reached 
> Running state earlier than its upstream task
> 
>
> Key: FLINK-27608
> URL: https://issues.apache.org/jira/browse/FLINK-27608
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.14.2
> Reporter: zlzhang0122
> Priority: Major
> Attachments: exception.txt
>
>
> Flink streaming job deployment may throw a PartitionNotFoundException if the 
> downstream task reaches the RUNNING state earlier than its upstream task and 
> the maximum backoff for partition requests has elapsed. However, a suitable 
> value for taskmanager.network.request-backoff.max is not easy to choose. Can 
> we use a loop that waits until the upstream task's partition is ready?
>  
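
To illustrate why taskmanager.network.request-backoff.max is hard to choose, 
here is a minimal Java sketch that adds up the waiting time under an assumed 
doubling backoff. The class BackoffSketch, its method names, and the sample 
values of 100 ms and 10000 ms are hypothetical and not taken from Flink's code; 
the doubling rule is also an assumption about how the delay grows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a capped, doubling backoff; not Flink's actual code. */
final class BackoffSketch {

    /** Returns the delays (in ms) waited before each retry, ending once the cap is used. */
    static List<Long> delays(long initialMs, long maxMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("need 0 < initialMs <= maxMs");
        }
        List<Long> result = new ArrayList<>();
        long current = initialMs;
        while (true) {
            result.add(current);
            if (current >= maxMs) {
                break; // cap reached: assume the next failure is no longer retried
            }
            current = Math.min(current * 2, maxMs);
        }
        return result;
    }

    /** Sums all delays, i.e. how long the consumer waits in total before giving up. */
    static long totalWaitMs(long initialMs, long maxMs) {
        long sum = 0;
        for (long delay : delays(initialMs, maxMs)) {
            sum += delay;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(delays(100, 10_000));      // individual delays in ms
        System.out.println(totalWaitMs(100, 10_000)); // total wait in ms
    }
}
```

Under these assumptions the consumer stops retrying after 22700 ms in total, 
so an upstream task that takes longer than that to deploy would still cause 
the exception, whatever fixed cap is configured.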



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than its upstream task

2022-05-16 Thread zlzhang0122 (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537500#comment-17537500
 ] 

zlzhang0122 edited comment on FLINK-27608 at 5/16/22 11:59 AM:
---

[~Thesharing] Thanks for your detailed reply! The scenario you mentioned is 
very useful, but it is only one of the possible scenarios. The case I ran into 
is a different one: there, the akka message may be lost or time out. I have 
uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong. 
Thanks!


was (Author: zlzhang0122):
[~Thesharing] Thanks for your detailed reply. The scenario you mentioned is 
very useful, but it is only one of the possible scenarios. The case I ran into 
is a different one: there, the akka message may be lost or time out. I have 
uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong?



[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than its upstream task

2022-05-16 Thread zlzhang0122 (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537500#comment-17537500
 ] 

zlzhang0122 edited comment on FLINK-27608 at 5/16/22 11:58 AM:
---

[~Thesharing] Thanks for your detailed reply. The scenario you mentioned is 
very useful, but it is only one of the possible scenarios. The case I ran into 
is a different one: there, the akka message may be lost or time out. I have 
uploaded [^exception.txt] about that to describe it. Correct me if I'm wrong?


was (Author: zlzhang0122):
[~Thesharing] Thanks for your detailed reply. The scenario you mentioned is 
very useful, but it is only one of the possible scenarios. The case I ran into 
is a different one: there, the akka message may be lost or time out. I have 
uploaded a log about that to describe it. Correct me if I'm wrong?



[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than its upstream task

2022-05-13 Thread Zhilong Hong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17536840#comment-17536840
 ] 

Zhilong Hong edited comment on FLINK-27608 at 5/13/22 6:55 PM:
---

When a PartitionNotFoundException is thrown in the scenario you mentioned 
above, it is handled by the logic located at 
{{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}.

The task will try to {{requestPartitionProducerState}} from the JobManager. If 
the upstream task is not ready (for example, in the DEPLOYING or INITIALIZING 
state), the SingleInputGate will try to retrigger another partition request 
until the partition is consumable.
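
The decision described above can be sketched in Java as follows. This is only 
an illustration: PartitionRetrySketch, its two enums, and the method 
onPartitionNotFound are hypothetical names, not Flink's API, and treating 
every state other than DEPLOYING or INITIALIZING as a failure is an assumption 
made to keep the example small.

```java
/** Hypothetical sketch of the retry decision; names are illustrative, not Flink's API. */
final class PartitionRetrySketch {

    /** Simplified producer states; the real set of execution states is larger. */
    enum ProducerState { DEPLOYING, INITIALIZING, CANCELED, FAILED }

    /** What the consumer does after asking the JobManager for the producer state. */
    enum Action { RETRIGGER_REQUEST, FAIL_CONSUMER }

    static Action onPartitionNotFound(ProducerState producerState) {
        switch (producerState) {
            case DEPLOYING:
            case INITIALIZING:
                // Upstream task is still starting: request the partition again.
                return Action.RETRIGGER_REQUEST;
            default:
                // Assumption: any other state means the partition will not appear.
                return Action.FAIL_CONSUMER;
        }
    }

    public static void main(String[] args) {
        for (ProducerState state : ProducerState.values()) {
            System.out.println(state + " -> " + onPartitionNotFound(state));
        }
    }
}
```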


was (Author: thesharing):
As a PartitionNotFoundException is thrown, it will be handled by the logic 
located at 
{{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}.
The task will try to {{requestPartitionProducerState}} from the JobManager. If 
the upstream task is not ready (i.e. in the DEPLOYING or INITIALIZING state), 
the SingleInputGate will try to retrigger another partition request until the 
partition is consumable.

> Flink may throw PartitionNotFound Exception if the downstream task reached 
> Running state earlier than its upstream task
> 
>
> Key: FLINK-27608
> URL: https://issues.apache.org/jira/browse/FLINK-27608
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.14.2
> Reporter: zlzhang0122
> Priority: Major
> Fix For: 1.16.0
>
>
> Flink streaming job deployment may throw a PartitionNotFoundException if the 
> downstream task reaches the RUNNING state earlier than its upstream task and 
> the maximum backoff for partition requests has elapsed. However, a suitable 
> value for taskmanager.network.request-backoff.max is not easy to choose. Can 
> we use a loop that waits until the upstream task's partition is ready?
>  





[jira] [Comment Edited] (FLINK-27608) Flink may throw PartitionNotFound Exception if the downstream task reached Running state earlier than its upstream task

2022-05-13 Thread Zhilong Hong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17536840#comment-17536840
 ] 

Zhilong Hong edited comment on FLINK-27608 at 5/13/22 6:52 PM:
---

As a PartitionNotFoundException is thrown, it will be handled by the logic 
located at 
{{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}.
The task will try to {{requestPartitionProducerState}} from the JobManager. If 
the upstream task is not ready (i.e. in the DEPLOYING or INITIALIZING state), 
the SingleInputGate will try to retrigger another partition request until the 
partition is consumable.


was (Author: thesharing):
As a {{PartitionNotFoundException}} is thrown, it will be handled by the 
logic located at 
{{org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler:298}}.
The {{Task}} will try to {{requestPartitionProducerState}} from the 
JobManager. If the upstream task is not ready (i.e. in the DEPLOYING or 
INITIALIZING state), the {{SingleInputGate}} will try to retrigger another 
partition request until the partition is consumable.
