[ 
https://issues.apache.org/jira/browse/FLINK-29308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611332#comment-17611332
 ] 

Zhu Zhu edited comment on FLINK-29308 at 9/30/22 3:34 AM:
----------------------------------------------------------

That may be the cause.
With fine-grained resources, a NoResourceAvailableException can occur if 
Flink cannot find a {{matching}} slot for the scheduled vertices (in the 
coarse-grained case, any slot can match any slot request).
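
To illustrate the difference, here is a minimal sketch of the matching rule. It is a simplified model, not Flink's actual code: the names {{SlotMatchSketch}}, {{Profile}}, {{matches}} and {{findMatchingSlot}} are hypothetical, and only CPU and memory are compared.

```java
import java.util.List;
import java.util.Optional;

/** Simplified, hypothetical model of slot matching; not Flink's actual classes. */
final class SlotMatchSketch {

    /** A resource specification for a slot or a slot request. */
    static final class Profile {
        /** Assumed marker for a coarse-grained request with no explicit requirements. */
        static final Profile UNKNOWN = new Profile(-1.0, -1);

        final double cpuCores;
        final int memoryMb;

        Profile(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }
    }

    /** Returns true if a slot offering {@code slot} can serve a request for {@code required}. */
    static boolean matches(Profile slot, Profile required) {
        if (required == Profile.UNKNOWN) {
            // Coarse-grained case: the request carries no requirements, so any slot fits.
            return true;
        }
        // Fine-grained case: the slot must cover every requested dimension.
        return slot.cpuCores >= required.cpuCores && slot.memoryMb >= required.memoryMb;
    }

    /** Returns the first free slot that matches the request, or empty if none does. */
    static Optional<Profile> findMatchingSlot(List<Profile> freeSlots, Profile required) {
        return freeSlots.stream().filter(s -> matches(s, required)).findFirst();
    }
}
```

In this model an empty result for a fine-grained request corresponds to the situation where the scheduler has free slots but none of them fits, which would surface as a NoResourceAvailableException.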


was (Author: zhuzh):
That may be the cause.
If using fine grained resource, NoResourceAvailableException could happen if 
Flink cannot find a {{matching}} slot for scheduled vertices (in coarse grained 
case, a slot can always match any slot request).

> NoResourceAvailableException fails the batch job
> ------------------------------------------------
>
>                 Key: FLINK-29308
>                 URL: https://issues.apache.org/jira/browse/FLINK-29308
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Aitozi
>            Priority: Major
>
> When running a batch job configured with the following restart strategy:
> {code:java}
> restart-strategy: fixed-delay
> restart-strategy.fixed-delay.delay: 15 s
> restart-strategy.fixed-delay.attempts: 10 {code}
> If the cluster does not have enough resources to run a whole stage, it can 
> run part of the stage, but the job still fails after 10 
> {{NoResourceAvailableException}} failures. IMO, for a batch job a 
> {{NoResourceAvailableException}} does not necessarily need to fail the job. 
> Or at least this failure reason should not share the same restart 
> strategy with other failure reasons.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
