[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running

2023-04-06 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709367#comment-17709367
 ] 

Weihua Hu commented on FLINK-31529:
---

[~xtsong] Could you take a look at this?

> Let yarn client exit early before JobManager running
>
> Key: FLINK-31529
> URL: https://issues.apache.org/jira/browse/FLINK-31529
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Reporter: Weihua Hu
> Priority: Major
>
> Currently the YarnClusterDescriptor always waits for the YARN application
> status to be RUNNING, even in detached mode.
> In batch mode, queue resources are insufficient in most cases, so the
> JobManager may wait a long time for resources, and the client keeps waiting
> too. If the Flink client is killed (for some other reason), the cluster is
> shut down as well.
> We need an option to let the Flink client exit early. Either reusing the
> detach option or introducing a new option is OK.
> Looking forward to other suggestions.
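
The behavior described above can be sketched as a polling loop with an opt-in early exit. This is a minimal illustration, not Flink's actual code: the class `EarlyExitSketch`, the method `waitForDeployment`, the flag `exitOnAccepted`, and the simplified `AppState` enum are all hypothetical names chosen for this example, and the iterator stands in for repeated application-report polls against YARN.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; none of these names are Flink or YARN APIs. */
final class EarlyExitSketch {

    /** Simplified stand-in for the YARN application states. */
    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /**
     * Consumes state reports until the client may stop waiting.
     *
     * With exitOnAccepted == false the client waits for RUNNING, which is
     * the behavior the issue describes. With exitOnAccepted == true the
     * client returns as soon as the application is ACCEPTED, that is,
     * queued in the scheduler but not yet given resources.
     */
    static AppState waitForDeployment(Iterator<AppState> reports, boolean exitOnAccepted) {
        AppState last = null;
        while (reports.hasNext()) {
            last = reports.next();
            switch (last) {
                case FAILED:
                case KILLED:
                    throw new IllegalStateException("Application ended with state " + last);
                case RUNNING:
                case FINISHED:
                    return last;
                case ACCEPTED:
                    if (exitOnAccepted) {
                        return last; // early exit: do not wait for resources
                    }
                    break;
                default:
                    break; // NEW, SUBMITTED: keep polling
            }
            // A real client would sleep between polls; omitted in this sketch.
        }
        throw new IllegalStateException("No more reports; last state was " + last);
    }

    public static void main(String[] args) {
        List<AppState> reports = Arrays.asList(
                AppState.NEW, AppState.SUBMITTED, AppState.ACCEPTED,
                AppState.ACCEPTED, AppState.RUNNING);

        // Current behavior: blocks until RUNNING.
        System.out.println(waitForDeployment(reports.iterator(), false));
        // Proposed behavior: returns once the application is queued.
        System.out.println(waitForDeployment(reports.iterator(), true));
    }
}
```

Whether such a switch would reuse the detach option or be a new option is exactly the open question in this issue; the sketch only shows where the early return would sit in the loop.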



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running

2023-03-20 Thread Weihua Hu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702645#comment-17702645
 ] 

Weihua Hu commented on FLINK-31529:
---

Thanks, all, for the replies.

If introducing a new option is acceptable, I would like to take this issue.

> In batch mode, the queue resources is insufficient in most case.

Yes.
In detail, we have two kinds of YARN clusters: Streaming and Batch. The
Streaming cluster provides guaranteed resources so that all jobs can run long-term.
The Batch cluster does not guarantee that jobs get resources immediately;
these applications are queued in the YARN scheduler. This strategy maximizes
resource utilization.
In addition, we do have a platform that schedules Flink batch jobs. These jobs
run on an hourly or daily basis.





[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running

2023-03-20 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702634#comment-17702634
 ] 

Biao Geng commented on FLINK-31529:
---

Hi there, I want to share some thoughts/questions about this Jira:
1. According to the
[doc|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/],
 the --detached option indicates whether the client should wait for the job to
finish. I have seen some users' platforms rely on this option to get the
returned YARN app info to manage their Flink jobs (e.g. whether the job was
submitted successfully). Maybe introducing a new option is better than changing
the behavior of the --detached option.
2. The description says "In batch mode, the queue resources is insufficient in
most case." IIUC, a lack of resources should not be the normal case. One
possible use case I can come up with is that, to reduce costs, people may run
Flink batch jobs at night and use workflow frameworks like Airflow to retry
the submission. Is that the case?



[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running

2023-03-20 Thread Feng Jin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702619#comment-17702619
 ] 

Feng Jin commented on FLINK-31529:
--

+1 for this. 



[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running

2023-03-20 Thread zlzhang0122 (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702617#comment-17702617
 ] 

zlzhang0122 commented on FLINK-31529:
-

IMHO, I agree with this, and we have implemented an option to deal with it.
