[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running
[ https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709367#comment-17709367 ]

Weihua Hu commented on FLINK-31529:
-----------------------------------

[~xtsong] Could you take a look at this?

> Let yarn client exit early before JobManager running
> ----------------------------------------------------
>
>                 Key: FLINK-31529
>                 URL: https://issues.apache.org/jira/browse/FLINK-31529
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: Weihua Hu
>            Priority: Major
>
> Currently the YarnClusterDescriptor always waits for the YARN application
> status to be RUNNING, even if we use the detach mode.
> In batch mode, the queue resources are insufficient in most cases, so the
> JobManager may wait a long time for resources, and the client keeps waiting
> too. If the Flink client is killed (for some other reason), the cluster will
> be shut down too.
> We need an option to let the Flink client exit early. Using the detach option
> or introducing a new option are both OK.
> Looking forward to other suggestions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running
[ https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702645#comment-17702645 ]

Weihua Hu commented on FLINK-31529:
-----------------------------------

Thanks, all, for the replies. If introducing a new option is acceptable, I would like to take this issue.

Regarding "In batch mode, the queue resources are insufficient in most cases": yes.
In detail, we have two kinds of YARN clusters: Streaming and Batch.
The Streaming cluster provides guaranteed resources so that all jobs can run long-term.
The Batch cluster does not guarantee that jobs get resources immediately; these applications are queued in the YARN scheduler. This strategy maximizes resource utilization.

In addition, we do have a platform that schedules Flink batch jobs. These jobs run hourly or daily.
[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running
[ https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702634#comment-17702634 ]

Biao Geng commented on FLINK-31529:
-----------------------------------

Hi there, I want to share some thoughts and questions about this Jira:
1. According to the [doc|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/], the --detached option specifies whether the client should wait for the job to finish. I have seen some users' platforms rely on this option to get the returned YARN app info to manage their Flink jobs (e.g. whether the job was submitted successfully). Maybe introducing a new option is better than changing the behavior of the --detached option.
2. The description says "In batch mode, the queue resources are insufficient in most cases." IIUC, a lack of resources should not be the normal case. One possible use case I can come up with is that, to reduce costs, people may run Flink batch jobs at night and use workflow frameworks like Airflow to retry the submission. Is that the case?
[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running
[ https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702619#comment-17702619 ]

Feng Jin commented on FLINK-31529:
----------------------------------

+1 for this.
[jira] [Commented] (FLINK-31529) Let yarn client exit early before JobManager running
[ https://issues.apache.org/jira/browse/FLINK-31529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702617#comment-17702617 ]

zlzhang0122 commented on FLINK-31529:
-------------------------------------

IMHO, I agree with this, and we have implemented an option to deal with it.
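The behaviour discussed in this thread can be illustrated with a minimal sketch, under stated assumptions. It models a client that consumes successive application state reports and decides when to stop waiting: by default it returns only once the application is RUNNING, and with an opt-in switch it returns as soon as the application has been accepted into the queue. The names `EarlyExitSketch`, `AppState`, `waitForDeployment` and `exitOnAccepted` are hypothetical and chosen for illustration only; this is not the YarnClusterDescriptor code, not a real Flink option, and not the YARN state enum.

```java
import java.util.Iterator;
import java.util.List;

public class EarlyExitSketch {

    /** Simplified stand-in for application states; NOT the real YARN enum. */
    public enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FAILED, KILLED }

    /**
     * Consumes state reports until the client may stop waiting.
     *
     * @param reports        states in the order a polling loop would observe them
     * @param exitOnAccepted hypothetical switch: return once the application is queued
     * @return the state at which the client stops waiting
     */
    public static AppState waitForDeployment(Iterator<AppState> reports, boolean exitOnAccepted) {
        while (reports.hasNext()) {
            AppState state = reports.next();
            switch (state) {
                case RUNNING:
                    // Default behaviour described in the issue: wait until RUNNING.
                    return state;
                case FAILED:
                case KILLED:
                    throw new IllegalStateException("Application ended in state " + state);
                case ACCEPTED:
                    // Assumed early-exit path: the application is queued but has no resources yet.
                    if (exitOnAccepted) {
                        return state;
                    }
                    break;
                default:
                    // NEW / SUBMITTED: keep polling.
                    break;
            }
        }
        throw new IllegalStateException("State reports ended before the client could return");
    }

    public static void main(String[] args) {
        List<AppState> observed = List.of(
                AppState.NEW, AppState.SUBMITTED, AppState.ACCEPTED,
                AppState.ACCEPTED, AppState.RUNNING);

        // Waits through the queued phase.
        System.out.println(waitForDeployment(observed.iterator(), false));
        // Returns while the application is still queued.
        System.out.println(waitForDeployment(observed.iterator(), true));
    }
}
```

Keeping the early exit behind a separate switch, rather than changing what the existing detached mode returns, mirrors the concern raised above that some platforms rely on the current behaviour of --detached.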