[ 
https://issues.apache.org/jira/browse/FLINK-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643990#comment-16643990
 ] 

ASF GitHub Bot commented on FLINK-5542:
---------------------------------------

asfgit closed pull request #6775: [FLINK-5542] use YarnCluster vcores setting to do MaxVCore validation
URL: https://github.com/apache/flink/pull/6775
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java b/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java
index c3ad9f7f42c..c161e227577 100644
--- a/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java
+++ b/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java
@@ -282,18 +282,27 @@ private void isReadyForDeployment(ClusterSpecification clusterSpecification) thr
                }
 
                // Check if we don't exceed YARN's maximum virtual cores.
-               // The number of cores can be configured in the config.
-               // If not configured, it is set to the number of task slots
-               int numYarnVcores = yarnConfiguration.getInt(YarnConfiguration.NM_VCORES, YarnConfiguration.DEFAULT_NM_VCORES);
+               // Fetch numYarnMaxVcores from all the RUNNING nodes via yarnClient
+               final int numYarnMaxVcores;
+               try {
+                       numYarnMaxVcores = yarnClient.getNodeReports(NodeState.RUNNING)
+                               .stream()
+                               .mapToInt(report -> report.getCapability().getVirtualCores())
+                               .max()
+                               .orElse(0);
+               } catch (Exception e) {
+                       throw new YarnDeploymentException("Couldn't get cluster description, please check on the YarnConfiguration", e);
+               }
+
                int configuredVcores = flinkConfiguration.getInteger(YarnConfigOptions.VCORES, clusterSpecification.getSlotsPerTaskManager());
                // don't configure more than the maximum configured number of vcores
-               if (configuredVcores > numYarnVcores) {
+               if (configuredVcores > numYarnMaxVcores) {
                        throw new IllegalConfigurationException(
-                               String.format("The number of virtual cores per node were configured with %d" +
-                                               " but Yarn only has %d virtual cores available. Please note that the number" +
-                                               " of virtual cores is set to the number of task slots by default unless configured" +
-                                               " in the Flink config with '%s.'",
-                                       configuredVcores, numYarnVcores, YarnConfigOptions.VCORES.key()));
+                               String.format("The number of requested virtual cores per node %d" +
+                                               " exceeds the maximum number of virtual cores %d available in the Yarn Cluster." +
+                                               " Please note that the number of virtual cores is set to the number of task slots by default" +
+                                               " unless configured in the Flink config with '%s.'",
+                                       configuredVcores, numYarnMaxVcores, YarnConfigOptions.VCORES.key()));
                }
 
                // check if required Hadoop environment variables are set. If not, warn user
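
For readers without a YARN cluster at hand, the added check can be approximated in isolation. The following is only a minimal sketch, not the merged code: NodeCapacity is a hypothetical stand-in for YARN's node report, the class and method names are invented for illustration, and a plain IllegalArgumentException replaces Flink's IllegalConfigurationException. It shows the same max-over-running-nodes logic, including the fallback to 0 when no node is reported.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class VcoreValidationSketch {

    /** Hypothetical stand-in for a YARN node report; only the vcore capacity is modeled. */
    static final class NodeCapacity {
        final String host;
        final int virtualCores;

        NodeCapacity(String host, int virtualCores) {
            this.host = host;
            this.virtualCores = virtualCores;
        }
    }

    /** Largest vcore capacity among the given nodes, or 0 when the list is empty. */
    static int maxVcores(List<NodeCapacity> runningNodes) {
        return runningNodes.stream()
                .mapToInt(node -> node.virtualCores)
                .max()
                .orElse(0);
    }

    /** Throws if the requested vcores per container exceed what the largest node offers. */
    static void validate(int requestedVcores, List<NodeCapacity> runningNodes) {
        int max = maxVcores(runningNodes);
        if (requestedVcores > max) {
            throw new IllegalArgumentException(String.format(
                    "Requested %d vcores per node, but the largest running node offers %d.",
                    requestedVcores, max));
        }
    }

    public static void main(String[] args) {
        List<NodeCapacity> nodes = Arrays.asList(
                new NodeCapacity("worker-1", 8),
                new NodeCapacity("worker-2", 16));

        validate(12, nodes); // passes: worker-2 can host a 12-vcore container

        try {
            // With no running nodes the maximum falls back to 0, so any request fails.
            validate(1, Collections.<NodeCapacity>emptyList());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

One consequence worth noting: because of orElse(0), a cluster that reports no RUNNING nodes rejects every request with the "exceeds the maximum" message rather than a message about missing nodes.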


 



> YARN client incorrectly uses local YARN config to check vcore capacity
> ----------------------------------------------------------------------
>
>                 Key: FLINK-5542
>                 URL: https://issues.apache.org/jira/browse/FLINK-5542
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.1.4, 1.5.3, 1.6.0, 1.7.0
>            Reporter: Shannon Carey
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> See 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html
> When using bin/yarn-session.sh, AbstractYarnClusterDescriptor line 271 in 
> 1.1.4 compares the user's selected number of vcores to the vcores 
> configured in the local node's YARN config (from YarnConfiguration, e.g. 
> yarn-site.xml and yarn-default.xml). It incorrectly prevents Flink from 
> launching even if there is sufficient vcore capacity on the cluster.
> That is not correct, because the application will not necessarily run on the 
> local node. For example, if running the yarn-session.sh client from the AWS 
> EMR master node, the vcore count there may be different from the vcore count 
> on the core nodes where Flink will actually run.
> A reasonable way to fix this would probably be to reuse the logic from 
> "yarn-session.sh -q" (FlinkYarnSessionCli line 550) which knows how to get 
> vcore information from the real worker nodes.  Alternatively, perhaps we 
> could remove the check entirely and rely on YARN's Scheduler to determine 
> whether sufficient resources exist.
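
To make the reported discrepancy concrete, here is a small illustrative sketch. All numbers are hypothetical (a master node whose local config says 4 vcores, core nodes with 16 each, a request for 8), and the method names are invented for this example; it is not taken from the Flink code base.

```java
import java.util.Arrays;
import java.util.List;

public class LocalVsClusterCheck {

    /** Reported behaviour: compare against the vcore count in the client's local YARN config. */
    static boolean passesLocalCheck(int requested, int localConfigVcores) {
        return requested <= localConfigVcores;
    }

    /** Suggested behaviour: compare against the largest worker node reported by the cluster. */
    static boolean passesClusterCheck(int requested, List<Integer> workerVcores) {
        int max = workerVcores.stream().mapToInt(Integer::intValue).max().orElse(0);
        return requested <= max;
    }

    public static void main(String[] args) {
        int localConfigVcores = 4;                          // hypothetical master node setting
        List<Integer> workerVcores = Arrays.asList(16, 16); // hypothetical core nodes
        int requested = 8;

        System.out.println("local check:   " + passesLocalCheck(requested, localConfigVcores));
        System.out.println("cluster check: " + passesClusterCheck(requested, workerVcores));
    }
}
```

Under these assumed numbers the local check rejects the request while the cluster check accepts it, which is the mismatch described in the issue.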



