tomicooler commented on code in PR #6751:
URL: https://github.com/apache/hadoop/pull/6751#discussion_r1579625169


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2CpuResourceHandlerImpl.java:
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+
+/**
+ * An implementation for using CGroups V2 to restrict CPU usage on Linux. The
+ * implementation supports 3 different controls - restrict usage of all YARN
+ * containers, restrict relative usage of individual YARN containers and
+ * restrict usage of individual YARN containers. Admins can set the overall CPU
+ * to be used by all YARN containers - this is implemented by setting
+ * cpu.max to the value desired. If strict resource usage mode is not enabled,
+ * cpu.weight is set for individual containers - this prevents containers from
+ * exceeding the overall limit for YARN containers but individual containers
+ * can use as much of the CPU as available(under the YARN limit). If strict
+ * resource usage is enabled, then container can only use the percentage of
+ * CPU allocated to them and this is again implemented using cpu.max.
+ */
+@InterfaceStability.Unstable
+@InterfaceAudience.Private
+public class CGroupsV2CpuResourceHandlerImpl extends AbstractCGroupsCpuResourceHandler {
+  private static final CGroupsHandler.CGroupController CPU =
+      CGroupsHandler.CGroupController.CPU;
+
+  @VisibleForTesting
+  static final int CPU_DEFAULT_WEIGHT = 100; // cgroup v2 default
+  static final int CPU_DEFAULT_WEIGHT_OPPORTUNISTIC = 1;
+  static final int CPU_MAX_WEIGHT = 10000;
+  static final String NO_LIMIT = "max";
+
+
+  CGroupsV2CpuResourceHandlerImpl(CGroupsHandler cGroupsHandler) {
+    super(cGroupsHandler);
+  }
+
+  @Override
+  protected void updateCgroupMaxCpuLimit(String cgroupId, String max, String period)
+      throws ResourceHandlerException {
+    String cpuMaxLimit = cGroupsHandler.getCGroupParam(CPU, cgroupId,

Review Comment:
   Maybe add a short doc comment here describing the `cpu.max` file format: https://docs.kernel.org/admin-guide/cgroup-v2.html#cpu-interface-files
   
   ```
   A read-write two value file which exists on non-root cgroups. The default is “max 100000”.
   
   The maximum bandwidth limit. It’s in the following format:
   
   $MAX $PERIOD
   which indicates that the group may consume up to $MAX in each $PERIOD duration. “max” for $MAX indicates no limit. If only one number is written, $MAX is updated.
   ```
   
   
   Also, rename `cpuMaxLimit` to `currentCpuMax`.
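
To make the quoted `$MAX $PERIOD` format concrete, here is a minimal sketch of how the contents of `cpu.max` could be parsed and written back. The class `CpuMaxValue` and all of its methods are hypothetical and not part of the PR; the sketch also assumes both values are microseconds, as the kernel documentation states for CPU time durations.

```java
import java.util.OptionalLong;

/** Hypothetical value holder for the contents of a cgroup v2 cpu.max file. */
final class CpuMaxValue {
  static final String NO_LIMIT = "max";

  // Empty means the file contained "max", i.e. no bandwidth limit.
  private final OptionalLong maxUs;
  private final long periodUs;

  private CpuMaxValue(OptionalLong maxUs, long periodUs) {
    this.maxUs = maxUs;
    this.periodUs = periodUs;
  }

  /** Parses the two-value form "$MAX $PERIOD", for example "max 100000". */
  static CpuMaxValue parse(String content) {
    String[] parts = content.trim().split("\\s+");
    if (parts.length != 2) {
      throw new IllegalArgumentException(
          "Expected \"$MAX $PERIOD\" but got: " + content);
    }
    OptionalLong max = NO_LIMIT.equals(parts[0])
        ? OptionalLong.empty()
        : OptionalLong.of(Long.parseLong(parts[0]));
    return new CpuMaxValue(max, Long.parseLong(parts[1]));
  }

  boolean isUnlimited() {
    return !maxUs.isPresent();
  }

  OptionalLong getMaxUs() {
    return maxUs;
  }

  long getPeriodUs() {
    return periodUs;
  }

  /** Renders the value in the same "$MAX $PERIOD" form the file uses. */
  String format() {
    String max = isUnlimited() ? NO_LIMIT : Long.toString(maxUs.getAsLong());
    return max + " " + periodUs;
  }

  public static void main(String[] args) {
    CpuMaxValue current = CpuMaxValue.parse("max 100000");
    System.out.println(current.isUnlimited() + " " + current.format());
  }
}
```

Keeping "no limit" as an empty `OptionalLong` rather than a sentinel number avoids confusing it with the `-1` convention used elsewhere in this diff.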



##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/AbstractCGroupsCpuResourceHandler.java:
##########
@@ -0,0 +1,219 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.yarn.api.records.ContainerId;
+import org.apache.hadoop.yarn.api.records.ExecutionType;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container;
+import org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperation;
+import org.apache.hadoop.yarn.server.nodemanager.util.NodeManagerHardwareUtils;
+import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+
+@InterfaceStability.Unstable
+@InterfaceAudience.Private
+public abstract class AbstractCGroupsCpuResourceHandler implements CpuResourceHandler {
+
+  static final Logger LOG =
+       LoggerFactory.getLogger(AbstractCGroupsCpuResourceHandler.class);
+
+  protected CGroupsHandler cGroupsHandler;
+  private boolean strictResourceUsageMode = false;
+  private float yarnProcessors;
+  private int nodeVCores;
+  private static final CGroupsHandler.CGroupController CPU =
+      CGroupsHandler.CGroupController.CPU;
+
+  @VisibleForTesting
+  static final int MAX_QUOTA_US = 1000 * 1000;
+  @VisibleForTesting
+  static final int MIN_PERIOD_US = 1000;
+
+  AbstractCGroupsCpuResourceHandler(CGroupsHandler cGroupsHandler) {
+    this.cGroupsHandler = cGroupsHandler;
+  }
+
+  @Override
+  public List<PrivilegedOperation> bootstrap(Configuration conf)
+      throws ResourceHandlerException {
+    return bootstrap(
+        ResourceCalculatorPlugin.getResourceCalculatorPlugin(null, conf), conf);
+  }
+
+  @VisibleForTesting
+  List<PrivilegedOperation> bootstrap(
+      ResourceCalculatorPlugin plugin, Configuration conf)
+      throws ResourceHandlerException {
+    this.strictResourceUsageMode = conf.getBoolean(
+        YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE,
+        YarnConfiguration.DEFAULT_NM_LINUX_CONTAINER_CGROUPS_STRICT_RESOURCE_USAGE);
+    this.cGroupsHandler.initializeCGroupController(CPU);
+    nodeVCores = NodeManagerHardwareUtils.getVCores(plugin, conf);
+
+    // cap overall usage to the number of cores allocated to YARN
+    yarnProcessors = NodeManagerHardwareUtils.getContainersCPUs(plugin, conf);
+    int systemProcessors = NodeManagerHardwareUtils.getNodeCPUs(plugin, conf);
+    boolean existingCpuLimits;
+    existingCpuLimits = cpuLimitExists(
+        cGroupsHandler.getPathForCGroup(CPU, ""));
+
+    if (systemProcessors != (int) yarnProcessors) {
+      LOG.info("YARN containers restricted to " + yarnProcessors + " cores");
+      int[] limits = getOverallLimits(yarnProcessors);
+      updateCgroupMaxCpuLimit("", String.valueOf(limits[1]), String.valueOf(limits[0]));
+    } else if (existingCpuLimits) {
+      LOG.info("Removing CPU constraints for YARN containers.");
+      updateCgroupMaxCpuLimit("", String.valueOf(-1), null);
+    }
+    return null;
+  }
+
+  protected abstract void updateCgroupMaxCpuLimit(String cgroupId, String quota, String period)
+      throws ResourceHandlerException;
+  protected abstract boolean cpuLimitExists(String path) throws ResourceHandlerException;
+
+
+  @VisibleForTesting
+  @InterfaceAudience.Private
+  public static int[] getOverallLimits(float yarnProcessors) {

Review Comment:
   NIT: For readability, we could introduce a wrapper over this that returns an object with explicit `period` and `quota` fields. The newly introduced `updateCgroupMaxCpuLimit` and similar methods could take this object as a parameter.
   
   Since `getOverallLimits` is also used in the deprecated LCE code, we must keep it as well, so I'm fine with leaving it as is.
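
For illustration, a minimal sketch of what such a wrapper might look like. The class `CpuLimits` and its methods are hypothetical and do not exist in the PR. The sketch assumes the array returned by `getOverallLimits` holds the period at index 0 and the quota at index 1, which is inferred from the `bootstrap` call site in this diff rather than from the method body.

```java
/** Hypothetical wrapper giving names to the two values of a CPU limit. */
final class CpuLimits {
  private final int periodUs;
  private final int quotaUs;

  CpuLimits(int periodUs, int quotaUs) {
    this.periodUs = periodUs;
    this.quotaUs = quotaUs;
  }

  /**
   * Wraps an existing int[] result.
   * Assumption: the layout is {period, quota}, matching how bootstrap()
   * passes limits[1] as the quota and limits[0] as the period.
   */
  static CpuLimits fromArray(int[] limits) {
    if (limits == null || limits.length != 2) {
      throw new IllegalArgumentException("Expected {period, quota}");
    }
    return new CpuLimits(limits[0], limits[1]);
  }

  int getPeriodUs() {
    return periodUs;
  }

  int getQuotaUs() {
    return quotaUs;
  }

  /** String forms, since updateCgroupMaxCpuLimit takes String arguments. */
  String periodAsString() {
    return String.valueOf(periodUs);
  }

  String quotaAsString() {
    return String.valueOf(quotaUs);
  }
}
```

With something like this, the call in `bootstrap` could read `updateCgroupMaxCpuLimit("", limits.quotaAsString(), limits.periodAsString())`, which removes the index lookups while leaving `getOverallLimits` untouched for the deprecated callers.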



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org