When trying cgroups with the myriad-0.2 RC on a single-node MapR cluster, I am hitting the following issue:
1. The error below occurs when launching the NodeManager with cgroups enabled:

*stdout*:

export TASK_DIR=afe954c5-79dc-4238-af84-14855090df34&& sudo chown mapr /sys/fs/cgroup/cpu/mesos/afe954c5-79dc-4238-af84-14855090df34 && export YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0; env YARN_NODEMANAGER_OPTS=-Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0 -Dyarn.nodemanager.linux-container-executor.cgroups.hierarchy=mesos/ afe954c5-79dc-4238-af84-14855090df34 -Dyarn.home=/opt/mapr/hadoop/hadoop-2.7.0 -Dnodemanager.resource.cpu-vcores=4 -Dnodemanager.resource.memory-mb=4096 -Dmyriad.yarn.nodemanager.address=0.0.0.0:31847 -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31132 -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31181 -Dmyriad.mapreduce.shuffle.port=31166 YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0 /opt/mapr/hadoop/hadoop-2.7.0/bin/yarn nodemanager

*stderr*:

16/05/21 01:43:13 INFO service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu
    at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)
    at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)
    at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
    ... 3 more
16/05/21 01:43:13 WARN service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:276)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
16/05/21 01:43:13 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu
    at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)
    at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)
    at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
    ... 3 more
16/05/21 01:43:13 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at qa101-139/10.10.101.139
************************************************************/

Here is the yarn-site.xml configuration:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>testrm.marathon.mesos</value>
    <description>host is the hostname of the resourcemanager</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>RM Recovery Enabled</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
    <description>One can configure other schedulers as well from the following list:
      org.apache.myriad.scheduler.yarn.MyriadCapacityScheduler,
      org.apache.myriad.scheduler.yarn.MyriadFifoScheduler</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>${nodemanager.resource.cpu-vcores}</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>${nodemanager.resource.memory-mb}</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>${myriad.yarn.nodemanager.address}</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>${myriad.yarn.nodemanager.webapp.address}</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.https.address</name>
    <value>${myriad.yarn.nodemanager.webapp.address}</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.address</name>
    <value>${myriad.yarn.nodemanager.localizer.address}</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>${myriad.mapreduce.shuffle.port}</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,mapr_direct_shuffle,myriad_executor</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
    <value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.MyriadFileSystemRMStateStore</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-disks</name>
    <value>0</value>
  </property>
  <!-- Cgroups configuration -->
  <property>
    <description>Who will execute (launch) the containers.</description>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <description>The class which should help the LCE handle resources.</description>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>mapr</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.path</name>
    <value>/opt/mapr/hadoop/hadoop-2.7.0/bin/container-executor</value>
  </property>
</configuration>

Here is the *myriad-config-default.yml*:

mesosMaster: zk://10.10.101.139:5181/mesos
checkpoint: false
frameworkFailoverTimeout: 43200000
frameworkName: MyriadAlpha
frameworkRole:
frameworkUser: mapr   # The user running the resource manager.
frameworkSuperUser: root  # To be deprecated; currently permissions need to be set by a superuser
                          # due to Mesos-1790. Must be root or have passwordless sudo.
                          # Required if nodeManagerURI set, ignored otherwise.
nativeLibrary: /usr/local/lib/libmesos.so
zkServers: 10.10.101.139:5181
zkTimeout: 20000
restApiPort: 8192
profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
    spindles: 0
  small:
    cpu: 2
    mem: 2048
    spindles: 1
  medium:
    cpu: 4
    mem: 4096
    spindles: 2
  large:
    cpu: 10
    mem: 12288
    spindles: 4
nmInstances:  # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1   # <profile_name : instances>
rebalancer: false
haEnabled: true
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: true
executor:
  jvmMaxMemoryMB: 256
  path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
  # The following should be used for a remotely distributed URI, hdfs assumed but other URI types valid.
  #nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz
  #path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
yarnEnvironment:
  YARN_NODEMANAGER_OPTS: -Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0
  YARN_HOME: /opt/mapr/hadoop/hadoop-2.7.0
  #JAVA_HOME: /usr/lib/jvm/java-default  # System dependent, but sometimes necessary
mesosAuthenticationPrincipal:
mesosAuthenticationSecretFilename:
services:
  jobhistory:
    jvmMaxMemoryMB: 64
    cpus: 0.5
    ports:
      myriad.mapreduce.jobhistory.admin.address: 10033
      myriad.mapreduce.jobhistory.address: 10020
      myriad.mapreduce.jobhistory.webapp.address: 19888
    envSettings: -Dcluster.name.prefix=/cluster1
    taskName: jobhistory
    serviceOptsName: HADOOP_JOB_HISTORYSERVER_OPTS
    command: $YARN_HOME/bin/mapred historyserver
    maxInstances: 1

I have also patched NMExecutorCLGenImpl.java to adjust the NodeManager command line, but the issue remains the same. Let me know if there is anything wrong with this setup, or whether I have missed any configuration details from the Myriad perspective.

-Sarjeet
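For reference, here is roughly how I checked whether the mapr user can actually write under the cpu controller on the node. The IOException above comes from CgroupsLCEResourcesHandler failing to create its hierarchy, and the sudo chown in the launch command only changes ownership of the task directory itself, not the parent levels. can_write_dir is just a throwaway helper I wrote for this check, not part of Myriad or YARN; the cgroup path matches the task id from my launch command:

```shell
#!/bin/sh
# Throwaway helper: can the current user create a sub-directory under $1?
# Creating a sub-directory is exactly what YARN's CgroupsLCEResourcesHandler
# needs to do to set up its hierarchy, so a failure here lines up with the
# "cannot write to cgroup" error in the NM log.
can_write_dir() {
  d="$1/probe.$$"
  mkdir "$d" 2>/dev/null && rmdir "$d"
}

# On the node, run as the NM user (mapr); path is from my launch command:
# can_write_dir /sys/fs/cgroup/cpu/mesos/afe954c5-79dc-4238-af84-14855090df34 \
#   && echo writable || echo "not writable"
# ls -ld on /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpu/mesos, and the task
# directory then shows at which level mapr loses write access.
```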