Hi Darin,

container-executor.cfg on the master node:

==================================================
yarn.nodemanager.linux-container-executor.group=yarn #configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=bjoernh,yarn##comma separated list of system users who CAN run applications
==================================================
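In the file above the comments sit on the same line as the values. A rough way to see what a naive key=value reader would make of such a file is sketched below in Python. This is only an illustration: `parse_cfg`, `check_cfg` and `REQUIRED_KEY` are names made up for this sketch, and treating everything after `=` (including a trailing `#...`) as part of the value is an assumption of the sketch, not a statement about how the real container-executor binary parses its configuration.

```python
# Sketch only: a simplified key=value reader, NOT the parser used by the
# real container-executor binary. All names here are hypothetical.
REQUIRED_KEY = "yarn.nodemanager.linux-container-executor.group"


def parse_cfg(text):
    """Return {key: value} for every non-comment 'key=value' line."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: anything after '=' is the value, inline '#' included.
        entries[key.strip()] = value.strip()
    return entries


def check_cfg(text):
    """Return a list of human-readable warnings for a config text."""
    entries = parse_cfg(text)
    warnings = []
    if not entries.get(REQUIRED_KEY):
        warnings.append(f"{REQUIRED_KEY} is missing or empty")
    for key, value in entries.items():
        if "#" in value:
            warnings.append(f"{key}: value contains '#', possible inline comment")
    return warnings


if __name__ == "__main__":
    sample = """\
yarn.nodemanager.linux-container-executor.group=yarn #configured value
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
"""
    for warning in check_cfg(sample):
        print(warning)
```

Running the check against both the master and the slave file would at least show whether the group key is present and whether any value has picked up comment text, which may help narrow down the "Can't get configured value" error.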
On the slave nodes:

==================================================
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=hfds,yarn,mapred,bin
#Prevent other super-users
min.user.id=99
#comma separated list of system users who CAN run applications
allowed.system.users=
==================================================

The difference comes from not having defined the installation of the NM in Puppet on the master node.

I have already tried different values for allowed.system.users, but without success so far.

Is it correct that container-executor.cfg is only relevant on the NM nodes?

Best regards and thanks for your efforts,
Björn

On 16.03.2016 at 07:10, Darin Johnson wrote:
> What does your container-executor.cfg look like? Seems like
> yarn.nodemanager.linux-container-executor.group isn't set, or possibly
> banned.users= hasn't been set (some distros).
>
> On Tue, Mar 15, 2016 at 12:52 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
>> Bjorn,
>>
>> Your isolation configuration is correct; I was going from memory. I'll
>> take a look at your configs a little later on my test environment and see
>> what I can come up with.
>>
>> Darin
>>
>> On Tue, Mar 15, 2016 at 12:07 PM, Björn Hagemeier <
>> b.hageme...@fz-juelich.de> wrote:
>>
>>> Dear Darin,
>>>
>>> thanks for your response.
>>>
>>> The precise content of /etc/mesos-slave/isolation is:
>>>
>>> ==================================================
>>> cgroups/cpu,cgroups/mem
>>> ==================================================
>>>
>>> Which I took from some documentation, it may have been that of the
>>> Puppet module I'm using [1]. Should the values be different? Your string
>>> looks a bit different: "cpu/cgroups,memory/cgroups".
>>>
>>> Please find my yarn-site.xml and myriad-config-default.yml attached.
>>> I don't think they contain any sensitive information.
>>>
>>>
>>> Best regards,
>>> Björn
>>>
>>> [1] https://github.com/deric/puppet-mesos
>>>
>>> On 15.03.2016 at 16:46, Darin Johnson wrote:
>>>> Hey Bjorn,
>>>>
>>>> Can you copy paste the relevant part of the Myriad and yarn-site.xml?
>>>> Also, can you ensure you are running the mesos-slave with
>>>> --isolation="cpu/cgroups,memory/cgroups"?
>>>>
>>>> I'll try to recreate the problem and/or tell you what's missing in the
>>>> config.
>>>>
>>>> Darin
>>>>
>>>> On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier <b.hageme...@fz-juelich.de>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have trouble starting the NM on the slave nodes. Apparently, it does
>>>>> not find its configuration, or something is wrong with the configuration.
>>>>>
>>>>> With cgroups enabled, the NM does not start; the logs contain the
>>>>> error below, indicating that there is something wrong in the
>>>>> configuration. However,
>>>>> yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
>>>>> value used to be "${yarn.nodemanager.linux-container-executor.group}"
>>>>> as indicated by the installation documentation; however, I'm uncertain
>>>>> whether this recursion is the correct approach.
>>>>>
>>>>> ==================================================
>>>>> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
>>>>> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>>>>>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>>>>> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>>>>>     ... 3 more
>>>>> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
>>>>>
>>>>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>>>>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>>>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>>>>>     ... 4 more
>>>>> ==================================================
>>>>>
>>>>>
>>>>> I have given it another try with cgroups disabled (in
>>>>> myriad-config-default.yml). I seem to get a little further, but I am
>>>>> still stuck at running YARN jobs:
>>>>>
>>>>> ==================================================
>>>>> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
>>>>> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
>>>>> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
>>>>> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
>>>>> ExitCodeException exitCode=1:
>>>>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>>>>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>>>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
>>>>> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
>>>>> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
>>>>> ==================================================
>>>>>
>>>>> Unfortunately, the directory
>>>>> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
>>>>> is empty; the log indicates that it is deleted after the failed
>>>>> attempt.
>>>>>
>>>>> Again, any hint would be useful, also regarding the activation of
>>>>> cgroups.
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Björn
>>>>>
>>>>> --
>>>>> Dipl.-Inform. Björn Hagemeier
>>>>> Federated Systems and Data
>>>>> Juelich Supercomputing Centre
>>>>> Institute for Advanced Simulation
>>>>>
>>>>> Phone: +49 2461 61 1584
>>>>> Fax : +49 2461 61 6656
>>>>> Email: b.hageme...@fz-juelich.de
>>>>> Skype: bhagemeier
>>>>> WWW : http://www.fz-juelich.de/jsc
>>>>>
>>>>> JSC is the coordinator of the
>>>>> John von Neumann Institute for Computing
>>>>> and member of the
>>>>> Gauss Centre for Supercomputing
>>>>>
>>>>> -------------------------------------------------------------------------------------
>>>>> -------------------------------------------------------------------------------------
>>>>> Forschungszentrum Juelich GmbH
>>>>> 52425 Juelich
>>>>> Registered office: Juelich
>>>>> Registered in the commercial register of the district court of Dueren, No. HR B 3498
>>>>> Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
>>>>> Board of Directors: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
>>>>> Karsten Beneke (Vice Chairman), Prof. Dr.-Ing. Harald Bolt,
>>>>> Prof. Dr. Sebastian M. Schmidt
>>>>> -------------------------------------------------------------------------------------
>>>>> -------------------------------------------------------------------------------------