No worries, keep me posted. I think we did a good proof of concept; we're trying to make it solid now, so if you find any issues, let us know.

Darin
On Jun 5, 2016 2:57 PM, "Stephen Gran" <stephen.g...@piksel.com> wrote:

Hi,

Brilliant! Working now.

Thank you very much,

On 05/06/16 18:09, Darin Johnson wrote:

Stephen,

I was able to recreate the problem (it is specific to 2.7.2: the defaults on the following two properties changed to true). Setting them to false allowed me to run map reduce jobs again. I'll try to update the documentation later today.

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

Darin
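To see why those two flags matter under fine-grained scaling: the AM container was allocated <memory:0, vCores:0>, so both enforcement limits are zero. Below is a minimal sketch of the check, loosely modeled on what the NodeManager's container monitor (ContainersMonitorImpl) does - illustrative only, not the actual Hadoop source; the class name and byte counts are just for the example:

    /** Illustrative sketch only; the real check lives in the NodeManager's
     *  ContainersMonitorImpl. It shows why a container allocated <memory:0>
     *  is always killed while the checks are enabled. */
    public class MemCheckSketch {
        public static void main(String[] args) {
            boolean pmemCheckEnabled = true;   // yarn.nodemanager.pmem-check-enabled
            boolean vmemCheckEnabled = true;   // yarn.nodemanager.vmem-check-enabled
            long pmemLimit = 0L;               // bytes; the container was allocated <memory:0>
            double vmemPmemRatio = 2.1;        // yarn.nodemanager.vmem-pmem-ratio default
            long vmemLimit = (long) (pmemLimit * vmemPmemRatio); // 0 * anything = 0 bytes

            long pmemUsed = 53_163_459L;       // ~50.7 MB, from the diagnostics below
            long vmemUsed = 2_770_927_616L;    // ~2.6 GB, from the process-tree dump below

            boolean kill = (pmemCheckEnabled && pmemUsed > pmemLimit)
                        || (vmemCheckEnabled && vmemUsed > vmemLimit);
            System.out.println("kill container: " + kill); // true -> SIGTERM, exit code 143
        }
    }

Disabling the checks simply stops the comparison from happening, so the 0 B limits become harmless.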
On Sun, Jun 5, 2016 at 10:30 AM, Stephen Gran <stephen.g...@piksel.com> wrote:

Hi,

I think those are the properties I added when I started getting this error. Removing them doesn't seem to make any difference, sadly.

This is hadoop 2.7.2.

Cheers,

On 05/06/16 14:45, Darin Johnson wrote:

Hey Stephen,

I think you're pretty close.

Looking at the config, I'd suggest removing these properties:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>12</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>

I'll try them out on my test cluster later today/tonight and see if I can recreate the problem. What version of hadoop are you running? I'll make sure I'm consistent with that as well.

Thanks,

Darin
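For background on why the static sizes should go: under Myriad the NodeManager capacities are meant to be supplied per-instance by the Myriad executor rather than hard-coded in yarn-site.xml. The Myriad setup docs of this era describe a placeholder pattern roughly like the following (quoted from memory - verify against the docs for your build):

<!-- Sketch of the documented Myriad pattern; the ${...} placeholders are
     substituted by the Myriad executor when it launches each NodeManager. -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>${nodemanager.resource.cpu-vcores}</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>${nodemanager.resource.memory-mb}</value>
</property>

With this in place, a zero-profile NM registers with 0 resources and a sized profile registers with its profile's resources, instead of every NM claiming the same static 4096 MB.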
On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com> wrote:

Hi,

Attached. Thanks very much for looking.

Cheers,

On 05/06/16 12:51, Darin Johnson wrote:

Hey Stephen, can you please send your yarn-site.xml? I'm guessing you're on the right track.

Darin

Stephen Gran wrote:

Hi,

OK. That helps, thank you. I think I just misunderstood the docs (or they never said explicitly that you do need at least some static resource), and I scaled down the initial nm.medium that got started. I get a bit further now: jobs start, but they are killed with:

Diagnostics: Container [pid=3865,containerID=container_1465112239753_0001_03_000001] is running beyond virtual memory limits. Current usage: 50.7 MB of 0B physical memory used; 2.6 GB of 0B virtual memory used. Killing container

When I've seen this in the past with yarn but without myriad, it was usually about ratios of vmem to mem and things like that - I've tried some of those knobs, but I didn't expect much result and didn't get any.

What strikes me about the error message is that the vmem and mem allocations are both 0.

I'm sorry for asking what are probably naive questions here; I couldn't find a different forum. If there is one, please point me there so I don't disrupt the dev flow.

I can see this in the logs:

2016-06-05 07:39:25,687 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from NEW to ALLOCATED
2016-06-05 07:39:25,688 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1465112239753_0001 CONTAINERID=container_1465112239753_0001_03_000001
2016-06-05 07:39:25,688 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1465112239753_0001_03_000001 of capacity <memory:0, vCores:0> on host slave2.testing.local:26688, which has 1 containers, <memory:0, vCores:0> used and <memory:4096, vCores:1> available after allocation
2016-06-05 07:39:25,689 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : slave2.testing.local:26688 for container : container_1465112239753_0001_03_000001
2016-06-05 07:39:25,696 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from ALLOCATED to ACQUIRED
2016-06-05 07:39:25,696 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,696 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1465112239753_0001 AttemptId: appattempt_1465112239753_0001_000003 MasterContainer: Container: [ContainerId: container_1465112239753_0001_03_000001, NodeId: slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387, Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.5.5:26688 }, ]
2016-06-05 07:39:25,697 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from SCHEDULED to ALLOCATED_SAVING
2016-06-05 07:39:25,698 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from ALLOCATED_SAVING to ALLOCATED
2016-06-05 07:39:25,699 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1465112239753_0001_000003
2016-06-05 07:39:25,705 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1465112239753_0001_03_000001, NodeId: slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387, Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.5.5:26688 }, ] for AM appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,705 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1465112239753_0001_03_000001 : $JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
2016-06-05 07:39:25,706 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,707 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,727 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1465112239753_0001_03_000001, NodeId: slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387, Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.5.5:26688 }, ] for AM appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,728 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from ALLOCATED to LAUNCHED
2016-06-05 07:39:25,736 WARN org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler: Task: yarn_container_1465112239753_0001_03_000001 not found, status: TASK_RUNNING
2016-06-05 07:39:26,510 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved slave1.testing.local to /default-rack
2016-06-05 07:39:26,517 WARN org.apache.myriad.scheduler.fgs.NMHeartBeatHandler: FineGrainedScaling feature got invoked for a NM with non-zero capacity. Host: slave1.testing.local, Mem: 4096, CPU: 0. Setting the NM's capacity to (0G,0CPU)
2016-06-05 07:39:26,517 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: slave1.testing.local:29121 Node Transitioned from NEW to RUNNING
2016-06-05 07:39:26,518 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node slave1.testing.local:29121 cluster capacity: <memory:4096, vCores:1>
2016-06-05 07:39:26,519 INFO org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: afterSchedulerEventHandled: NM registration from node slave1.testing.local
2016-06-05 07:39:26,528 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: received container statuses on node manager register :[container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1465112239753 } attemptId: 2 } id: 1 } container_state: C_RUNNING resource { memory: 0 virtual_cores: 0 } priority { priority: 0 } diagnostics: "" container_exit_status: -1000 creation_time: 1465112356478]
2016-06-05 07:39:26,530 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node slave1.testing.local(cmPort: 29121 httpPort: 20456) registered with capability: <memory:0, vCores:0>, assigned nodeId slave1.testing.local:29121
2016-06-05 07:39:26,611 INFO org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting capacity for node slave1.testing.local to <memory:4637, vCores:6>
2016-06-05 07:39:26,611 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Update resource on node: slave1.testing.local from: <memory:0, vCores:0>, to: <memory:4637, vCores:6>
2016-06-05 07:39:26,615 INFO org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting capacity for node slave1.testing.local to <memory:0, vCores:0>
2016-06-05 07:39:26,616 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Update resource on node: slave1.testing.local from: <memory:4637, vCores:6>, to: <memory:0, vCores:0>
2016-06-05 07:39:26,691 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from ACQUIRED to RUNNING
2016-06-05 07:39:26,835 WARN org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler: Task: yarn_container_1465112239753_0001_03_000001 not found, status: TASK_FINISHED
2016-06-05 07:39:27,603 INFO org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Received offers 1
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from RUNNING to COMPLETED
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_1465112239753_0001_03_000001 in state: COMPLETED event:FINISHED
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1465112239753_0001 CONTAINERID=container_1465112239753_0001_03_000001
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1465112239753_0001_03_000001 of capacity <memory:0, vCores:0> on host slave2.testing.local:26688, which currently has 0 containers, <memory:0, vCores:0> used and <memory:4096, vCores:1> available, release resources=true
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1465112239753_0001_000003 released container container_1465112239753_0001_03_000001 on node: host: slave2.testing.local:26688 #containers=0 available=<memory:4096, vCores:1> used=<memory:0, vCores:0> with event: FINISHED
2016-06-05 07:39:27,749 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1465112239753_0001_000003 with final state: FAILED, and exit status: -103
2016-06-05 07:39:27,750 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from LAUNCHED to FINAL_SAVING
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1465112239753_0001_000003
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1465112239753_0001_000003
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from FINAL_SAVING to FAILED
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2016-06-05 07:39:27,753 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1465112239753_0001 with final state: FAILED
2016-06-05 07:39:27,756 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1465112239753_0001 State change from ACCEPTED to FINAL_SAVING
2016-06-05 07:39:27,757 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1465112239753_0001_000003 is done. finalState=FAILED
2016-06-05 07:39:27,757 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1465112239753_0001 requests cleared
2016-06-05 07:39:27,758 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1465112239753_0001
2016-06-05 07:39:27,758 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1465112239753_0001 failed 2 times due to AM Container for appattempt_1465112239753_0001_000003 exited with exitCode: -103
For more detailed output, check application tracking page: http://master.testing.local:8088/cluster/app/application_1465112239753_0001 Then click on links to logs of each attempt.
Diagnostics: Container [pid=3865,containerID=container_1465112239753_0001_03_000001] is running beyond virtual memory limits. Current usage: 50.7 MB of 0B physical memory used; 2.6 GB of 0B virtual memory used. Killing container.
Dump of the process-tree for container_1465112239753_0001_03_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3873 3865 3865 3865 (java) 80 26 2770927616 12614 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 3865 3863 3865 3865 (bash) 0 1 11427840 354 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stdout 2>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
On 03/06/16 15:52, yuliya Feldman wrote:

I believe you need at least one NM that is not subject to fine-grained scaling. So far, if the total resources on the cluster are less than what a single container needs for the AM, you won't be able to submit any app, as the exception below tells you:

    Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0

I believe that by default, when starting a Myriad cluster, one NM with non-zero capacity should start. In addition, see in the RM log whether offers with resources are coming to the RM - this info should be in the log.
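The knob Yuliya is describing lives in Myriad's configuration (myriad-config-default.yml). A sketch of the relevant section, based on the Myriad docs of this era - key names from memory, so double-check against your build:

    # At least one NM needs a non-zero profile so the scheduler's maxMemory
    # can satisfy the AM request (requestedMemory=1536 in the trace below).
    nmInstances:
      medium: 1      # one statically sized NM alongside the zero-profile ones
    profiles:
      zero:          # fine-grained-scaling NMs: capacity comes from Mesos offers
        cpu: 0
        mem: 0
      medium:
        cpu: 4
        mem: 4096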
From: Stephen Gran <stephen.g...@piksel.com>
To: "dev@myriad.incubator.apache.org" <dev@myriad.incubator.apache.org>
Sent: Friday, June 3, 2016 1:29 AM
Subject: problem getting fine grained scaling working

Hi,

I'm trying to get fine grained scaling going on a test mesos cluster. I have a single master and 2 agents. I am running 2 node managers with the zero profile, one per agent. I can see both of them in the RM UI reporting correctly as having 0 resources.

I'm getting stack traces when I try to launch a sample application, though. I feel like I'm just missing something obvious somewhere - can anyone shed any light?

This is on a build of yesterday's git head.

Cheers,

root@master:/srv/apps/hadoop# bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 10000 /outDir
16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager at master.testing.local/10.0.5.3:8032
16/06/03 08:23:34 INFO terasort.TeraSort: Generating 10000 using 2
16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464902078156_0001
16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
        at org.apache.hadoop.examples.terasort.TeraGen.run(TeraGen.java:301)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.terasort.TeraGen.main(TeraGen.java:305)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:239)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.submitApplication(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:253)
        at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
        ... 24 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.ipc.Client.call(Client.java:1475)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy12.submitApplication(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:236)
        ... 34 more

Cheers,
--
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com