No worries, keep me posted. I think we did a good proof of concept; we're trying to make it solid now, so if you find any issues, let us know.

Darin
On Jun 5, 2016 2:57 PM, "Stephen Gran" <stephen.g...@piksel.com> wrote:

Hi,

Brilliant! Working now.

Thank you very much,

On 05/06/16 18:09, Darin Johnson wrote:

Stephen,

I was able to recreate the problem (it is specific to 2.7.2: the defaults on the following two properties changed to true). Setting them to false allowed me to run map reduce jobs again. I'll try to update the documentation later today.

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

Darin
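To see why those two flags matter under fine-grained scaling: the AM container was allocated <memory:0, vCores:0>, so both enforcement limits are zero. Below is a minimal sketch of the check, loosely modeled on what the NodeManager's container monitor (ContainersMonitorImpl) does - illustrative only, not the actual Hadoop source; the class name and byte counts are just for the example:

    /** Illustrative sketch only; the real check lives in the NodeManager's
     *  ContainersMonitorImpl. It shows why a container allocated <memory:0>
     *  is always killed while the checks are enabled. */
    public class MemCheckSketch {
        public static void main(String[] args) {
            boolean pmemCheckEnabled = true;   // yarn.nodemanager.pmem-check-enabled
            boolean vmemCheckEnabled = true;   // yarn.nodemanager.vmem-check-enabled
            long pmemLimit = 0L;               // bytes; the container was allocated <memory:0>
            double vmemPmemRatio = 2.1;        // yarn.nodemanager.vmem-pmem-ratio default
            long vmemLimit = (long) (pmemLimit * vmemPmemRatio); // 0 * anything = 0 bytes

            long pmemUsed = 53_163_459L;       // ~50.7 MB, from the diagnostics below
            long vmemUsed = 2_770_927_616L;    // ~2.6 GB, from the process-tree dump below

            boolean kill = (pmemCheckEnabled && pmemUsed > pmemLimit)
                        || (vmemCheckEnabled && vmemUsed > vmemLimit);
            System.out.println("kill container: " + kill); // true -> SIGTERM, exit code 143
        }
    }

Disabling the checks simply stops the comparison from happening, so the 0 B limits become harmless.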
On Sun, Jun 5, 2016 at 10:30 AM, Stephen Gran <stephen.g...@piksel.com> wrote:

Hi,

I think those are the properties I added when I started getting this error. Removing them doesn't seem to make any difference, sadly.

This is hadoop 2.7.2.

Cheers,

On 05/06/16 14:45, Darin Johnson wrote:

Hey Stephen,

I think you're pretty close.

Looking at the config, I'd suggest removing these properties:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>12</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>

I'll try them out on my test cluster later today/tonight and see if I can recreate the problem. What version of hadoop are you running? I'll make sure I'm consistent with that as well.

Thanks,

Darin
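For background on why the static sizes should go: under Myriad the NodeManager capacities are meant to be supplied per-instance by the Myriad executor rather than hard-coded in yarn-site.xml. The Myriad setup docs of this era describe a placeholder pattern roughly like the following (quoted from memory - verify against the docs for your build):

<!-- Sketch of the documented Myriad pattern; the ${...} placeholders are
     substituted by the Myriad executor when it launches each NodeManager. -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>${nodemanager.resource.cpu-vcores}</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>${nodemanager.resource.memory-mb}</value>
</property>

With this in place, a zero-profile NM registers with 0 resources and a sized profile registers with its profile's resources, instead of every NM claiming the same static 4096 MB.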
On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com> wrote:

Hi,

Attached. Thanks very much for looking.

Cheers,

On 05/06/16 12:51, Darin Johnson wrote:

Hey Stephen, can you please send your yarn-site.xml? I'm guessing you're on the right track.

Darin

Stephen Gran wrote:

Hi,

OK. That helps, thank you. I think I just misunderstood the docs (or they never said explicitly that you do need at least some static resource), and I scaled down the initial nm.medium that got started. I get a bit further now: jobs start, but they are killed with:

Diagnostics: Container [pid=3865,containerID=container_1465112239753_0001_03_000001] is running beyond virtual memory limits. Current usage: 50.7 MB of 0B physical memory used; 2.6 GB of 0B virtual memory used. Killing container

When I've seen this in the past with yarn but without myriad, it was usually about ratios of vmem to mem and things like that - I've tried some of those knobs, but I didn't expect much result and didn't get any.

What strikes me about the error message is that the vmem and mem allocations are both 0.

I'm sorry for asking what are probably naive questions here; I couldn't find a different forum. If there is one, please point me there so I don't disrupt the dev flow.

I can see this in the logs:

2016-06-05 07:39:25,687 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from NEW to ALLOCATED
2016-06-05 07:39:25,688 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1465112239753_0001 CONTAINERID=container_1465112239753_0001_03_000001
2016-06-05 07:39:25,688 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1465112239753_0001_03_000001 of capacity <memory:0, vCores:0> on host slave2.testing.local:26688, which has 1 containers, <memory:0, vCores:0> used and <memory:4096, vCores:1> available after allocation
2016-06-05 07:39:25,689 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : slave2.testing.local:26688 for container : container_1465112239753_0001_03_000001
2016-06-05 07:39:25,696 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from ALLOCATED to ACQUIRED
2016-06-05 07:39:25,696 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,696 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1465112239753_0001 AttemptId: appattempt_1465112239753_0001_000003 MasterContainer: Container: [ContainerId: container_1465112239753_0001_03_000001, NodeId: slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387, Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.5.5:26688 }, ]
2016-06-05 07:39:25,697 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from SCHEDULED to ALLOCATED_SAVING
2016-06-05 07:39:25,698 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from ALLOCATED_SAVING to ALLOCATED
2016-06-05 07:39:25,699 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1465112239753_0001_000003
2016-06-05 07:39:25,705 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1465112239753_0001_03_000001, NodeId: slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387, Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.5.5:26688 }, ] for AM appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,705 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1465112239753_0001_03_000001 : $JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
2016-06-05 07:39:25,706 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,707 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,727 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1465112239753_0001_03_000001, NodeId: slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387, Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.0.5.5:26688 }, ] for AM appattempt_1465112239753_0001_000003
2016-06-05 07:39:25,728 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from ALLOCATED to LAUNCHED
2016-06-05 07:39:25,736 WARN org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler: Task: yarn_container_1465112239753_0001_03_000001 not found, status: TASK_RUNNING
2016-06-05 07:39:26,510 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved slave1.testing.local to /default-rack
2016-06-05 07:39:26,517 WARN org.apache.myriad.scheduler.fgs.NMHeartBeatHandler: FineGrainedScaling feature got invoked for a NM with non-zero capacity. Host: slave1.testing.local, Mem: 4096, CPU: 0. Setting the NM's capacity to (0G,0CPU)
2016-06-05 07:39:26,517 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: slave1.testing.local:29121 Node Transitioned from NEW to RUNNING
2016-06-05 07:39:26,518 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node slave1.testing.local:29121 cluster capacity: <memory:4096, vCores:1>
2016-06-05 07:39:26,519 INFO org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: afterSchedulerEventHandled: NM registration from node slave1.testing.local
2016-06-05 07:39:26,528 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: received container statuses on node manager register :[container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1465112239753 } attemptId: 2 } id: 1 } container_state: C_RUNNING resource { memory: 0 virtual_cores: 0 } priority { priority: 0 } diagnostics: "" container_exit_status: -1000 creation_time: 1465112356478]
2016-06-05 07:39:26,530 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node slave1.testing.local(cmPort: 29121 httpPort: 20456) registered with capability: <memory:0, vCores:0>, assigned nodeId slave1.testing.local:29121
2016-06-05 07:39:26,611 INFO org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting capacity for node slave1.testing.local to <memory:4637, vCores:6>
2016-06-05 07:39:26,611 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Update resource on node: slave1.testing.local from: <memory:0, vCores:0>, to: <memory:4637, vCores:6>
2016-06-05 07:39:26,615 INFO org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting capacity for node slave1.testing.local to <memory:0, vCores:0>
2016-06-05 07:39:26,616 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Update resource on node: slave1.testing.local from: <memory:4637, vCores:6>, to: <memory:0, vCores:0>
2016-06-05 07:39:26,691 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from ACQUIRED to RUNNING
2016-06-05 07:39:26,835 WARN org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler: Task: yarn_container_1465112239753_0001_03_000001 not found, status: TASK_FINISHED
2016-06-05 07:39:27,603 INFO org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler: Received offers 1
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1465112239753_0001_03_000001 Container Transitioned from RUNNING to COMPLETED
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_1465112239753_0001_03_000001 in state: COMPLETED event:FINISHED
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1465112239753_0001 CONTAINERID=container_1465112239753_0001_03_000001
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1465112239753_0001_03_000001 of capacity <memory:0, vCores:0> on host slave2.testing.local:26688, which currently has 0 containers, <memory:0, vCores:0> used and <memory:4096, vCores:1> available, release resources=true
2016-06-05 07:39:27,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1465112239753_0001_000003 released container container_1465112239753_0001_03_000001 on node: host: slave2.testing.local:26688 #containers=0 available=<memory:4096, vCores:1> used=<memory:0, vCores:0> with event: FINISHED
2016-06-05 07:39:27,749 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1465112239753_0001_000003 with final state: FAILED, and exit status: -103
2016-06-05 07:39:27,750 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from LAUNCHED to FINAL_SAVING
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1465112239753_0001_000003
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1465112239753_0001_000003
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1465112239753_0001_000003 State change from FINAL_SAVING to FAILED
2016-06-05 07:39:27,751 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2016-06-05 07:39:27,753 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1465112239753_0001 with final state: FAILED
2016-06-05 07:39:27,756 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1465112239753_0001 State change from ACCEPTED to FINAL_SAVING
2016-06-05 07:39:27,757 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1465112239753_0001_000003 is done. finalState=FAILED
2016-06-05 07:39:27,757 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1465112239753_0001 requests cleared
2016-06-05 07:39:27,758 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1465112239753_0001
2016-06-05 07:39:27,758 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1465112239753_0001 failed 2 times due to AM Container for appattempt_1465112239753_0001_000003 exited with exitCode: -103
For more detailed output, check application tracking page: http://master.testing.local:8088/cluster/app/application_1465112239753_0001 Then click on links to logs of each attempt.
Diagnostics: Container [pid=3865,containerID=container_1465112239753_0001_03_000001] is running beyond virtual memory limits. Current usage: 50.7 MB of 0B physical memory used; 2.6 GB of 0B virtual memory used. Killing container.
Dump of the process-tree for container_1465112239753_0001_03_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3873 3865 3865 3865 (java) 80 26 2770927616 12614 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 3865 3863 3865 3865 (bash) 0 1 11427840 354 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stdout 2>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
On 03/06/16 15:52, yuliya Feldman wrote:

I believe you need at least one NM that is not subject to fine-grained scaling. So far, if the total resources on the cluster are less than what a single container needs for the AM, you won't be able to submit any app, as the exception below tells you:

    Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0

I believe that by default, when starting a Myriad cluster, one NM with non-zero capacity should start. In addition, see in the RM log whether offers with resources are coming to the RM - this info should be in the log.
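The knob Yuliya is describing lives in Myriad's configuration (myriad-config-default.yml). A sketch of the relevant section, based on the Myriad docs of this era - key names from memory, so double-check against your build:

    # At least one NM needs a non-zero profile so the scheduler's maxMemory
    # can satisfy the AM request (requestedMemory=1536 in the trace below).
    nmInstances:
      medium: 1      # one statically sized NM alongside the zero-profile ones
    profiles:
      zero:          # fine-grained-scaling NMs: capacity comes from Mesos offers
        cpu: 0
        mem: 0
      medium:
        cpu: 4
        mem: 4096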
From: Stephen Gran <stephen.g...@piksel.com>
To: "dev@myriad.incubator.apache.org" <dev@myriad.incubator.apache.org>
Sent: Friday, June 3, 2016 1:29 AM
Subject: problem getting fine grained scaling working

Hi,

I'm trying to get fine grained scaling going on a test mesos cluster. I have a single master and 2 agents. I am running 2 node managers with the zero profile, one per agent. I can see both of them in the RM UI reporting correctly as having 0 resources.

I'm getting stack traces when I try to launch a sample application, though. I feel like I'm just missing something obvious somewhere - can anyone shed any light?

This is on a build of yesterday's git head.

Cheers,

root@master:/srv/apps/hadoop# bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 10000 /outDir
16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager at master.testing.local/10.0.5.3:8032
16/06/03 08:23:34 INFO terasort.TeraSort: Generating 10000 using 2
16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464902078156_0001
16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
        at org.apache.hadoop.examples.terasort.TeraGen.run(TeraGen.java:301)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.terasort.TeraGen.main(TeraGen.java:305)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:239)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.submitApplication(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:253)
        at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
        at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
        ... 24 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=0
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.ipc.Client.call(Client.java:1475)
        at org.apache.hadoop.ipc.Client.call(Client.java:1412)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy12.submitApplication(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:236)
        ... 34 more

Cheers,
--
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com