I'd also recommend you clone master or use the 0.2.0 release candidate:
https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc4/
as it fixes a number of FGS-specific bugs.

Darin

On Sun, Jun 5, 2016 at 1:09 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> Stephen,
>
> I was able to recreate the problem (it's specific to 2.7.2; they changed
> the defaults of the following two properties to true).  Setting them to
> false allowed me to run MapReduce jobs again.  I'll try to update the
> documentation later today.
>
>   <property>
>     <name>yarn.nodemanager.pmem-check-enabled</name>
>     <value>false</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.vmem-check-enabled</name>
>     <value>false</value>
>   </property>
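>
> If it helps, a quick way to double-check which values the NodeManager
> actually picks up is something like the snippet below (just a rough,
> untested sketch; the class name is only for illustration, and it assumes
> the yarn-site.xml with these overrides is on the classpath when you run it):
>
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
> // Illustrative helper, not part of Hadoop or Myriad.
> public class CheckMemEnforcement {
>     public static void main(String[] args) {
>         // YarnConfiguration loads yarn-default.xml and yarn-site.xml
>         // from the classpath.
>         YarnConfiguration conf = new YarnConfiguration();
>         System.out.println("pmem check: " + conf.getBoolean(
>             YarnConfiguration.NM_PMEM_CHECK_ENABLED,
>             YarnConfiguration.DEFAULT_NM_PMEM_CHECK_ENABLED));
>         System.out.println("vmem check: " + conf.getBoolean(
>             YarnConfiguration.NM_VMEM_CHECK_ENABLED,
>             YarnConfiguration.DEFAULT_NM_VMEM_CHECK_ENABLED));
>     }
> }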
>
> Darin
>
> On Sun, Jun 5, 2016 at 10:30 AM, Stephen Gran <stephen.g...@piksel.com>
> wrote:
>
>> Hi,
>>
>> I think those are the properties I added when I started getting this
>> error.  Removing them doesn't seem to make any difference, sadly.
>>
>> This is Hadoop 2.7.2.
>>
>> Cheers,
>>
>> On 05/06/16 14:45, Darin Johnson wrote:
>> > Hey Stephen,
>> >
>> > I think you're pretty close.
>> >
>> > Looking at the config I'd suggest removing these properties:
>> >
>> >     <property>
>> >          <name>yarn.nodemanager.resource.memory-mb</name>
>> >          <value>4096</value>
>> >      </property>
>> >      <property>
>> >          <name>yarn.scheduler.maximum-allocation-vcores</name>
>> >          <value>12</value>
>> >      </property>
>> >      <property>
>> >          <name>yarn.scheduler.maximum-allocation-mb</name>
>> >          <value>8192</value>
>> >      </property>
>> >    <property>
>> >     <name>yarn.nodemanager.vmem-check-enabled</name>
>> >      <value>false</value>
>> >      <description>Whether virtual memory limits will be enforced for
>> > containers</description>
>> >    </property>
>> > <property>
>> >     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>> >      <value>4</value>
>> >      <description>Ratio between virtual memory to physical memory when
>> > setting memory limits for containers</description>
>> >    </property>
>> >
>> > I'll try them out on my test cluster later today/tonight and see if I
>> can
>> > recreate the problem.  What version of Hadoop are you running?  I'll
>> make
>> > sure I'm consistent with that as well.
>> >
>> > Thanks,
>> >
>> > Darin
>> > On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Attached.  Thanks very much for looking.
>> >>
>> >> Cheers,
>> >>
>> >> On 05/06/16 12:51, Darin Johnson wrote:
>> >>> Hey Stephen, can you please send your yarn-site.xml? I'm guessing
>> >>> you're on the right track.
>> >>>
>> >>> Darin
>> >>> Hi,
>> >>>
>> >>> OK.  That helps, thank you.  I think I just misunderstood the docs (or
>> >>> they never said explicitly that you did need at least some static
>> >>> resource), and I scaled down the initial nm.medium that got started.
>> I
>> >>> get a bit further now, and jobs start but are killed with:
>> >>>
>> >>> Diagnostics: Container
>> >>> [pid=3865,containerID=container_1465112239753_0001_03_000001] is
>> running
>> >>> beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
>> >>> memory used; 2.6 GB of 0B virtual memory used. Killing container
>> >>>
>> >>> When I've seen this in the past with yarn but without myriad, it was
>> >>> usually about ratios of vmem to mem and things like that - I've tried
>> >>> some of those knobs, but I didn't expect much result and didn't get
>> any.
>> >>>
>> >>> What strikes me about the error message is that the vmem and mem
>> >>> allocations are for 0.
>> >>>
>> >>> I'm sorry for asking what are probably naive questions here; I couldn't
>> >>> find a different forum.  If there is one, please point me there so I
>> >>> don't disrupt the dev flow here.
>> >>>
>> >>> I can see this in the logs:
>> >>>
>> >>>
>> >>> 2016-06-05 07:39:25,687 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>> >>> container_1465112239753_0001_03_000001 Container Transitioned from NEW
>> >>> to ALLOCATED
>> >>> 2016-06-05 07:39:25,688 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
>> >>>       OPERATION=AM Allocated Container        TARGET=SchedulerApp
>> >>> RESULT=SUCCESS  APPID=application_1465112239753_0001
>> >>> CONTAINERID=container_1465112239753_0001_03_000001
>> >>> 2016-06-05 07:39:25,688 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
>> >>> Assigned container container_1465112239753_0001_03_000001 of capacity
>> >>> <memory:0, vCores:0> on host slave2.testing.local:26688, which has 1
>> >>> containers, <memory:0, vCores:0> used and <memory:4096, vCores:1>
>> >>> available after allocation
>> >>> 2016-06-05 07:39:25,689 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
>> >>> Sending NMToken for nodeId : slave2.testing.local:26688 for container
>> :
>> >>> container_1465112239753_0001_03_000001
>> >>> 2016-06-05 07:39:25,696 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>> >>> container_1465112239753_0001_03_000001 Container Transitioned from
>> >>> ALLOCATED to ACQUIRED
>> >>> 2016-06-05 07:39:25,696 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
>> >>> Clear node set for appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:25,696 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> Storing attempt: AppId: application_1465112239753_0001 AttemptId:
>> >>> appattempt_1465112239753_0001_000003 MasterContainer: Container:
>> >>> [ContainerId: container_1465112239753_0001_03_000001, NodeId:
>> >>> slave2.testing.local:26688, NodeHttpAddress:
>> slave2.testing.local:24387,
>> >>> Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
>> >>> ContainerToken, service: 10.0.5.5:26688 }, ]
>> >>> 2016-06-05 07:39:25,697 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> appattempt_1465112239753_0001_000003 State change from SCHEDULED to
>> >>> ALLOCATED_SAVING
>> >>> 2016-06-05 07:39:25,698 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> appattempt_1465112239753_0001_000003 State change from
>> ALLOCATED_SAVING
>> >>> to ALLOCATED
>> >>> 2016-06-05 07:39:25,699 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>> >>> Launching masterappattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:25,705 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>> >>> Setting up container Container: [ContainerId:
>> >>> container_1465112239753_0001_03_000001, NodeId:
>> >>> slave2.testing.local:26688, NodeHttpAddress:
>> slave2.testing.local:24387,
>> >>> Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
>> >>> ContainerToken, service: 10.0.5.5:26688 }, ] for AM
>> >>> appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:25,705 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>> >>> Command to launch container container_1465112239753_0001_03_000001 :
>> >>> $JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp
>> >>> -Dlog4j.configuration=container-log4j.properties
>> >>> -Dyarn.app.container.log.dir=<LOG_DIR>
>> >>> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>> >>> -Dhadoop.root.logfile=syslog  -Xmx1024m
>> >>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
>> >>> 2><LOG_DIR>/stderr
>> >>> 2016-06-05 07:39:25,706 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
>> >>> Create AMRMToken for ApplicationAttempt:
>> >>> appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:25,707 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
>> >>> Creating password for appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:25,727 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>> >>> Done launching container Container: [ContainerId:
>> >>> container_1465112239753_0001_03_000001, NodeId:
>> >>> slave2.testing.local:26688, NodeHttpAddress:
>> slave2.testing.local:24387,
>> >>> Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
>> >>> ContainerToken, service: 10.0.5.5:26688 }, ] for AM
>> >>> appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:25,728 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> appattempt_1465112239753_0001_000003 State change from ALLOCATED to
>> >> LAUNCHED
>> >>> 2016-06-05 07:39:25,736 WARN
>> >>> org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler:
>> >>> Task: yarn_container_1465112239753_0001_03_000001 not found, status:
>> >>> TASK_RUNNING
>> >>> 2016-06-05 07:39:26,510 INFO org.apache.hadoop.yarn.util.RackResolver:
>> >>> Resolved slave1.testing.local to /default-rack
>> >>> 2016-06-05 07:39:26,517 WARN
>> >>> org.apache.myriad.scheduler.fgs.NMHeartBeatHandler: FineGrainedScaling
>> >>> feature got invoked for a NM with non-zero capacity. Host:
>> >>> slave1.testing.local, Mem: 4096, CPU: 0. Setting the NM's capacity to
>> >>> (0G,0CPU)
>> >>> 2016-06-05 07:39:26,517 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
>> >>> slave1.testing.local:29121 Node Transitioned from NEW to RUNNING
>> >>> 2016-06-05 07:39:26,518 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>> >>> Added node slave1.testing.local:29121 cluster capacity: <memory:4096,
>> >>> vCores:1>
>> >>> 2016-06-05 07:39:26,519 INFO
>> >>> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager:
>> >>> afterSchedulerEventHandled: NM registration from node
>> >> slave1.testing.local
>> >>> 2016-06-05 07:39:26,528 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
>> >>> received container statuses on node manager register :[container_id {
>> >>> app_attempt_id { application_id { id: 1 cluster_timestamp:
>> 1465112239753
>> >>> } attemptId: 2 } id: 1 } container_state: C_RUNNING resource {
>> memory: 0
>> >>> virtual_cores: 0 } priority { priority: 0 } diagnostics: ""
>> >>> container_exit_status: -1000 creation_time: 1465112356478]
>> >>> 2016-06-05 07:39:26,530 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
>> >>> NodeManager from node slave1.testing.local(cmPort: 29121 httpPort:
>> >>> 20456) registered with capability: <memory:0, vCores:0>, assigned
>> nodeId
>> >>> slave1.testing.local:29121
>> >>> 2016-06-05 07:39:26,611 INFO
>> >>> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting
>> >>> capacity for node slave1.testing.local to <memory:4637, vCores:6>
>> >>> 2016-06-05 07:39:26,611 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>> >>> Update resource on node: slave1.testing.local from: <memory:0,
>> >>> vCores:0>, to: <memory:4637, vCores:6>
>> >>> 2016-06-05 07:39:26,615 INFO
>> >>> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting
>> >>> capacity for node slave1.testing.local to <memory:0, vCores:0>
>> >>> 2016-06-05 07:39:26,616 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>> >>> Update resource on node: slave1.testing.local from: <memory:4637,
>> >>> vCores:6>, to: <memory:0, vCores:0>
>> >>> 2016-06-05 07:39:26,691 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>> >>> container_1465112239753_0001_03_000001 Container Transitioned from
>> >>> ACQUIRED to RUNNING
>> >>> 2016-06-05 07:39:26,835 WARN
>> >>> org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler:
>> >>> Task: yarn_container_1465112239753_0001_03_000001 not found, status:
>> >>> TASK_FINISHED
>> >>> 2016-06-05 07:39:27,603 INFO
>> >>> org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler:
>> >>> Received offers 1
>> >>> 2016-06-05 07:39:27,748 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>> >>> container_1465112239753_0001_03_000001 Container Transitioned from
>> >>> RUNNING to COMPLETED
>> >>> 2016-06-05 07:39:27,748 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt:
>> >>> Completed container: container_1465112239753_0001_03_000001 in state:
>> >>> COMPLETED event:FINISHED
>> >>> 2016-06-05 07:39:27,748 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
>> >>>       OPERATION=AM Released Container TARGET=SchedulerApp
>> >>> RESULT=SUCCESS  APPID=application_1465112239753_0001
>> >>> CONTAINERID=container_1465112239753_0001_03_000001
>> >>> 2016-06-05 07:39:27,748 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
>> >>> Released container container_1465112239753_0001_03_000001 of capacity
>> >>> <memory:0, vCores:0> on host slave2.testing.local:26688, which
>> currently
>> >>> has 0 containers, <memory:0, vCores:0> used and <memory:4096,
>> vCores:1>
>> >>> available, release resources=true
>> >>> 2016-06-05 07:39:27,748 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>> >>> Application attempt appattempt_1465112239753_0001_000003 released
>> >>> container container_1465112239753_0001_03_000001 on node: host:
>> >>> slave2.testing.local:26688 #containers=0 available=<memory:4096,
>> >>> vCores:1> used=<memory:0, vCores:0> with event: FINISHED
>> >>> 2016-06-05 07:39:27,749 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> Updating application attempt appattempt_1465112239753_0001_000003 with
>> >>> final state: FAILED, and exit status: -103
>> >>> 2016-06-05 07:39:27,750 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> appattempt_1465112239753_0001_000003 State change from LAUNCHED to
>> >>> FINAL_SAVING
>> >>> 2016-06-05 07:39:27,751 INFO
>> >>>
>> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
>> >>> Unregistering app attempt : appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:27,751 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
>> >>> Application finished, removing password for
>> >>> appattempt_1465112239753_0001_000003
>> >>> 2016-06-05 07:39:27,751 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>> >>> appattempt_1465112239753_0001_000003 State change from FINAL_SAVING to
>> >>> FAILED
>> >>> 2016-06-05 07:39:27,751 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The
>> >>> number of failed attempts is 2. The max attempts is 2
>> >>> 2016-06-05 07:39:27,753 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>> Updating
>> >>> application application_1465112239753_0001 with final state: FAILED
>> >>> 2016-06-05 07:39:27,756 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>> >>> application_1465112239753_0001 State change from ACCEPTED to
>> FINAL_SAVING
>> >>> 2016-06-05 07:39:27,757 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>> >>> Application appattempt_1465112239753_0001_000003 is done.
>> >> finalState=FAILED
>> >>> 2016-06-05 07:39:27,757 INFO
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
>> >>> Application application_1465112239753_0001 requests cleared
>> >>> 2016-06-05 07:39:27,758 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
>> >>> Updating info for app: application_1465112239753_0001
>> >>> 2016-06-05 07:39:27,758 INFO
>> >>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>> >>> Application application_1465112239753_0001 failed 2 times due to AM
>> >>> Container for appattempt_1465112239753_0001_000003 exited with
>> >>> exitCode: -103
>> >>> For more detailed output, check application tracking page:
>> >>> http://master.testing.local:8088/cluster/app/application_1465112239753_0001
>> >>> Then, click on links to logs of each attempt.
>> >>> Diagnostics: Container
>> >>> [pid=3865,containerID=container_1465112239753_0001_03_000001] is
>> running
>> >>> beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
>> >>> memory used; 2.6 GB of 0B virtual memory used. Killing container.
>> >>> Dump of the process-tree for container_1465112239753_0001_03_000001 :
>> >>>            |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> >>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
>> FULL_CMD_LINE
>> >>>            |- 3873 3865 3865 3865 (java) 80 26 2770927616 12614
>> >>> /usr/lib/jvm/java-8-openjdk-amd64/bin/java
>> >>>
>> >>
>> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp
>> >>> -Dlog4j.configuration=container-log4j.properties
>> >>>
>> >>
>> -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001
>> >>> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>> >>> -Dhadoop.root.logfile=syslog -Xmx1024m
>> >>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
>> >>>            |- 3865 3863 3865 3865 (bash) 0 1 11427840 354 /bin/bash -c
>> >>> /usr/lib/jvm/java-8-openjdk-amd64/bin/java
>> >>>
>> >>
>> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp
>> >>> -Dlog4j.configuration=container-log4j.properties
>> >>>
>> >>
>> -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001
>> >>> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>> >>> -Dhadoop.root.logfile=syslog  -Xmx1024m
>> >>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
>> >>>
>> >>
>> 1>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stdout
>> >>>
>> >>
>> 2>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stderr
>> >>>
>> >>>
>> >>> Container killed on request. Exit code is 143
>> >>> Container exited with a non-zero exit code 143
>> >>> Failing this attempt. Failing the application.
>> >>>
>> >>>
>> >>>
>> >>> On 03/06/16 15:52, yuliya Feldman wrote:
>> >>>> I believe you need at least one NM that is not subject to fine-grained
>> >>>> scaling. So far, if the total resources on the cluster are less than a
>> >>>> single container needs for the AM, you won't be able to submit any app,
>> >>>> as the exception below tells you:
>> >>>> (Invalid resource request, requested memory < 0, or requested memory >
>> >>>> max configured, requestedMemory=1536, maxMemory=0)
>> >>>> I believe that when starting a Myriad cluster, one NM with non-zero
>> >>>> capacity should start by default.
>> >>>> In addition, check the RM log to see whether offers with resources are
>> >>>> coming to the RM - this info should be in the log.
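>> >>>>
>> >>>> Roughly, the check that produces that message boils down to something
>> >>>> like the sketch below (illustrative only - the real logic lives in
>> >>>> YARN's SchedulerUtils, and the class and method names here are made up):
>> >>>>
>> >>>> public class MaxAllocationCheck {
>> >>>>     // Stand-in for YARN's resource-request validation.
>> >>>>     static void validate(int requestedMemory, int maxMemory) {
>> >>>>         if (requestedMemory < 0 || requestedMemory > maxMemory) {
>> >>>>             throw new IllegalArgumentException("Invalid resource request,"
>> >>>>                 + " requestedMemory=" + requestedMemory
>> >>>>                 + ", maxMemory=" + maxMemory);
>> >>>>         }
>> >>>>     }
>> >>>>
>> >>>>     public static void main(String[] args) {
>> >>>>         // While every NM registers with zero capacity, maxMemory stays 0,
>> >>>>         // so the AM's 1536 MB request can never pass.
>> >>>>         validate(1536, 0);
>> >>>>     }
>> >>>> }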
>> >>>>
>> >>>>          From: Stephen Gran <stephen.g...@piksel.com>
>> >>>>     To: "dev@myriad.incubator.apache.org" <
>> >> dev@myriad.incubator.apache.org>
>> >>>>     Sent: Friday, June 3, 2016 1:29 AM
>> >>>>     Subject: problem getting fine grained scaling working
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I'm trying to get fine grained scaling going on a test mesos
>> cluster.  I
>> >>>> have a single master and 2 agents.  I am running 2 node managers with
>> >>>> the zero profile, one per agent.  I can see both of them in the RM UI
>> >>>> reporting correctly as having 0 resources.
>> >>>>
>> >>>> I'm getting stack traces when I try to launch a sample application,
>> >>>> though.  I feel like I'm just missing something obvious somewhere -
>> can
>> >>>> anyone shed any light?
>> >>>>
>> >>>> This is on a build of yesterday's git head.
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> root@master:/srv/apps/hadoop# bin/yarn jar
>> >>>> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen
>> 10000
>> >>>> /outDir
>> >>>> 16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager
>> at
>> >>>> master.testing.local/10.0.5.3:8032
>> >>>> 16/06/03 08:23:34 INFO terasort.TeraSort: Generating 10000 using 2
>> >>>> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
>> >>>> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for
>> >>>> job: job_1464902078156_0001
>> >>>> 16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the
>> staging
>> >>>> area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
>> >>>> java.io.IOException:
>> >>>> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
>> >>>> Invalid resource request, requested memory < 0, or requested memory >
>> >>>> max configured, requestedMemory=1536, maxMemory=0
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> >>>>            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> >>>>            at
>> >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> >>>>            at
>> >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> >>>>            at java.security.AccessController.doPrivileged(Native
>> Method)
>> >>>>            at javax.security.auth.Subject.doAs(Subject.java:422)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> >>>>            at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>> >>>>
>> >>>>            at
>> >>> org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>> >>>>            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>> >>>>            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>> >>>>            at java.security.AccessController.doPrivileged(Native
>> Method)
>> >>>>            at javax.security.auth.Subject.doAs(Subject.java:422)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> >>>>            at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>> >>>>            at
>> >>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>> >>>>            at
>> >>> org.apache.hadoop.examples.terasort.TeraGen.run(TeraGen.java:301)
>> >>>>            at
>> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>>>            at
>> >>> org.apache.hadoop.examples.terasort.TeraGen.main(TeraGen.java:305)
>> >>>>            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>>>            at java.lang.reflect.Method.invoke(Method.java:497)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>> >>>>            at
>> >>> org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>> >>>>            at
>> >>> org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>> >>>>            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>>>            at java.lang.reflect.Method.invoke(Method.java:497)
>> >>>>            at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>> >>>>            at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>> >>>> Caused by:
>> >>>> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
>> >>>> Invalid resource request, requested memory < 0, or requested memory >
>> >>>> max configured, requestedMemory=1536, maxMemory=0
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> >>>>            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> >>>>            at
>> >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> >>>>            at
>> >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> >>>>            at java.security.AccessController.doPrivileged(Native
>> Method)
>> >>>>            at javax.security.auth.Subject.doAs(Subject.java:422)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> >>>>            at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>> >>>>
>> >>>>            at
>> >> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> >>> Method)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> >>>>            at
>> >>> java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>> >>>>            at
>> >>>
>> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:239)
>> >>>>            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>>>            at java.lang.reflect.Method.invoke(Method.java:497)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> >>>>            at com.sun.proxy.$Proxy13.submitApplication(Unknown
>> Source)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:253)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
>> >>>>            at
>> >>> org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>> >>>>            ... 24 more
>> >>>> Caused by:
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>> >>>> Invalid resource request, requested memory < 0, or requested memory >
>> >>>> max configured, requestedMemory=1536, maxMemory=0
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> >>>>            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> >>>>            at
>> >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> >>>>            at
>> >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> >>>>            at java.security.AccessController.doPrivileged(Native
>> Method)
>> >>>>            at javax.security.auth.Subject.doAs(Subject.java:422)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> >>>>            at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>> >>>>
>> >>>>            at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>> >>>>            at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>> >>>>            at com.sun.proxy.$Proxy12.submitApplication(Unknown
>> Source)
>> >>>>            at
>> >>>>
>> >>>
>> >>
>> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:236)
>> >>>>            ... 34 more
>> >>>>
>> >>>>
>> >>>> Cheers,
>> >>>> --
>> >>>> Stephen Gran
>> >>>> Senior Technical Architect
>> >>>>
>> >>>> picture the possibilities | piksel.com
>> >>>>
>> >>>
>> >
>>
>> --
>> Stephen Gran
>> Senior Technical Architect
>>
>> picture the possibilities | piksel.com
>>
>
>
