Re: new to hadoop, jobs never leaving accepted

2019-07-10 Thread Sunil Govindan
Hi Jason
The default value of
yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent is 0.1
(i.e. 10% of the queue's resources).
If you want to run more apps in that queue, you need to raise this limit so
that ApplicationMasters can be launched for those apps.
At the same time, please take care not to configure it too high, or you may
end up with too many ApplicationMasters and too few containers left for the
actual work.
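
As a rough illustration with your numbers (assuming default settings): the
scheduler page shows about 3.38 GB of cluster memory, so a 0.1 AM limit leaves
only ~345 MB for all ApplicationMasters in the queue, while a single MapReduce
AM already asks for 1536 MB by default (yarn.app.mapreduce.am.resource.mb),
which would keep a Hive-on-MapReduce job waiting for activation. A minimal
capacity-scheduler.xml sketch for raising the limit (the root.default queue
path and the 0.5 value are only examples, adjust them for your setup):

  <property>
    <!-- fraction of the queue's resources usable by ApplicationMasters -->
    <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
    <value>0.5</value>
  </property>

Then refresh the queues without restarting the ResourceManager:

  yarn rmadmin -refreshQueues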

Thanks
Sunil





On Wed, Jul 10, 2019 at 11:11 PM Jason Laughman <
ja...@bernetechconsulting.com> wrote:

> I added a couple of bigger servers and now I see multiple containers
> running, but I still can’t get a job to run.  The job details now say:
>
> Diagnostics:[Wed Jul 10 17:27:59 + 2019] Application is added to
> the scheduler and is not yet activated. Queue's AM resource limit exceeded.
> Details : AM Partition = ; AM Resource Request =
> ; Queue Resource Limit for AM =  vCores:1>; User AM Resource Limit of the queue = ;
> Queue AM Resource Usage = ;
>
> I understand WHAT that’s saying, but I don’t understand WHY.  Here’s what
> my scheduler details look like; I don’t see why it’s complaining about the
> AM, unless something’s not talking to something else correctly:
>
> Queue State:RUNNING
> Used Capacity:  12.5%
> Configured Capacity:100.0%
> Configured Max Capacity:100.0%
> Absolute Used Capacity: 12.5%
> Absolute Configured Capacity:   100.0%
> Absolute Configured Max Capacity:   100.0%
> Used Resources: 
> Configured Max Application Master Limit:10.0
> Max Application Master Resources:   
> Used Application Master Resources:  
> Max Application Master Resources Per User:  
> Num Schedulable Applications:   3
> Num Non-Schedulable Applications:   38
> Num Containers: 3
> Max Applications:   1
> Max Applications Per User:  1
> Configured Minimum User Limit Percent:  100%
> Configured User Limit Factor:   1.0
> Accessible Node Labels: *
> Ordering Policy:FifoOrderingPolicy
> Preemption: disabled
> Intra-queue Preemption: disabled
> Default Node Label Expression:  
> Default Application Priority:   0
>
> User Name   Max Resource   Weight   Used Resource   Max AM Resource   Used AM Resource   Schedulable Apps   Non-Schedulable Apps
> hdfs                       1.0                                                            0                  1
> dr.who                     1.0                                                            3                  37
>

Re: new to hadoop, jobs never leaving accepted

2019-07-10 Thread Jason Laughman
I added a couple of bigger servers and now I see multiple containers running, 
but I still can’t get a job to run.  The job details now say:

Diagnostics:[Wed Jul 10 17:27:59 + 2019] Application is added to the 
scheduler and is not yet activated. Queue's AM resource limit exceeded. Details 
: AM Partition = ; AM Resource Request = ; Queue Resource Limit for AM = ; User AM 
Resource Limit of the queue = ; Queue AM Resource Usage 
= ;

I understand WHAT that’s saying, but I don’t understand WHY.  Here’s what my 
scheduler details look like; I don’t see why it’s complaining about the AM, 
unless something’s not talking to something else correctly:

Queue State:RUNNING
Used Capacity:  12.5%
Configured Capacity:100.0%
Configured Max Capacity:100.0%
Absolute Used Capacity: 12.5%
Absolute Configured Capacity:   100.0%
Absolute Configured Max Capacity:   100.0%
Used Resources: 
Configured Max Application Master Limit:10.0
Max Application Master Resources:   
Used Application Master Resources:  
Max Application Master Resources Per User:  
Num Schedulable Applications:   3
Num Non-Schedulable Applications:   38
Num Containers: 3
Max Applications:   1
Max Applications Per User:  1
Configured Minimum User Limit Percent:  100%
Configured User Limit Factor:   1.0
Accessible Node Labels: *
Ordering Policy:FifoOrderingPolicy
Preemption: disabled
Intra-queue Preemption: disabled
Default Node Label Expression:  
Default Application Priority:   0

User Name   Max Resource   Weight   Used Resource   Max AM Resource   Used AM Resource   Schedulable Apps   Non-Schedulable Apps
hdfs                       1.0                                                            0                  1
dr.who                     1.0                                                            3                  37

> On Jul 10, 2019, at 3:37 AM, yangtao.yt  wrote:
> 
> Hi, Jason.
> 
> According to the information you provided, your cluster has two nodes with 
> the same resources, and the single running container is the AM container, 
> which already occupies part of them.
> I think one possible cause is that the available resources of your cluster 
> are insufficient for the new container requests; please check the application 
> attempt UI 
> (http://<rm-address>:<webui-port>/cluster/appattempt/<app-attempt-id>), where 
> you can find the outstanding requests with their required resources. Another 
> possible cause is a queue or user limit; you can check the scheduler UI 
> (http://<rm-address>:<webui-port>/cluster/scheduler) for the resource quotas 
> and usage of the queue.
> Hope it helps.
> 
> Best,
> Tao Yang
> 

Re: new to hadoop, jobs never leaving accepted

2019-07-10 Thread yangtao.yt
Hi, Jason.

According to the information you provided, your cluster has two nodes with the 
same resources, and the single running container is the AM container, which 
already occupies part of them.
I think one possible cause is that the available resources of your cluster are 
insufficient for the new container requests; please check the application 
attempt UI 
(http://<rm-address>:<webui-port>/cluster/appattempt/<app-attempt-id>), where 
you can find the outstanding requests with their required resources. Another 
possible cause is a queue or user limit; you can check the scheduler UI 
(http://<rm-address>:<webui-port>/cluster/scheduler) for the resource quotas 
and usage of the queue.
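If it is easier to check from a shell than from the web UI, the same
information is also exposed through the ResourceManager REST API and the yarn
CLI. A quick sketch (the host placeholder, the default 8088 web port and the
"default" queue name are assumptions for a stock setup):

  # scheduler view: queue capacities and current usage
  curl -s http://<rm-address>:8088/ws/v1/cluster/scheduler

  # overall cluster metrics: apps pending/running, memory and vcores
  curl -s http://<rm-address>:8088/ws/v1/cluster/metrics

  # per-queue status from the command line
  yarn queue -status default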
Hope it helps.

Best,
Tao Yang

> On Jul 10, 2019, at 8:23 AM, Jason Laughman wrote:
> 
> I’ve been setting up a Hadoop 2.9.1 cluster and have data replicating through 
> HDFS, but when I try to run a job via Hive (I see that it’s deprecated, but 
> it’s what I’m working with for now) it never gets out of accepted state in 
> the web tool.  I’ve done some Googling and the general consensus is that it’s 
> resource constraints, so can someone tell me if I’ve got enough horsepower 
> here?
> 
> I’ve got one small name server, three small data servers, and two larger data 
> servers.  I figured out that the small data servers were too small because 
> even if I tried to tweak the YARN parameters for RAM and CPU, the resource 
> managers would immediately shut down.  I added the two larger data servers, 
> and now I see two active nodes, but only with a total of one container:
> 
> $ yarn node -list
> 19/07/09 23:54:11 INFO client.RMProxy: Connecting to ResourceManager at 
> :8032
> Total Nodes:2
>              Node-Id   Node-State   Node-Http-Address   Number-of-Running-Containers
>          node1:40079      RUNNING          node1:8042                              1
>          node2:36311      RUNNING          node2:8042                              0
> 
> There are a ton of what appear to be automated jobs backed up on there, and 
> when I try to run anything through Hive it just sits there and eventually times 
> out (I do see it get accepted).  My larger nodes are 4 GB RAM and 2 vcores and I 
> set YARN to do automated resource allocation with 
> yarn.nodemanager.resource.detect-hardware-capabilities.  Is that enough to 
> even get a POC lab working?  I don’t care about having the three smaller 
> servers running as resource nodes, but I’d like to have a better 
> understanding of what’s going on with the larger servers, because it seems 
> like they’re close to working.
> 
> Here’s the metrics data from the website, hopefully somebody can parse it.
> Cluster Metrics
>   Apps Submitted:         292
>   Apps Pending:           284
>   Apps Running:           1
>   Apps Completed:         7
>   Containers Running:     1
>   Memory Used:            1 GB
>   Memory Total:           3.38 GB
>   Memory Reserved:        0 B
>   VCores Used:            1
>   VCores Total:           4
>   VCores Reserved:        0
> Cluster Nodes Metrics
>   Active Nodes:           2
>   Decommissioning Nodes:  0
>   Decommissioned Nodes:   0
>   Lost Nodes:             0
>   Unhealthy Nodes:        0
>   Rebooted Nodes:         0
>   Shutdown Nodes:         4
> Scheduler Metrics
>   Scheduler Type:                        Capacity Scheduler
>   Scheduling Resource Type:              [MEMORY]
>   Minimum Allocation:                    
>   Maximum Allocation:                    vCores:2>
>   Maximum Cluster Application Priority:  0
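
If detect-hardware-capabilities is not reporting what you expect on those
4 GB / 2-vcore nodes, another option is to pin the NodeManager resources
explicitly in yarn-site.xml. A minimal sketch, assuming roughly 1 GB is left
for the OS and the HDFS/YARN daemons (the 3072 MB / 2 vcore values are only
illustrative):

  <property>
    <!-- memory (MB) this NodeManager offers to YARN containers -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3072</value>
  </property>
  <property>
    <!-- vcores this NodeManager offers to YARN containers -->
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <!-- largest single container the scheduler will hand out -->
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
  </property>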


