Re: new to hadoop, jobs never leaving accepted
Hi Jason,

The default value of yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent is 0.1, i.e. 10% of the queue's resources. If you want to run more apps in that queue, you need to raise this limit so that more Application Masters can be started. Take care not to set it too high, though, or you may end up with too many Application Masters and too few workers.

Thanks,
Sunil

On Wed, Jul 10, 2019 at 11:11 PM Jason Laughman <ja...@bernetechconsulting.com> wrote:
> [quoted text trimmed]
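Sunil's suggestion amounts to a one-property change in capacity-scheduler.xml. A sketch; the queue path root.default and the 0.2 value are assumptions to adjust for your own queue and cluster:

```xml
<!-- capacity-scheduler.xml: raise the share of queue resources that
     Application Masters may occupy (default 0.1 = 10%).
     "root.default" and 0.2 are placeholders; substitute your queue
     path and a ceiling that suits your cluster. -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
```

After editing, `yarn rmadmin -refreshQueues` applies the change without restarting the ResourceManager.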
Re: new to hadoop, jobs never leaving accepted
I added a couple of bigger servers and now I see multiple containers running, but I still can’t get a job to run. The job details now say:

Diagnostics: [Wed Jul 10 17:27:59 + 2019] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details: AM Partition = ; AM Resource Request = ; Queue Resource Limit for AM = vCores:1; User AM Resource Limit of the queue = ; Queue AM Resource Usage = ;

I understand WHAT that’s saying, but I don’t understand WHY. Here’s what my scheduler details look like; I don’t see why it’s complaining about the AM, unless something’s not talking to something else right:

Queue State: RUNNING
Used Capacity: 12.5%
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Absolute Used Capacity: 12.5%
Absolute Configured Capacity: 100.0%
Absolute Configured Max Capacity: 100.0%
Used Resources:
Configured Max Application Master Limit: 10.0
Max Application Master Resources:
Used Application Master Resources:
Max Application Master Resources Per User:
Num Schedulable Applications: 3
Num Non-Schedulable Applications: 38
Num Containers: 3
Max Applications: 1
Max Applications Per User: 1
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Accessible Node Labels: *
Ordering Policy: FifoOrderingPolicy
Preemption: disabled
Intra-queue Preemption: disabled
Default Node Label Expression:
Default Application Priority: 0

User Name  Max Resource  Weight  Used Resource  Max AM Resource  Used AM Resource  Schedulable Apps  Non-Schedulable Apps
hdfs                     1.0                                                       0                 1
dr.who                   1.0                                                       3                 37

On Jul 10, 2019, at 3:37 AM, yangtao.yt wrote:
> [quoted text trimmed]
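The "Queue's AM resource limit exceeded" diagnostic is consistent with the numbers in this thread. A back-of-the-envelope check; the 3461 MB figure is the 3.38 GB cluster total from the metrics Jason posted, while the 10% AM share and the 1024 MB minimum allocation (yarn.scheduler.minimum-allocation-mb) are assumed defaults:

```shell
#!/bin/sh
# Back-of-the-envelope AM limit check.
# 3461 MB is the 3.38 GB cluster total reported in the thread;
# the 10% AM share and 1024 MB minimum allocation are assumed defaults.
CLUSTER_MB=3461
AM_PERCENT=10
MIN_ALLOC_MB=1024

AM_LIMIT_MB=$(( CLUSTER_MB * AM_PERCENT / 100 ))
echo "queue AM resource limit: ${AM_LIMIT_MB} MB"

# How many minimum-size AM containers fit under that limit:
echo "AM containers that fit: $(( AM_LIMIT_MB / MIN_ALLOC_MB ))"
```

346 MB is less than a single minimum-size container, so after the one AM the scheduler lets through, no further Application Master can be activated, which matches the single running application in the metrics.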
Re: new to hadoop, jobs never leaving accepted
Hi, Jason.

According to the information you provided, your cluster has two nodes with the same resources, and the single running container is the AM container, which already holds its allocation. One possible cause is that the available resources of your cluster are insufficient for the outstanding container requests; the application attempt UI (http://:/cluster/appattempt/) shows the outstanding requests with their required resources. Another possible cause is a queue/user limit; the scheduler UI (http://:/cluster/scheduler) shows the resource quotas and usage of the queue.

Hope it helps.

Best,
Tao Yang

On Jul 10, 2019, at 8:23 AM, Jason Laughman wrote:
>
> I’ve been setting up a Hadoop 2.9.1 cluster and have data replicating through HDFS, but when I try to run a job via Hive (I see that it’s deprecated, but it’s what I’m working with for now) it never gets out of ACCEPTED state in the web tool. I’ve done some Googling and the general consensus is that it’s resource constraints, so can someone tell me if I’ve got enough horsepower here?
>
> I’ve got one small name server, three small data servers, and two larger data servers. I figured out that the small data servers were too small, because even when I tried to tweak the YARN parameters for RAM and CPU the resource managers would immediately shut down. I added the two larger data servers, and now I see two active nodes but only a total of one container:
>
> $ yarn node -list
> 19/07/09 23:54:11 INFO client.RMProxy: Connecting to ResourceManager at :8032
> Total Nodes:2
> Node-Id        Node-State  Node-Http-Address  Number-of-Running-Containers
> node1:40079    RUNNING     node1:8042         1
> node2:36311    RUNNING     node2:8042         0
>
> There are a ton of automated jobs of some sort backed up on there, and when I try to run anything through Hive it just sits there and eventually times out (I do see it get accepted). My larger nodes have 4 GB RAM and 2 vcores, and I set YARN to do automatic resource allocation with yarn.nodemanager.resource.detect-hardware-capabilities. Is that enough to even get a POC lab working? I don’t care about having the three smaller servers running as resource nodes, but I’d like to have a better understanding of what’s going on with the larger servers, because it seems like they’re close to working.
>
> Here’s the metrics data from the website; hopefully somebody can parse it.
>
> Cluster Metrics
> Apps Submitted: 292  Apps Pending: 284  Apps Running: 1  Apps Completed: 7
> Containers Running: 1  Memory Used: 1 GB  Memory Total: 3.38 GB  Memory Reserved: 0 B
> VCores Used: 1  VCores Total: 4  VCores Reserved: 0
>
> Cluster Nodes Metrics
> Active Nodes: 2  Decommissioning Nodes: 0  Decommissioned Nodes: 0  Lost Nodes: 0
> Unhealthy Nodes: 0  Rebooted Nodes: 0  Shutdown Nodes: 4
>
> Scheduler Metrics
> Scheduler Type: Capacity Scheduler  Scheduling Resource Type: [MEMORY]
> Minimum Allocation:   Maximum Allocation: vCores:2  Maximum Cluster Application Priority: 0
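Tao Yang's two checks can also be made against the ResourceManager REST API rather than the web pages. A sketch; "rm-host:8088" is a placeholder for your own ResourceManager web address, and the printed URLs can be fetched with curl or a browser:

```shell
#!/bin/sh
# REST endpoints behind the two UI pages mentioned above.
# "rm-host:8088" is a placeholder; substitute your RM host and web port.
RM="rm-host:8088"

# Apps stuck in ACCEPTED, including their diagnostics strings:
echo "http://${RM}/ws/v1/cluster/apps?states=ACCEPTED"

# Queue capacities, AM resource limits, and per-user usage
# (the same data as the scheduler UI page):
echo "http://${RM}/ws/v1/cluster/scheduler"
```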