Re: Bulk chmod,chown operations on HDFS

2016-06-15 Thread Chris Nauroth
Hello Ravi,

You might consider using DistCh.  In the same way that DistCp is a distributed 
copy implemented as a MapReduce job, DistCh is a MapReduce job that distributes 
the work of chmod/chown.

DistCh will become easier to access through convenient shell commands in Apache 
Hadoop 3.  In version 2.6.0, it's undocumented and hard to find, but it's still 
there.  It's inside the hadoop-extras.jar.  Here is an example invocation:

hadoop jar share/hadoop/tools/lib/hadoop-extras-*.jar 
org.apache.hadoop.tools.DistCh
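If I remember the argument format correctly (check the usage message DistCh prints when run with no arguments), each argument is of the form path:owner:group:permission, with empty fields left unchanged.  A full run might look roughly like this, where the paths, owner, and group are just placeholders:

hadoop jar share/hadoop/tools/lib/hadoop-extras-*.jar \
  org.apache.hadoop.tools.DistCh \
  /data/warehouse/sales:etl_user:analytics:750 \
  /data/warehouse/staging::analytics:770

The second argument above would change only the group and permissions and leave the owner as is.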

It might take some fiddling with the classpath to get this right.  If so, then 
I recommend looking at how the shell scripts in trunk set up the classpath.

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-extras/src/main/shellprofile.d/hadoop-extras.sh

As you pointed out, this would generate higher NameNode traffic compared to 
your typical baseline load.  To mitigate this, I recommend that you start with 
a test run in a non-production environment to see how it reacts.

--Chris Nauroth

From: ravi teja <raviort...@gmail.com>
Date: Wednesday, June 15, 2016 at 8:33 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Bulk chmod,chown operations on HDFS

Hi Community,

As part of the new authorisation changes, we need to change the permissions and 
owners of many files in HDFS (2.6.0) with chmod and chown.

To do this we need to stop processing on those directories to avoid 
inconsistencies in permissions, so we need to take downtime for the specific 
pipelines operating on these folders.


The total number of files/directories to be operated upon is around 10 million.
A recursive chmod (chmod -R) on 160K objects took around 15 minutes.

At this rate it will take a long time to complete the operation, and the 
downtime would be a couple of hours.

A MapReduce program is one option, but chmod/chown are heavy operations and 
will slow down the cluster for other users if done at this scale.

Are there any options for doing bulk permission changes (chmod/chown) that avoid 
these issues?
If not, are there any alternative approaches to carry out the same operation at 
this scale, something like an admin backdoor to the fsimage?



Thanks,
Ravi Teja


Bulk chmod,chown operations on HDFS

2016-06-15 Thread ravi teja
Hi Community,

As part of the new authorisation changes, we need to change the permissions
and owners of many files in HDFS (2.6.0) with chmod and chown.

To do this we need to stop processing on those directories to avoid
inconsistencies in permissions, so we need to take downtime for the specific
pipelines operating on these folders.


The total number of files/directories to be operated upon is around 10
million.
A recursive chmod (chmod -R) on 160K objects took around 15 minutes.
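For reference, rough math on that rate:

160,000 objects / 900 seconds ≈ 178 objects per second
10,000,000 objects / 178 objects per second ≈ 56,000 seconds ≈ 15.5 hours

if done serially from a single client.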

At this rate it will take a long time to complete the operation, and the
downtime would be a couple of hours.

A MapReduce program is one option, but chmod/chown are heavy operations and
will slow down the cluster for other users if done at this scale.

Are there any options for doing bulk permission changes (chmod/chown) that avoid
these issues?
If not, are there any alternative approaches to carry out the same operation at
this scale, something like an admin backdoor to the fsimage?



Thanks,
Ravi Teja


Multi node maintenance for HDFS?

2016-06-15 Thread Stephan Hoermann
Hi,

How do people do multi node maintenance for HDFS without data loss?

We want to apply the ideas of immutable infrastructure to how we manage our
machines. We prebuild an OS image with the configuration and roll it out to
our nodes. When we have a patch we build a new image and roll that out
again. It takes us about 10 to 15 minutes to do that.

For our data nodes we want to keep the data on separate partitions/disks
so that when we rebuild, we rejoin HDFS with the data intact and don't start a
replication storm.

Now, in order to scale this and quickly roll out upgrades, we can't really do
a one-node-at-a-time upgrade, so we need to be able to take out a percentage
of the nodes at a time. Ideally we would like to do this while keeping the
replication count of each block at 2 (so we can still handle a failure while
we are doing an upgrade) and without starting a replication storm.

Right now it doesn't look like that is really supported. Is anyone else
doing multi-node upgrades, and how do you solve these problems?

We are considering changing the replication strategy so that we divide all
our nodes into 3 evenly sized buckets and at maintenance remove a subset
from one bucket at a time. Does anyone have experience with doing something
similar?
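
Concretely, per maintenance batch we were imagining something like this rough,
untested sketch, gating on fsck before touching the next batch (the exact label
to grep for should be checked against the fsck output of your HDFS version):

# reimage one batch of DataNodes, bring them back, then wait for HDFS to settle
until hdfs fsck / | grep -q 'Under-replicated blocks:[[:space:]]*0 '; do
  echo "waiting for re-replication to finish before the next batch..."
  sleep 60
done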

Regards,

Stephan


Re: HDFS backup to S3

2016-06-15 Thread Anu Engineer
Sorry, my bad: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
(the closing bracket was attached to the URL in my earlier message).

Thanks
Anu


From: max scalf 
Date: Wednesday, June 15, 2016 at 2:48 PM
To: Anu Engineer , HDP mailing list 

Subject: Re: HDFS backup to S3

Hi Anu,

Thanks for the information, but the link you provided does not work.

@Hari,

Let me do some quick research on what you guys can provide and get back to you.
On Wed, Jun 15, 2016, 10:59 AM Anu Engineer <aengin...@hortonworks.com> wrote:
Hi Max,

Unfortunately, we don’t have a better solution at the moment. I am wondering if 
the right approach might be to use user-defined metadata 
(http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html) and put 
that information along with the object that we are backing up.

However, that would be a code change in DistCp, and not as easy as a script. 
But that would address the scalability issue that you are worried about.

Thanks
Anu



From: max scalf <oracle.bl...@gmail.com>
Date: Wednesday, June 15, 2016 at 7:15 AM
To: HDP mailing list <user@hadoop.apache.org>
Subject: HDFS backup to S3

Hello Hadoop community,

We are running Hadoop in AWS (not EMR), using the Hortonworks distro on EC2 
instances.  Everything is set up and working as expected.  Our design calls for 
running HDFS/DataNodes on local/ephemeral storage with 3x replication enabled 
by default, and all of the metastores (Hive, Oozie, Ranger, Ambari, etc.) are 
external to the cluster, using RDS/MySQL.

The question I have is with regard to backups.  We want to run a nightly job 
that copies data from HDFS into S3.  Since our cluster lives in AWS, the 
obvious choice is to back up to S3.  We do not want a warm backup (backing this 
cluster up to another cluster); our RTO/RPO is 5 days for this cluster.  We can 
run distcp (something like the link below) to back up our HDFS to S3, and we 
have tested this and it works just fine, but how do we go about storing the 
ownership/permissions of these files?

http://www.nixguys.com/blog/backup-hadoop-hdfs-amazon-s3-shell-script

As S3 is blob storage and does not store any ownership/permission information, 
how do we go about backing that up?  One idea I had was to run hdfs dfs -lsr 
(to recursively get the permissions/ownership of all files and folders), dump 
that into a file, and send that file over to S3 as well.  I am guessing it will 
work now, but as the cluster grows it might not scale...

So I wanted to find out how people manage backing up the ownership/permissions 
of HDFS files/folders when backing up to blob storage like S3.




Re: HDFS backup to S3

2016-06-15 Thread max scalf
Hi Anu,

Thanks for the information, but the link you provided does not work.

@Hari,

Let me do some quick research on what you guys can provide and get back to
you.

On Wed, Jun 15, 2016, 10:59 AM Anu Engineer 
wrote:

> Hi Max,
>
>
>
> Unfortunately, we don’t have a better solution at the moment. I am
> wondering if the right approach might be to use user-defined metadata (
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html) and
> put that information along with the object that we are backing up.
>
>
>
> However, that would be a code change in DistCp, and not as easy as a
> script. But that would address the scalability issue that you are worried
> about.
>
>
>
> Thanks
>
> Anu
>
>
>
>
>
>
>
> *From: *max scalf 
> *Date: *Wednesday, June 15, 2016 at 7:15 AM
> *To: *HDP mailing list 
> *Subject: *HDFS backup to S3
>
>
>
> Hello Hadoop community,
>
>
>
> We are running Hadoop in AWS (not EMR), using the Hortonworks distro on EC2
> instances.  Everything is set up and working as expected.  Our design
> calls for running HDFS/DataNodes on local/ephemeral storage with 3x
> replication enabled by default, and all of the metastores (Hive, Oozie, Ranger,
> Ambari, etc.) are external to the cluster, using RDS/MySQL.
>
>
>
> The question I have is with regard to backups.  We want to run a
> nightly job that copies data from HDFS into S3.  Since our cluster
> lives in AWS, the obvious choice is to back up to S3.  We do not
> want a warm backup (backing this cluster up to another cluster); our RTO/RPO is
> 5 days for this cluster.  We can run distcp (something like the link below)
> to back up our HDFS to S3, and we have tested this and it works just fine, but
> how do we go about storing the ownership/permissions of these files?
>
>
>
> http://www.nixguys.com/blog/backup-hadoop-hdfs-amazon-s3-shell-script
>
>
>
> As S3 is blob storage and does not store any ownership/permission information, how
> do we go about backing that up?  One idea I had was to run hdfs dfs
> -lsr (to recursively get the permissions/ownership of all files and folders),
> dump that into a file, and send that file over to S3 as well.  I am
> guessing it will work now, but as the cluster grows it might not scale...
>
>
>
> So I wanted to find out how people manage backing up the
> ownership/permissions of HDFS files/folders when backing up to blob
> storage like S3.
>
>
>
>
>


Re: Verifying the authenticity of submitted AM

2016-06-15 Thread Mingyu Kim
Sorry for the late response. I finally caught up on most chapters on the 
gitbook you linked. This was super helpful. Thanks for the pointer.

 

Just to make sure I understood it correctly,

 

1.   One can send a secret as a command-line argument or environment 
variable to the AM securely by setting up Kerberos and setting 
hadoop.rpc.protection=privacy, because then the application submission request 
blob will be sent to the node manager encrypted.

2.   A client outside YARN can make a REST call to AM and verify the 
identity of AM (assuming the REST server is set up to use Kerberos) via SPNEGO.

3.   A REST server outside YARN can verify the identity of the AM when the AM makes 
a callback via SPNEGO. However, the authenticity can only be verified at the user-identity 
level. For example, if two applications are submitted under user A, one application can 
pretend to be the other, because they are authenticated as the same user.

 

So, it sounds like if I’d like to verify the authenticity of a particular submitted AM, 
as opposed to relying on the user-identity-level authenticity check provided via SPNEGO, 
using option #1 to securely pass a one-time secret would be the right way to go. Please 
correct me if any of my understanding is wrong.

 

Thanks,

Mingyu

 

From: Sunil Govind 
Date: Friday, June 10, 2016 at 6:07 AM
To: Mingyu Kim , Rohith Sharma K S 
, "user@hadoop.apache.org" 
Cc: Matt Cheah 
Subject: Re: Verifying the authenticity of submitted AM

 

Hi Mingyu,

 

Maybe you can take a look at the link below:

https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/yarn.html

 

It will give you a fair idea of the security you can get for an application.

 

- Sunil

 

On Fri, Jun 10, 2016 at 3:54 AM Mingyu Kim  wrote:

// forking for clarity

 

Related to the question I had below, I’m wondering how I can verify the 
authenticity of the submitted AM. (For example, when I’m making a call to the AM, 
I’d like to verify that I’m talking to the AM that I submitted, not someone else 
who hijacked my network traffic. Also, when the AM makes a callback to a server 
outside YARN, I’d like to verify that it’s the AM I submitted, not someone else 
who’s spoofing it.) This can generally be achieved by sending a secret (whether 
that’s a one-time secret that the server outside YARN can verify, or an SSL 
keystore) to the AM. Do you know how one can securely send the secret to the AM? 
Or, is there an existing YARN mechanism I can rely on to verify the authenticity? 
(I saw ApplicationReport.getClientToAMToken(), but that seems to be for the AM to 
verify the authenticity of the client.) Again, any pointer will be appreciated.

 

Thanks,

Mingyu

 

From: Rohith Sharma K S 
Date: Wednesday, June 8, 2016 at 11:15 PM
To: Mingyu Kim , "user@hadoop.apache.org" 

Cc: Matt Cheah 
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi

 

Do you know how I can extend the client interface of the RPC port?

>>> YARN provides the YarnClient library, which uses ApplicationClientProtocol. For 
>>> more details, refer to 
>>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client

 

I know AM has some endpoints exposed through the RPC port for internal YARN 
communications, but was not sure how I can extend it to expose a custom 
endpoint.

>>> I am not sure what you mean by internal YARN communication here. The AM can 
>>> connect to the RM only via the AM-RM interface, for register/unregister and 
>>> heartbeats, and the details sent to the RM are limited.  It is up to the AM to 
>>> expose a client interface for providing metadata.

Thanks & Regards

Rohith Sharma K S

From: Mingyu Kim [mailto:m...@palantir.com] 
Sent: 09 June 2016 11:21
To: Rohith Sharma K S; user@hadoop.apache.org
Cc: Matt Cheah
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi Rohith,

 

Thanks for the quick response. That sounds promising. Do you know how I can 
extend the client interface of the RPC port? I know AM has some endpoints 
exposed through the RPC port for internal YARN communications, but was not sure 
how I can extend it to expose a custom endpoint. Any pointer would be 
appreciated!

 

Mingyu

 

From: Rohith Sharma K S 
Date: Wednesday, June 8, 2016 at 10:39 PM
To: Mingyu Kim , "user@hadoop.apache.org" 

Cc: Matt Cheah 
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

 

Hi

 

Apart from the AM address and tracking URL, no other ApplicationMaster metadata is 
stored in YARN. Maybe the AM can expose a client interface so that AM clients can 
interact with the running AM to retrieve specific AM details.

The RPC port of the AM can be obtained from a YARN client interface such as 
ApplicationClientProtocol#getApplicationReport() or 
ApplicationClientProtocol#getApplicationAttemptReport().

Re: HDFS backup to S3

2016-06-15 Thread Anu Engineer
Hi Max,

Unfortunately, we don’t have a better solution at the moment. I am wondering if 
the right approach might be to use user-defined metadata 
(http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html) and put 
that information along with the object that we are backing up.
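
Just to illustrate what that could look like (the bucket, key, and metadata key 
names below are made up, and this uses the AWS CLI only for illustration; DistCp 
would have to do the equivalent through the S3 API when it writes each object):

aws s3 cp part-00000 s3://my-backup-bucket/warehouse/part-00000 \
  --metadata hdfs-owner=hduser,hdfs-group=hadoop,hdfs-permission=644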

However, that would be a code change in DistCp, and not as easy as a script. 
But that would address the scalability issue that you are worried about.

Thanks
Anu



From: max scalf 
Date: Wednesday, June 15, 2016 at 7:15 AM
To: HDP mailing list 
Subject: HDFS backup to S3

Hello Hadoop community,

We are running Hadoop in AWS (not EMR), using the Hortonworks distro on EC2 
instances.  Everything is set up and working as expected.  Our design calls for 
running HDFS/DataNodes on local/ephemeral storage with 3x replication enabled 
by default, and all of the metastores (Hive, Oozie, Ranger, Ambari, etc.) are 
external to the cluster, using RDS/MySQL.

The question I have is with regard to backups.  We want to run a nightly job 
that copies data from HDFS into S3.  Since our cluster lives in AWS, the 
obvious choice is to back up to S3.  We do not want a warm backup (backing this 
cluster up to another cluster); our RTO/RPO is 5 days for this cluster.  We can 
run distcp (something like the link below) to back up our HDFS to S3, and we 
have tested this and it works just fine, but how do we go about storing the 
ownership/permissions of these files?

http://www.nixguys.com/blog/backup-hadoop-hdfs-amazon-s3-shell-script

As S3 is blob storage and does not store any ownership/permission information, 
how do we go about backing that up?  One idea I had was to run hdfs dfs -lsr 
(to recursively get the permissions/ownership of all files and folders), dump 
that into a file, and send that file over to S3 as well.  I am guessing it will 
work now, but as the cluster grows it might not scale...

So I wanted to find out how people manage backing up the ownership/permissions 
of HDFS files/folders when backing up to blob storage like S3.




HDFS backup to S3

2016-06-15 Thread max scalf
Hello Hadoop community,

We are running Hadoop in AWS (not EMR), using the Hortonworks distro on EC2
instances.  Everything is set up and working as expected.  Our design
calls for running HDFS/DataNodes on local/ephemeral storage with 3x
replication enabled by default, and all of the metastores (Hive, Oozie, Ranger,
Ambari, etc.) are external to the cluster, using RDS/MySQL.

The question I have is with regard to backups.  We want to run a
nightly job that copies data from HDFS into S3.  Since our cluster
lives in AWS, the obvious choice is to back up to S3.  We do not
want a warm backup (backing this cluster up to another cluster); our RTO/RPO is
5 days for this cluster.  We can run distcp (something like the link below)
to back up our HDFS to S3, and we have tested this and it works just fine, but
how do we go about storing the ownership/permissions of these files?

http://www.nixguys.com/blog/backup-hadoop-hdfs-amazon-s3-shell-script

As S3 is blob storage and does not store any ownership/permission information,
how do we go about backing that up?  One idea I had was to run hdfs dfs -lsr
(to recursively get the permissions/ownership of all files and folders), dump
that into a file, and send that file over to S3 as well.  I am guessing it will
work now, but as the cluster grows it might not scale...
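
Roughly what I have in mind, as a sketch (the bucket name is made up, and this
assumes the s3a connector is already configured with credentials):

# 1. dump ownership/permissions for everything under /data
hdfs dfs -ls -R /data > /tmp/hdfs-perms-$(date +%Y%m%d).txt

# 2. back up the data itself
hadoop distcp /data s3a://my-backup-bucket/data

# 3. ship the permissions dump alongside it
hadoop fs -put /tmp/hdfs-perms-$(date +%Y%m%d).txt s3a://my-backup-bucket/meta/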

So I wanted to find out how people manage backing up the
ownership/permissions of HDFS files/folders when backing up to blob
storage like S3.


RE: maximum-am-resource-percent is insufficient to start a single application

2016-06-15 Thread Varun saxena
Hi Philip,

The cluster metrics in the attached screenshot show that there are no active 
nodes.
Have you started a NodeManager process?
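You can also confirm this from the command line, e.g.:

yarn node -list
# if no nodes are listed, start a NodeManager on the worker host; with a plain
# Apache tarball install that is typically:
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager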

-Varun Saxena.

From: Phillip Wu [mailto:phillip...@unsw.edu.au]
Sent: 15 June 2016 15:28
To: user@hadoop.apache.org
Cc: Sunil Govind; Varun saxena
Subject: RE: maximum-am-resource-percent is insufficient to start a single 
application

Thanks for your help.

I’ve loaded the ResourceManager web UI; the cluster metrics it shows are:

Apps Submitted: 1
Apps Pending: 1
Apps Running: 0
Apps Completed: 0
Containers Running: 0
Memory Used: 0 B
Memory Total: 0 B
Memory Reserved: 0 B
VCores Used: 0
VCores Total: 0
VCores Reserved: 0
Active Nodes: 0
Decommissioned Nodes: 0
Lost Nodes: 0
Unhealthy Nodes: 0
Rebooted Nodes: 0

The application table shows:

ID: application_1465974687991_0001
User: hduser
Name: Insert into CATEGORIES Va...'beverages.gif')(Stage-1)
Application Type: MAPREDUCE
Queue: default
StartTime: Wed, 15 Jun 2016 07:13:31 GMT
FinishTime: N/A
State: ACCEPTED
FinalStatus: UNDEFINED
Tracking UI: ApplicationMaster
Blacklisted Nodes: 0

I’ve attached a JPEG of the web UI and the ResourceManager log.

The ResourceManager log looks like this:
2016-06-15 07:13:30,152 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
applicationId: 1
2016-06-15 07:13:31,906 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing 
application with id application_1465974687991_0001
2016-06-15 07:13:31,912 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1465974687991_0001 State change from NEW to NEW_SAVING
2016-06-15 07:13:31,917 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with 
id 1 submitted by user hduser
2016-06-15 07:13:31,923 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing 
info for app: application_1465974687991_0001
2016-06-15 07:13:31,925 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hduser   
IP=127.0.0.1OPERATION=Submit Application RequestTARGET=ClientRMService  
RESULT=SUCCESS  APPID=application_1465974687991_0001
2016-06-15 07:13:31,928 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1465974687991_0001 State change from NEW_SAVING to SUBMITTED
2016-06-15 07:13:31,930 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
Application added - appId: application_1465974687991_0001 user: hduser 
leaf-queue of parent: root #applications: 1
2016-06-15 07:13:31,931 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Accepted application application_1465974687991_0001 from user: hduser, in 
queue: default
2016-06-15 07:13:31,947 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1465974687991_0001 State change from SUBMITTED to ACCEPTED
2016-06-15 07:13:32,001 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1465974687991_0001_01
2016-06-15 07:13:32,002 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1465974687991_0001_01 State change from NEW to SUBMITTED
2016-06-15 07:13:32,034 WARN 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
maximum-am-resource-percent is insufficient to start a single application in 
queue, it is likely set too low. skipping enforcement to allow at least one 
application to start
2016-06-15 07:13:32,034 WARN 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
maximum-am-resource-percent is insufficient to start a single application in 
queue for user, it is likely set too low. skipping enforcement to allow at 
least one application to start
2016-06-15 07:13:32,035 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Application application_1465974687991_0001 from user: hduser activated in 
queue: default
2016-06-15 07:13:32,035 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Application added - appId: application_1465974687991_0001 user: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@6593b169,
 leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 
#queue-pending-applications: 0 #queue-active-applications: 1
2016-06-15 07:13:32,035 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Added Application Attempt appattemp

Re: maximum-am-resource-percent is insufficient to start a single application

2016-06-15 Thread Sunil Govind
Adding to what Varun has said, the ResourceManager log will be of help here to
confirm the same.

The code snippet you have mentioned is correct, but it also has a check so that
if the number of active applications is less than 1, this check won't be
enforced. And it seems you have only one application.

- Sunil



On Wed, Jun 15, 2016 at 12:27 PM Varun saxena 
wrote:

> Can you open the Resource Manager (RM) UI and share a screenshot of the main RM
> page? We can check the cluster resources there. Most probably the cluster does not
> have enough resources.
>
> How much memory and VCores does your AM need ?
>
> RM UI can be accessed at http://localhost:8088/
>
>
>
> - Varun Saxena.
>
>
>
> *From:* Phillip Wu [mailto:phillip...@unsw.edu.au]
> *Sent:* 15 June 2016 14:42
> *To:* user@hadoop.apache.org
> *Cc:* Sunil Govind
> *Subject:* RE: maximum-am-resource-percent is insufficient to start a
> single application
>
>
>
> Sunil,
>
>
>
> Thanks for your email.
>
>
>
> 1.   I don’t think anything on the cluster is being used – see below
>
> I’m not sure how to get my “total cluster resource size” – please advise
> how to get this?
>
> After doing the hive insert I get this:
>
> hduser@ip-10-118-112-182:/$ hadoop queue -info default -showJobs
>
> 16/06/10 02:24:49 INFO client.RMProxy: Connecting to ResourceManager at /
> 127.0.0.1:8050
>
> ==
>
> Queue Name : default
>
> Queue State : running
>
> Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0,
> CurrentCapacity: 0.0
>
> Total jobs:1
>
>   JobId  State   StartTime
> UserName   Queue  Priority   UsedContainers
> RsvdContainers  UsedMem RsvdMem NeededMem AM info
>
> job_1465523894946_0001   PREP   1465524072194
>  hduser defaultNORMAL0
> 0   0M  0M0M
> http://localhost:8088/proxy/application_1465523894946_0001/
>
>
>
> hduser@ip-10-118-112-182:/$ mapred job -status  job_1465523894946_0001
>
> Job: job_1465523894946_0001
>
> Job File:
> /tmp/hadoop-yarn/staging/hduser/.staging/job_1465523894946_0001/job.xml
>
> Job Tracking URL :
> http://localhost:8088/proxy/application_1465523894946_0001/
>
> Uber job : false
>
> Number of maps: 0
>
> Number of reduces: 0
>
> map() completion: 0.0
>
> reduce() completion: 0.0
>
> Job state: PREP
>
> retired: false
>
> reason for failure:
>
> Counters: 0
>
> 2.   There are no other applications except I’m running zookeeper
>
> 3.   There is only one user
>
>
>
> For your assistance this seems to be the code generating the error
> message[…yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java]:
>
> if (!Resources.lessThanOrEqual(resourceCalculator, lastClusterResource,
>     userAmIfStarted, userAMLimit)) {
>   if (getNumActiveApplications() < 1) {
>     LOG.warn("maximum-am-resource-percent is insufficient to start a" +
>         " single application in queue for user, it is likely set too low." +
>         " skipping enforcement to allow at least one application to start");
>   } else {
>     LOG.info("not starting application as amIfStarted exceeds " +
>         "userAmLimit");
>     continue;
>   }
> }
>
>
>
> Any ideas?
>
>
>
> Phillip
>
> *From:* Sunil Govind [mailto:sunil.gov...@gmail.com
> ]
> *Sent:* Wednesday, 15 June 2016 4:24 PM
> *To:* Phillip Wu; user@hadoop.apache.org
> *Subject:* Re: maximum-am-resource-percent is insufficient to start a
> single application
>
>
>
> Hi Philip
>
>
>
> A higher maximum-am-resource-percent value (0~1) will help to allocate more
> resources for the ApplicationMaster container of a YARN application (the MR
> jobs here), but it also depends on the capacity configured for the queue. You
> have mentioned that there is only the default queue here, so that won't be a
> problem. A few questions:
>
> - How much is your total cluster resource size, and how much of the cluster
> resource is used now?
>
> - Was any other application running in the cluster, and was it taking the
> full cluster resource? This is a possibility, since you have now given the
> whole queue's capacity to AM containers.
>
> - Do you have multiple users in your cluster who run applications
> other than this Hive job? If so,
> yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent will have an
> impact on the AM resource usage limit. I think you can double-check this.
>
>
>
>
>
> - Sunil
>
>
>
> On Wed, Jun 15, 2016 at 8:47 AM Phillip Wu  wrote:
>
> Hi,
>
>
>
> I'm new to Hadoop and Hive.
>
>
>
> I'm using Hadoop 2.6.4 (binary I got from internet) & Hive 2.0.1 (binary I
> got from internet).
>
> I can create a database and table in hive.
>
>
>
> However when I try to insert a record into a previously created table I
> get:
>
> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> maximum-am-resource-percent is insufficient to start a single appli