Re: Having trouble adding external JAR to MapReduce Program

2014-02-21 Thread Jonathan Poon
Thanks for the suggestions!  I will give them a look and try to see if it
will work!

Jonathan


On Fri, Feb 21, 2014 at 6:35 PM, Azuryy Yu  wrote:

> Hi,
>
> you cannot add a jar this way.
>
> please look at DistributedCache in the Hadoop Java Doc.
>
> please call DistributedCache.addArchive() in your main Class before
> submitting the MR job.
>
>
> On Sat, Feb 22, 2014 at 9:30 AM, Gaurav Gupta wrote:
>
>> Jonathan,
>>
>>
>>
>> You have to make sure that the jar is available on the nodes where the
>> map reduce job is running. Setting the HADOOP_CLASSPATH on the single node
>> doesn't work.
>>
>> You can use -libjars to the hadoop command line.
>>
>>
>>
>> Thanks
>>
>> Gaurav
>>
>>
>>
>> *From:* Jonathan Poon [mailto:jkp...@ucdavis.edu]
>> *Sent:* Friday, February 21, 2014 5:12 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Having trouble adding external JAR to MapReduce Program
>>
>>
>>
>> Hi Everyone,
>>
>> I'm running into trouble adding the Avro JAR into my MapReduce program.
>> I do the following to try to add the Avro JAR:
>>
>> export
>> HADOOP_CLASSPATH="/tmp/singleEvent.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar:/usr/local/hadoop/hadoop-core-1.2.1.jar"
>>
>> export
>> LIBJARS="/tmp/singleEvent.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar,/usr/local/hadoop/hadoop-core-1.2.1.jar"
>>
>> hadoop jar AvroReader.jar org.avro.AvroReader -libjars ${LIBJARS}
>> /user/jonathanpoon/avro /user/jonathanpoon/output
>>
>> However, I get the following error:
>>
>> 14/02/21 17:01:17 INFO mapred.JobClient: Task Id :
>> attempt_201402191318_0014_m_01_2, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.avro.mapreduce.AvroKeyInputFormat
>> at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
>> at
>> org.apache.hadoop.mapreduce.JobContext.getInputFormatClass(JobContext.java:187)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.avro.mapreduce.AvroKeyInputFormat
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:270)
>> at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
>> at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
>> ... 8 more
>>
>>
>> Am I placing the Avro JAR files in the improper place?
>>
>> Thanks for your help!
>>
>> Jonathan
>>
>>
>>
>
>


Re: Having trouble adding external JAR to MapReduce Program

2014-02-21 Thread Azuryy Yu
Hi,

you cannot add a jar this way.

please look at DistributedCache in the Hadoop Java Doc.

please call DistributedCache.addArchive() in your main Class before
submitting the MR job.
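
For illustration, here is a minimal driver sketch of that approach (Hadoop
1.x API; the HDFS paths are only examples, and the jars must be uploaded to
HDFS first). It uses DistributedCache.addFileToClassPath(), which puts a
single jar on every task's classpath; addArchiveToClassPath() works the same
way for archives:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class AvroReaderDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Put each jar (already copied to HDFS) on the classpath of every task.
        DistributedCache.addFileToClassPath(
            new Path("/libs/avro-mapred-1.7.6-hadoop1.jar"), conf);
        DistributedCache.addFileToClassPath(
            new Path("/libs/avro-tools-1.7.6.jar"), conf);

        // The Job copies conf, so register the jars before creating it.
        Job job = new Job(conf, "avro-reader");
        job.setJarByClass(AvroReaderDriver.class);
        // ... set mapper, input/output formats and paths ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }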


On Sat, Feb 22, 2014 at 9:30 AM, Gaurav Gupta wrote:

> Jonathan,
>
>
>
> You have to make sure that the jar is available on the nodes where the map
> reduce job is running. Setting the HADOOP_CLASSPATH on the single node
> doesn't work.
>
> You can use -libjars to the hadoop command line.
>
>
>
> Thanks
>
> Gaurav
>
>
>
> *From:* Jonathan Poon [mailto:jkp...@ucdavis.edu]
> *Sent:* Friday, February 21, 2014 5:12 PM
> *To:* user@hadoop.apache.org
> *Subject:* Having trouble adding external JAR to MapReduce Program
>
>
>
> Hi Everyone,
>
> I'm running into trouble adding the Avro JAR into my MapReduce program.  I
> do the following to try to add the Avro JAR:
>
> export
> HADOOP_CLASSPATH="/tmp/singleEvent.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar:/usr/local/hadoop/hadoop-core-1.2.1.jar"
>
> export
> LIBJARS="/tmp/singleEvent.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar,/usr/local/hadoop/hadoop-core-1.2.1.jar"
>
> hadoop jar AvroReader.jar org.avro.AvroReader -libjars ${LIBJARS}
> /user/jonathanpoon/avro /user/jonathanpoon/output
>
> However, I get the following error:
>
> 14/02/21 17:01:17 INFO mapred.JobClient: Task Id :
> attempt_201402191318_0014_m_01_2, Status : FAILED
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.avro.mapreduce.AvroKeyInputFormat
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
> at
> org.apache.hadoop.mapreduce.JobContext.getInputFormatClass(JobContext.java:187)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.avro.mapreduce.AvroKeyInputFormat
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
> ... 8 more
>
>
> Am I placing the Avro JAR files in the improper place?
>
> Thanks for your help!
>
> Jonathan
>
>
>


RE: Having trouble adding external JAR to MapReduce Program

2014-02-21 Thread Gaurav Gupta
Jonathan,

 

You have to make sure that the jar is available on the nodes where the map
reduce job is running. Setting the HADOOP_CLASSPATH on the single node
doesn't work.

You can use -libjars to the hadoop command line.
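
Note that -libjars is one of Hadoop's generic options, so it only takes
effect when the main class runs its arguments through ToolRunner /
GenericOptionsParser. A minimal sketch of such a driver (the job wiring
below is illustrative; only the Tool/ToolRunner part matters here):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class AvroReader extends Configured implements Tool {

      @Override
      public int run(String[] args) throws Exception {
        // getConf() already reflects -libjars, -files, -D options, etc.
        Job job = new Job(getConf(), "avro-reader");
        job.setJarByClass(AvroReader.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // ... set mapper, reducer, input/output formats ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run().
        System.exit(ToolRunner.run(new Configuration(), new AvroReader(), args));
      }
    }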

 

Thanks

Gaurav

 

From: Jonathan Poon [mailto:jkp...@ucdavis.edu] 
Sent: Friday, February 21, 2014 5:12 PM
To: user@hadoop.apache.org
Subject: Having trouble adding external JAR to MapReduce Program

 

Hi Everyone,

I'm running into trouble adding the Avro JAR into my MapReduce program.  I
do the following to try to add the Avro JAR:

export
HADOOP_CLASSPATH="/tmp/singleEvent.jar:/home/jonathanpoon/local/lib/java/avr
o-1.7.6/avro-mapred-1.7.6-hadoop1.jar:/home/jonathanpoon/local/lib/java/avro
-1.7.6/avro-tools-1.7.6.jar:/usr/local/hadoop/hadoop-core-1.2.1.jar"

export
LIBJARS="/tmp/singleEvent.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/a
vro-mapred-1.7.6-hadoop1.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/av
ro-tools-1.7.6.jar,/usr/local/hadoop/hadoop-core-1.2.1.jar"

hadoop jar AvroReader.jar org.avro.AvroReader -libjars ${LIBJARS}
/user/jonathanpoon/avro /user/jonathanpoon/output

However, I get the following error:

14/02/21 17:01:17 INFO mapred.JobClient: Task Id :
attempt_201402191318_0014_m_01_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.avro.mapreduce.AvroKeyInputFormat
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
at
org.apache.hadoop.mapreduce.JobContext.getInputFormatClass(JobContext.java:187)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException:
org.apache.avro.mapreduce.AvroKeyInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
... 8 more




Am I placing the Avro JAR files in the improper place?  

Thanks for your help!

Jonathan

 



Having trouble adding external JAR to MapReduce Program

2014-02-21 Thread Jonathan Poon
Hi Everyone,

I'm running into trouble adding the Avro JAR into my MapReduce program.  I
do the following to try to add the Avro JAR:

export
HADOOP_CLASSPATH="/tmp/singleEvent.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar:/usr/local/hadoop/hadoop-core-1.2.1.jar"

export
LIBJARS="/tmp/singleEvent.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar,/usr/local/hadoop/hadoop-core-1.2.1.jar"

hadoop jar AvroReader.jar org.avro.AvroReader -libjars ${LIBJARS}
/user/jonathanpoon/avro /user/jonathanpoon/output

However, I get the following error:

14/02/21 17:01:17 INFO mapred.JobClient: Task Id :
attempt_201402191318_0014_m_01_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.avro.mapreduce.AvroKeyInputFormat
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
at
org.apache.hadoop.mapreduce.JobContext.getInputFormatClass(JobContext.java:187)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException:
org.apache.avro.mapreduce.AvroKeyInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
... 8 more



Am I placing the Avro JAR files in the improper place?

Thanks for your help!

Jonathan


Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradation

2014-02-21 Thread Chris Schneider
Hi John,

FWIW, setting the log level of org.apache.hadoop.security.UserGroupInformation 
to ERROR seemed to prevent the fatal NameNode slowdown we ran into. Although I 
still saw "no such user" Shell$ExitCodeException messages in the logs, these 
only occurred every few minutes or so. Thus, it seems like this is a reasonable 
work around until the underlying problem is fixed. I suggest that you file a 
JIRA ticket, though, as nobody seems to be rushing in here to tell us what 
we're doing wrong.
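
For reference, the change is the standard log4j 1.x logger override in
log4j.properties (the exact file location varies by install; the commented
line is the broader per-package variant mentioned further down this thread):

    # Quiet the group-mapping warnings from UserGroupInformation
    log4j.logger.org.apache.hadoop.security.UserGroupInformation=ERROR
    # or, to silence the whole security package:
    # log4j.logger.org.apache.hadoop.security=ERROR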

Thanks,

- Chris

On Feb 18, 2014, at 5:54 PM, Chris Schneider wrote:

> Hi John,
> 
> My AWS Elastic MapReduce NameNode is also filling its log file with messages 
> like the following:
> 
> 2014-02-18 23:56:52,344 WARN org.apache.hadoop.security.UserGroupInformation 
> (IPC Server handler 78 on 9000): No groups available for user 
> job_201402182309_0073
> 2014-02-18 23:56:52,351 WARN org.apache.hadoop.security.UserGroupInformation 
> (IPC Server handler 48 on 9000): No groups available for user 
> job_201402182309_0073
> 2014-02-18 23:56:52,356 WARN org.apache.hadoop.security.UserGroupInformation 
> (IPC Server handler 38 on 9000): No groups available for user 
> job_201402182309_0073
> 
> I ran into this same issue in March 2013 and got past it by using an 
> m1.xlarge master node (instead of my usual m1.large) when (like right now) I 
> double my slave count (to 32 cc2.8xlarge instances) to re-import a lot of my 
> input data. Using that m1.xlarge didn't prevent the NameNode from logging 
> messages like this, but the beefier instance seemed to weather the load these 
> messages represented better.
> 
> Unfortunately, even my m1.xlarge master node now seems overwhelmed. The 
> cluster starts off fine, efficiently mowing through the jobs in my job flow 
> step for a few hours, but it eventually gets into a mode where the copy phase 
> of the reduce jobs appear to make no progress at all. At that point, the 
> NameNode seems to be spending all of its time writing messages like the ones 
> above.
> 
> The issue doesn't seem to be related to the NameNode JVM size (I tried 
> increasing it to 4GB before I realized it never used more than ~400MB), nor 
> dfs.namenode.handler.count (which I increased from 64 to 96).
> 
> We're currently trying to work around the problem by hacking log4j.properties 
> to set the logging level for org.apache.hadoop.security.UserGroupInformation 
> to ERROR. We might have to do so for the entire package, as I've also seen 
> the following in the NameNode logs:
> 
> 2014-02-19 01:01:24,184 WARN 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping (IPC Server handler 84 
> on 9000): got exception trying to get groups for user job_201402182309_0226
> org.apache.hadoop.util.Shell$ExitCodeException: id: job_201402182309_0226: No 
> such user
> 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
>   at org.apache.hadoop.util.Shell.run(Shell.java:182)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:78)
>   at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:53)
>   at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1037)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:50)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5218)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:5201)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2030)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFileInfo(NameNode.java:850)
>   at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:573)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> 
> I would also be very interested in hearing Jakob Homan and Deveraj Das 
> respond to your analysis of the changes made for MAPREDUCE-1457.
> 
> Please post again wi

Re: Questions from a newbie to Hadoop

2014-02-21 Thread Publius
I wish to run a pseudo-distributed machine on a virtual box

I have it almost running on Oracle VirtualBox with the Hortonworks Sandbox 2.0,
but the virtual appliance wants a 64-bit CPU, and mine is only 32-bit

still looking for a 32-bit version of the Hortonworks Sandbox 2.0

Hortonworks seems to be very stable and good so far, very easy to set up

what does this mean: set up P-D mode on it?

:O

thanks for the reply

From: Devin Suiter RDX 
>To: user@hadoop.apache.org 
>Sent: Friday, February 21, 2014 11:30 AM
>Subject: Re: Questions from a newbie to Hadoop
> 
>
>
>You should also clarify for the group:
>
>
>Do you want to make a virtual machine to run a pseudo-distributed Hadoop 
>cluster on?
>
>
>Or
>
>
>Do you want to install Hadoop directly onto the Vista machine and run it there?
>
>
>If the former, you should be able to set up a VM just fine with a Linux 
>version of your choice - we use CentOS for that sort of thing here - and set 
>up P-D mode on it, just by using VirtualBox.
>
>
>If the latter...
>
>
>You may want to invest in some extra RAM - no, I'm kidding. :-) HortonWorks 
>has the stable Windows builds as Arpit has already directed you to.
>
>
>You could also consider the Amazon and Microsoft Azure cloud versions of 
>Hadoop, if you can afford to pay for a few cycles every month. They're pretty 
>affordable, and those Amazon $100 gift cards for EC2 are popular giveaway 
>items at trade shows and whatnot...
>
>
>Devin Suiter
>Jr. Data Solutions Software Engineer
>100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>Google Voice: 412-256-8556 | www.rdx.com
>
>
>On Fri, Feb 21, 2014 at 2:11 PM, Arpit Agarwal  
>wrote:
>
>You can try building Apache Hadoop with these instructions:
>>
>>https://wiki.apache.org/hadoop/Hadoop2OnWindows
>>
>>
>>32-bit Windows has not been tested.
>>
>
>
>

Re: No job shown in Hadoop resource manager web UI when running jobs in the cluster

2014-02-21 Thread Zhijie Shen
Hi Richard,

Not sure how the NPE happened on your command line, but I'd like to clarify
a few things here:
1. If you want to see mapreduce jobs, please use "mapred job". "hadoop job"
is deprecated. If you want to see all kinds of applications run by your
YARN cluster, please use "yarn application".

2. Job history server only shows the finished mapreduce jobs. There will be
another application history server that shows all completed applications
run by YARN, but it's not available on 2.2.

3. ResourceManager webUI is not the job history web UI. You should check
your yarn-site.xml to see what's the address of the RM webUI. It will list
all the applications that RM remembers.
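
For reference, the RM web UI address is controlled by the following
yarn-site.xml property (the host below is just an example; 8088 is the
default port):

    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>rm-host.example.com:8088</value>
    </property>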

- Zhijie


On Thu, Feb 20, 2014 at 7:04 PM, Chen, Richard wrote:

>  Dear group,
>
>
>
> I compiled hadoop 2.2.0 x64 and running it on a cluster. When I do hadoop
> job -list or hadoop job -list all, it throws a NPE like this:
>
> 14/01/28 17:18:39 INFO Configuration.deprecation: session.id is
> deprecated. Instead, use dfs.metrics.session-id
>
> 14/01/28 17:18:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
>
> Exception in thread "main" java.lang.NullPointerException
>
> at org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:504)
>
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:312)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1237)
>
> and on the hadoop webapps like jobhistory (I turned on the jobhistory
> server), it shows no jobs running and no jobs finished, although I was
> running jobs.
>
> Please help me to solve this problem.
>
> Thanks!!
>
>
>
> Richard Chen
>
>
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: Questions from a newbie to Hadoop

2014-02-21 Thread Devin Suiter RDX
You should also clarify for the group:

Do you want to make a virtual machine to run a pseudo-distributed Hadoop
cluster on?

Or

Do you want to install Hadoop directly onto the Vista machine and run it
there?

If the former, you should be able to set up a VM just fine with a Linux
version of your choice - we use CentOS for that sort of thing here - and
set up P-D mode on it, just by using VirtualBox.

If the latter...

You may want to invest in some extra RAM - no, I'm kidding. :-)
HortonWorks has the stable Windows builds as Arpit has already directed you
to.

You could also consider the Amazon and Microsoft Azure cloud versions of
Hadoop, if you can afford to pay for a few cycles every month. They're
pretty affordable, and those Amazon $100 gift cards for EC2 are popular
giveaway items at trade shows and whatnot...

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Fri, Feb 21, 2014 at 2:11 PM, Arpit Agarwal wrote:

> You can try building Apache Hadoop with these instructions:
> https://wiki.apache.org/hadoop/Hadoop2OnWindows
>
> 32-bit Windows has not been tested.
>


Re: Questions from a newbie to Hadoop

2014-02-21 Thread Arpit Agarwal
You can try building Apache Hadoop with these instructions:
https://wiki.apache.org/hadoop/Hadoop2OnWindows

32-bit Windows has not been tested.



Questions from a newbie to Hadoop

2014-02-21 Thread Publius


 Hello

I am new to hadoop and trying to learn it

I want to set up pseudo-distributed mode on my Windows Vista (32-bit) machine
to experiment with.

I am having great difficulty locating the correct software(s) to do this.

JDK 1.6 or JDK 1.7? Eclipse?
Oracle VirtualBox, VMware Player, etc.?
CDH4, Hortonworks, or just Apache Hadoop?


I have tried to load and run VMware Player and Hortonworks Hadoop so far,
but the Hortonworks version is set up for 64-bit and my poor Vista machine is
32-bit :(


Can someone offer advice on how to do this?

Please help IF able

If someone wants to email offline that is fine also


Troy



KMG 365 

Re: Capacity Scheduler capacity vs. maximum-capacity

2014-02-21 Thread ricky l
Does the Hadoop Capacity Scheduler support preemption in this scenario?
Based on what Vinod says, preemption seems to be supported via
configuration. If so, can someone point me to instructions for doing that?
Preemption would really be helpful for my use case. Thanks.
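
From what I can find, preemption for the CapacityScheduler is enabled
through the ResourceManager's scheduler monitor; is the yarn-site.xml sketch
below (Hadoop 2.1+ property names, which I have not verified against
yarn-default.xml) the right way to do it?

    <property>
      <name>yarn.resourcemanager.scheduler.monitor.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.monitor.policies</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
    </property>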

On Fri, Feb 21, 2014 at 12:39 AM, Vinod Kumar Vavilapalli
 wrote:
>
> Yes, it does take those extra resources away back to queue B. How quickly it
> takes them away depends on whether preemption is enabled or not. If
> preemption is not enabled, it 'takes away' as and when containers from queue
> A start finishing.
>
> +Binod
>
> On Feb 19, 2014, at 5:35 PM, Alex Nastetsky  wrote:
>
> Will the scheduler take away the 10% from queue B and give it back to queue
> A even if queue B needs it? If not, it would seem that the scheduler is
> reneging on its guarantee.
>
>
>


[no subject]

2014-02-21 Thread Aaron Zimmerman
The worker nodes on my version 2.2 cluster won't use more than 11 GB of the
30 GB total (24 GB allocated) for MapReduce jobs running in YARN.  Does
anyone have an idea what might be constraining the usage of RAM?

I followed the steps listed here:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
,
and http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/.
to set various memory configuration, but no matter what I try, the nodes on
the cluster don't use more than 11GB of the allocated 26GB.

The yarn resource manager reports that it is using all of the allocated
memory in the status across the top, but according to TOP and other such,
it is not.

I see org.apache.hadoop.mapred.YarnChild processes being created with
-Xmx756m, but I can't find this anywhere in mapreduce or yarn
configurations.

yarn.nodemanager.resource.memory-mb = 24576
yarn.scheduler.minimum-allocation-mb = 3072
yarn_heapsize=2 (not really clear to me what this does...?)
mapreduce2 config:
mapreduce.map.memory.mb = 4096
mapreduce.reduce.memory.mb = 8192
mapreduce.map.java.opts = -Xmx3500
mapreduce.reduce.java.opts = -Xmx7000

Thanks!

Aaron Zimmerman


Cleanup after Yarn Job

2014-02-21 Thread Brian C. Huffman

All,

I'm trying to model a Yarn Client after the Distributed Shell example.  
However I'd like to add a method to cleanup the job's files after 
completion.


I've defined a cleanup routine:
  private void cleanup(ApplicationId appId, FileSystem fs)
  throws IOException {
String PathSuffix = appName + "/" + appId.getId();
Path Dst = new Path(fs.getHomeDirectory(), PathSuffix);
fs.delete(Dst, true);
  }

The problem that I'm having is that I'd like to call it after 
monitorApplication exits, but in the case that the time limit was 
exceeded and killApplication is called, both the appId and the 
FileSystem objects are gone.  I could get around the appId issue since I 
really only need a String or integer representation, but since Yarn 
Client seems to be managing the filesystem object (the example uses 
FileSystem.get(conf)), I'm not sure of a way around that unless I create 
my own FileSystem object.
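
For what it's worth, a minimal sketch of that last idea (appName is the same
field used above, and appIdNum would just be appId.getId() saved right after
submission):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Keep only the numeric id and open a private FileSystem inside cleanup,
    // so neither the ApplicationId nor the client-managed FileSystem has to
    // survive a killApplication().
    private void cleanup(int appIdNum, Configuration conf) throws IOException {
      FileSystem fs = FileSystem.newInstance(conf);  // bypasses the shared FS cache
      try {
        Path dst = new Path(fs.getHomeDirectory(), appName + "/" + appIdNum);
        fs.delete(dst, true);
      } finally {
        fs.close();
      }
    }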


Any suggestions?

Thanks,
Brian



Path filters for multiple inputs

2014-02-21 Thread AnilKumar B
Hi,

May I know how I can apply path filters for multiple inputs?

I mean, for each input I need to apply a different filter. Is that
possible? I tried setting my own PathFilter on FileInputFormat, but it is
applied to all of the inputs.

Also, how can I ignore the subdirectories in each input?

Thanks & Regards,
B Anil Kumar.


JobHistoryEventHandler failed with AvroTypeException.

2014-02-21 Thread Rohith Sharma K S
Hi all,

I am using Hadoop-2.3 for Yarn Cluster.

While running a job, I encountered the below exception in the MRAppMaster.
Why is this error being logged?

2014-02-21 22:10:33,841 INFO [Thread-355] 
org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
failed in state STOPPED; cause: org.apache.avro.AvroTypeException: Attempt to 
process a enum when a string was expected.
org.apache.avro.AvroTypeException: Attempt to process a enum when a string was 
expected.
at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
at 
org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:217)
at 
org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:54)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at 
org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:66)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:870)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:517)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at 
org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
at 
org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1386)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:550)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:602)

Thanks & Regards
Rohith Sharma K S


Datanodes going out of reach in hadoop

2014-02-21 Thread Yogini Gulkotwar
Hello,
I am working with a 5 node hadoop cluster. The hdfs is on a shared NFS
directory of 98TB.
So when we view the namenode UI, the following is displayed:

Node      | Last Contact | Admin State | Configured Capacity (TB) | Used (TB) | Non DFS Used (TB) | Remaining (TB) | Used (%) | Remaining (%) | Blocks | Block Pool Used (TB) | Block Pool Used (%) | Failed Volumes
Datanode1 | 0 | In Service | 97.39 | 1.83 | 38.04 | 57.52 | 1.88 | 59.06 | 80653 | 1.83 | 1.88 | 0
Datanode2 | 1 | In Service | 97.39 | 1.18 | 38.69 | 57.52 | 1.21 | 59.06 | 54536 | 1.18 | 1.21 | 0
Datanode3 | 0 | In Service | 97.39 | 1.61 | 38.26 | 57.52 | 1.65 | 59.06 | 66902 | 1.61 | 1.65 | 0
Datanode4 | 2 | In Service | 97.39 | 0.65 | 39.22 | 57.52 | 0.67 | 59.06 | 32821 | 0.65 | 0.67 | 0
Datanode5 | 2 | In Service | 97.39 | 0.58 | 39.29 | 57.52 | 0.6 | 59.06 | 29278 | 0.58 | 0.6 | 0

As can be seen, each datanode thinks that it has the entire 98 TB to
itself, and three of the datanodes (1, 2, 3) have comparatively more data.
The balancing command doesn't help in this situation.

In recent times, I have come across a strange issue. The three
datanodes with more data go out of reach of the namenode (at different
times).
That is, the services on the datanode are running, but the "LAST CONTACT"
column in the above table reports a high value, and after a while the NAMENODE
reports the node as DEAD.
Within 10 minutes or so, the datanode goes LIVE again.
I tried going through the logs, but couldn't find any error.
I tried increasing the ulimit on these datanodes, but in vain.

Is there something that needs to be done to overcome this issue?

Any configuration changes? Any help would be appreciated.

Thanks & Regards,

Yogini Gulkotwar│Data Scientist

*Flutura Business Solutions Private Limited*
*BANGALORE*


Re: Service Level Authorization

2014-02-21 Thread Juan Carlos
Thanks Alex, the path to the queue was a mistake I made while testing
configurations, when I was unable to make the ACLs work. My main problem was
the mapreduce.cluster.administrators
parameter. I didn't know anything about this parameter; I had been
looking for it in
http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
but it is missing there.
Thanks for your help; it worked as soon as I set that property to my hadoop
admin group.
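
For anyone else who runs into this, the two pieces discussed in this thread
look roughly like the sketch below (the queue path is the one from this
thread; the admin group name is just an example, and ACL values list users
before the space and groups after it):

    <!-- capacity-scheduler.xml: the ACL key uses the full queue path -->
    <property>
      <name>yarn.scheduler.capacity.root.test.test1.acl_submit_applications</name>
      <value>jcfernandez </value>
    </property>

    <!-- mapred-site.xml: members of this group can submit to and administer all queues -->
    <property>
      <name>mapreduce.cluster.administrators</name>
      <value> hadoopadmins</value>
    </property>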


2014-02-20 17:38 GMT+01:00 Alex Nastetsky :

> If your test1 queue is under test queue, then you have to specify the path
> in the same way:
>
> yarn.scheduler.capacity.root.test.test1.acl_submit_applications (you are
> missing the "test")
>
> Also, if your "hadoop" user is a member of user group "hadoop", that is
> the default value of the mapreduce.cluster.administrators in
> mapred-site.xml. Users of that group can submit jobs to and administer all
> queues.
>
>
> On Thu, Feb 20, 2014 at 11:28 AM, Juan Carlos  wrote:
>
>> Yes, that is what I'm looking for, but I couldn't find this information
>> for hadoop 2.2.0. I saw that mapreduce.cluster.acls.enabled is now the
>> parameter to use, but I don't know how to set my ACLs.
>> I'm using the capacity scheduler and I've created 3 new queues: test (which
>> is under root at the same level as default) and test1 and test2, which are
>> under test. As I said, I enabled mapreduce.cluster.acls.enabled in
>> mapred-site.xml and later added the parameter
>> yarn.scheduler.capacity.root.test1.acl_submit_applications with value
>> "jcfernandez ". If I submit a job to queue test1 with user hadoop, it
>> allows the job to run.
>> Which is my error?
>>
>>
>> 2014-02-20 16:41 GMT+01:00 Alex Nastetsky :
>>
>> Juan,
>>>
>>> What kind of information are you looking for? The service level ACLs are
>>> for limiting which services can communicate under certain protocols, by
>>> username or user group.
>>>
>>> Perhaps you are looking for client level ACL, something like the
>>> MapReduce ACLs?
>>> https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Job+Authorization
>>>
>>> Alex.
>>>
>>>
>>> 2014-02-20 4:58 GMT-05:00 Juan Carlos :
>>>
>>> Where could I find some information about ACL? I only could find the
 available in
 http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html,
  which isn't so detailed.
 Regards

 Juan Carlos Fernández Rodríguez
 Consultor Tecnológico

 Telf: +34918105294
 Móvil: +34639311788

 CEDIANT
 Centro para el Desarrollo, Investigación y Aplicación de Nuevas
 Tecnologías
 HPC Business Solutions


 * DISCLAIMER *
  This message is intended exclusively for the named person. It may
 contain confidential, proprietary or legally privileged information. No
 confidentiality or privilege is waived or lost by any mistransmission. If
 you receive this message in error, please immediately delete it and all
 copies of it from your system, destroy any hard copies of it an notify the
 sender. Your must not, directly or indirectly, use, disclose, distribute,
 print, or copy any part of this message if you are not the intended
 recipient. Any views expressed in this message are those of the individual
 sender, except where the message states otherwise and the sender is
 authorised to state them to be the views of 'CEDIANT'. Please note that
 internet e-mail neither guarantees the confidentiality nor the proper
 receipt of the message sent. If the addressee of this message does not
 consent to the use of internet e-mail, please communicate it to us
>