Re: Having trouble adding external JAR to MapReduce Program
Thanks for the suggestions! I will give them a look and try to see if it will work!

Jonathan

On Fri, Feb 21, 2014 at 6:35 PM, Azuryy Yu wrote:
> Hi,
>
> you cannot add a jar this way.
>
> please look at DistributedCache in the Hadoop javadoc.
>
> please call DistributedCache.addArchive() in your main class before submitting the MR job.
Re: Having trouble adding external JAR to MapReduce Program
Hi,

you cannot add a jar this way.

Please look at DistributedCache in the Hadoop javadoc.

Please call DistributedCache.addArchive() in your main class before submitting the MR job.

On Sat, Feb 22, 2014 at 9:30 AM, Gaurav Gupta wrote:
> Jonathan,
>
> You have to make sure that the jar is available on the nodes where the map reduce job is running. Setting HADOOP_CLASSPATH on the single node doesn't work.
>
> You can use -libjars on the hadoop command line.
>
> Thanks
> Gaurav
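For reference, the method Azuryy refers to is presumably one of the Hadoop 1.x DistributedCache calls addCacheArchive(), addArchiveToClassPath() or addFileToClassPath(), and the jar has to already sit in HDFS for this route to work. A minimal driver sketch of that approach (class name and HDFS paths below are illustrative, not from the thread):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;

  public class AvroReaderDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // The jar must already be in HDFS (illustrative path), e.g. copied there
      // with "hadoop fs -put". This puts it on the classpath of every task.
      DistributedCache.addFileToClassPath(
          new Path("/user/jonathanpoon/lib/avro-mapred-1.7.6-hadoop1.jar"), conf);
      Job job = new Job(conf, "avro-reader");
      // ... set mapper, input/output formats and paths as usual ...
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }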
RE: Having trouble adding external JAR to MapReduce Program
Jonathan,

You have to make sure that the jar is available on the nodes where the map reduce job is running. Setting HADOOP_CLASSPATH on the single node doesn't work.

You can pass -libjars on the hadoop command line.

Thanks
Gaurav

From: Jonathan Poon [mailto:jkp...@ucdavis.edu]
Sent: Friday, February 21, 2014 5:12 PM
To: user@hadoop.apache.org
Subject: Having trouble adding external JAR to MapReduce Program
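As a side note: -libjars is a "generic option", so it is only honored when the driver runs through GenericOptionsParser, which is what ToolRunner does. A driver whose main() builds the job straight from a plain Configuration will silently ignore -libjars, and the tasks then fail with exactly this kind of ClassNotFoundException. A minimal Tool-based driver sketch (the class name mirrors the one used in the thread; mapper/format wiring is a placeholder):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class AvroReader extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
      // getConf() already carries whatever -libjars/-D generic options were passed.
      Job job = new Job(getConf(), "avro-reader");
      job.setJarByClass(AvroReader.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      // ... setInputFormatClass(...), mapper class, output key/value types ...
      return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Configuration(), new AvroReader(), args));
    }
  }

With a driver wired like this, a command of the form "hadoop jar AvroReader.jar org.avro.AvroReader -libjars ${LIBJARS} <in> <out>" should ship the listed jars with the job and put them on the task classpath.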
Having trouble adding external JAR to MapReduce Program
Hi Everyone,

I'm running into trouble adding the Avro JAR into my MapReduce program. I do the following to try to add the Avro JAR:

export HADOOP_CLASSPATH="/tmp/singleEvent.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar:/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar:/usr/local/hadoop/hadoop-core-1.2.1.jar"

export LIBJARS="/tmp/singleEvent.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-mapred-1.7.6-hadoop1.jar,/home/jonathanpoon/local/lib/java/avro-1.7.6/avro-tools-1.7.6.jar,/usr/local/hadoop/hadoop-core-1.2.1.jar"

hadoop jar AvroReader.jar org.avro.AvroReader -libjars ${LIBJARS} /user/jonathanpoon/avro /user/jonathanpoon/output

However, I get the following error:

14/02/21 17:01:17 INFO mapred.JobClient: Task Id : attempt_201402191318_0014_m_01_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.avro.mapreduce.AvroKeyInputFormat
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
    at org.apache.hadoop.mapreduce.JobContext.getInputFormatClass(JobContext.java:187)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.mapreduce.AvroKeyInputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
    ... 8 more

Am I placing the Avro JAR files in the improper place?

Thanks for your help!

Jonathan
Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradation
Hi John,

FWIW, setting the log level of org.apache.hadoop.security.UserGroupInformation to ERROR seemed to prevent the fatal NameNode slowdown we ran into. Although I still saw "no such user" Shell$ExitCodeException messages in the logs, these only occurred every few minutes or so. Thus, it seems like this is a reasonable workaround until the underlying problem is fixed.

I suggest that you file a JIRA ticket, though, as nobody seems to be rushing in here to tell us what we're doing wrong.

Thanks,

- Chris

On Feb 18, 2014, at 5:54 PM, Chris Schneider wrote:

> Hi John,
>
> My AWS Elastic MapReduce NameNode is also filling its log file with messages like the following:
>
> 2014-02-18 23:56:52,344 WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 78 on 9000): No groups available for user job_201402182309_0073
> 2014-02-18 23:56:52,351 WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 48 on 9000): No groups available for user job_201402182309_0073
> 2014-02-18 23:56:52,356 WARN org.apache.hadoop.security.UserGroupInformation (IPC Server handler 38 on 9000): No groups available for user job_201402182309_0073
>
> I ran into this same issue in March 2013 and got past it by using an m1.xlarge master node (instead of my usual m1.large) when (like right now) I double my slave count (to 32 cc2.8xlarge instances) to re-import a lot of my input data. Using that m1.xlarge didn't prevent the NameNode from logging messages like this, but the beefier instance seemed to weather the load these messages represented better.
>
> Unfortunately, even my m1.xlarge master node now seems overwhelmed. The cluster starts off fine, efficiently mowing through the jobs in my job flow step for a few hours, but it eventually gets into a mode where the copy phase of the reduce jobs appears to make no progress at all. At that point, the NameNode seems to be spending all of its time writing messages like the ones above.
>
> The issue doesn't seem to be related to the NameNode JVM size (I tried increasing it to 4GB before I realized it never used more than ~400MB), nor dfs.namenode.handler.count (which I increased from 64 to 96).
>
> We're currently trying to work around the problem by hacking log4j.properties to set the logging level for org.apache.hadoop.security.UserGroupInformation to ERROR.
> We might have to do so for the entire package, as I've also seen the following in the NameNode logs:
>
> 2014-02-19 01:01:24,184 WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping (IPC Server handler 84 on 9000): got exception trying to get groups for user job_201402182309_0226
> org.apache.hadoop.util.Shell$ExitCodeException: id: job_201402182309_0226: No such user
>
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
>     at org.apache.hadoop.util.Shell.run(Shell.java:182)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
>     at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:78)
>     at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:53)
>     at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
>     at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1037)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.<init>(FSPermissionChecker.java:50)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5218)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:5201)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2030)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.getFileInfo(NameNode.java:850)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:573)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
>
> I would also be very interested in hearing Jakob Homan and Devaraj Das respond to your analysis of the changes made for MAPREDUCE-1457.
>
> Please post again wi
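For anyone wanting to reproduce the workaround described above, the log4j.properties change is roughly the following (a sketch; which log4j.properties file the NameNode actually reads depends on the distribution, so verify for your setup):

  # Quiet the per-request group-lookup warnings on the NameNode
  log4j.logger.org.apache.hadoop.security.UserGroupInformation=ERROR
  # Or, as the follow-up suggests, the whole package:
  log4j.logger.org.apache.hadoop.security=ERROR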
Re: Questions from a newbie to Hadoop
I wish to run a pseudo-distributed cluster on a virtual machine. I have it almost running on Oracle VirtualBox with the Hortonworks Sandbox 2.0, but the virtual appliance wants a 64-bit CPU, and mine is only 32-bit. Still looking for a 32-bit version of the Hortonworks Sandbox 2.0. Hortonworks seems to be very stable and good so far, very easy to set up.

What does this mean: "set up P-D mode on it"? :O

Thanks for the reply.

From: Devin Suiter RDX
To: user@hadoop.apache.org
Sent: Friday, February 21, 2014 11:30 AM
Subject: Re: Questions from a newbie to Hadoop
Re: No job shown in Hadoop resource manager web UI when running jobs in the cluster
Hi Richard,

Not sure how the NPE happened on your command line, but I'd like to clarify a few things here:

1. If you want to see MapReduce jobs, please use "mapred job". "hadoop job" is deprecated. If you want to see all kinds of applications run by your YARN cluster, please use "yarn application".

2. The job history server only shows finished MapReduce jobs. There will be another application history server that shows all completed applications run by YARN, but it's not available in 2.2.

3. The ResourceManager web UI is not the job history web UI. You should check your yarn-site.xml to see what the address of the RM web UI is. It will list all the applications that the RM remembers.

- Zhijie

On Thu, Feb 20, 2014 at 7:04 PM, Chen, Richard wrote:
> Dear group,
>
> I compiled hadoop 2.2.0 x64 and am running it on a cluster. When I do "hadoop job -list" or "hadoop job -list all", it throws an NPE like this:
>
> 14/01/28 17:18:39 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 14/01/28 17:18:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.tools.CLI.listJobs(CLI.java:504)
>     at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:312)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1237)
>
> And on the Hadoop web apps like jobhistory (I turned on the jobhistory server), it shows no job running and no job finished, although I was running jobs.
>
> Please help me to solve this problem.
>
> Thanks!!
>
> Richard Chen

--
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/
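For quick reference, the commands and setting Zhijie mentions look like this on a 2.x cluster (the web UI address/port is the usual default and may differ in your yarn-site.xml):

  mapred job -list          # MapReduce jobs; replaces the deprecated "hadoop job -list"
  yarn application -list    # all YARN applications known to the ResourceManager
  # The RM web UI address is taken from yarn-site.xml:
  #   yarn.resourcemanager.webapp.address   (defaults to port 8088)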
Re: Questions from a newbie to Hadoop
You should also clarify for the group:

Do you want to make a virtual machine to run a pseudo-distributed Hadoop cluster on?

Or

Do you want to install Hadoop directly onto the Vista machine and run it there?

If the former, you should be able to set up a VM just fine with a Linux version of your choice - we use CentOS for that sort of thing here - and set up P-D mode on it, just by using VirtualBox.

If the latter...

You may want to invest in some extra RAM - no, I'm kidding. :-) HortonWorks has the stable Windows builds, as Arpit has already directed you to.

You could also consider the Amazon and Microsoft Azure cloud versions of Hadoop, if you can afford to pay for a few cycles every month. They're pretty affordable, and those Amazon $100 gift cards for EC2 are popular giveaway items at trade shows and whatnot...

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

On Fri, Feb 21, 2014 at 2:11 PM, Arpit Agarwal wrote:
> You can try building Apache Hadoop with these instructions:
> https://wiki.apache.org/hadoop/Hadoop2OnWindows
>
> 32-bit Windows has not been tested.
Re: Questions from a newbie to Hadoop
You can try building Apache Hadoop with these instructions: https://wiki.apache.org/hadoop/Hadoop2OnWindows

32-bit Windows has not been tested.
Questions from a newbie to Hadoop
Hello,

I am new to Hadoop and trying to learn it. I want to set up pseudo-distributed mode on my Windows Vista (32-bit) machine to experiment with. I am having great difficulty locating the correct software(s) to do this:

JDK 1.6 or JDK 1.7? Eclipse? Oracle VirtualBox, VMware Player, etc.? CDH4, Hortonworks, or just Apache Hadoop?

I have tried to load and run VMware Player and the Hortonworks Hadoop sandbox so far, but the Hortonworks version is set up for 64-bit and my poor Vista machine is 32-bit :(

Can someone offer advice on how to do this? Please help if able. If someone wants to email offline, that is fine also.

Troy
KMG 365
Re: Capacity Scheduler capacity vs. maximum-capacity
Does the Hadoop capacity scheduler support preemption in this scenario? Based on what Vinod says, preemption seems to be supported via configuration. If so, can someone point me to instructions for setting that up? Preemption would really be helpful for my use case.

Thanks.

On Fri, Feb 21, 2014 at 12:39 AM, Vinod Kumar Vavilapalli wrote:
>
> Yes, it does take those extra resources away back to queue B. How quickly it takes them away depends on whether preemption is enabled or not. If preemption is not enabled, it 'takes away' as and when containers from queue A start finishing.
>
> +Vinod
>
> On Feb 19, 2014, at 5:35 PM, Alex Nastetsky wrote:
>
> Will the scheduler take away the 10% from queue B and give it back to queue A even if queue B needs it? If not, it would seem that the scheduler is reneging on its guarantee.
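For what it's worth, the knobs usually cited for CapacityScheduler preemption in recent 2.x releases are the scheduler-monitor settings below (a yarn-site.xml sketch; availability and exact names should be verified against the docs for your Hadoop version):

  <property>
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.monitor.policies</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
  </property>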
[no subject]
The worker nodes on my version 2.2 cluster won't use more than 11GB of the 30GB total (24GB allocated) for MapReduce jobs running in YARN. Does anyone have an idea what might be constraining the usage of RAM?

I followed the steps listed here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html and http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/ to set various memory configurations, but no matter what I try, the nodes on the cluster don't use more than 11GB of the allocated 26GB.

The YARN ResourceManager reports that it is using all of the allocated memory in the status across the top, but according to top and other such tools, it is not.

I see org.apache.hadoop.mapred.YarnChild processes being created with -Xmx756m, but I can't find this anywhere in the mapreduce or yarn configurations.

yarn.nodemanager.resource.memory-mb = 24576
yarn.scheduler.minimum-allocation-mb = 3072
yarn_heapsize = 2 (not really clear to me what this does...?)

mapreduce2 config:
mapreduce.map.memory.mb = 4096
mapreduce.reduce.memory.mb = 8192
mapreduce.map.java.opts = -Xmx3500
mapreduce.reduce.java.opts = -Xmx7000

Thanks!

Aaron Zimmerman
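One thing worth double-checking in the settings above (an observation, not a confirmed diagnosis): the java.opts values normally carry an explicit unit suffix and stay somewhat below the container size, typically around 80% of it; "-Xmx3500" without a suffix is not 3500 MB. A self-consistent mapred-site.xml sketch:

  <property><name>mapreduce.map.memory.mb</name><value>4096</value></property>
  <property><name>mapreduce.map.java.opts</name><value>-Xmx3276m</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>8192</value></property>
  <property><name>mapreduce.reduce.java.opts</name><value>-Xmx6553m</value></property>

The -Xmx756m seen on the YarnChild processes may also mean the configured opts are not being picked up at all, so it is worth confirming which mapred-site.xml the job client actually reads.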
Cleanup after Yarn Job
All,

I'm trying to model a YARN client after the Distributed Shell example. However, I'd like to add a method to clean up the job's files after completion. I've defined a cleanup routine:

private void cleanup(ApplicationId appId, FileSystem fs) throws IOException {
    String PathSuffix = appName + "/" + appId.getId();
    Path Dst = new Path(fs.getHomeDirectory(), PathSuffix);
    fs.delete(Dst, true);
}

The problem that I'm having is that I'd like to call it after monitorApplication exits, but in the case that the time limit was exceeded and killApplication is called, both the appId and the FileSystem objects are gone. I could get around the appId issue since I really only need a String or integer representation, but since the YARN client seems to be managing the filesystem object (the example uses FileSystem.get(conf)), I'm not sure of a way around that unless I create my own FileSystem object.

Any suggestions?

Thanks,
Brian
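One possible angle (a sketch, assuming FileSystem caching is left at its default): FileSystem.get(conf) returns a cached, shared instance, so it is cheap and safe to call again after monitorApplication()/killApplication(), and the only thing that really needs to be captured up front is the path suffix string. Names below are illustrative:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public final class JobCleanup {
    private JobCleanup() {}

    // pathSuffix would be appName + "/" + appId.getId(), captured before monitoring starts.
    public static void cleanup(Configuration conf, String pathSuffix) throws IOException {
      FileSystem fs = FileSystem.get(conf);   // cached instance, fine to re-acquire here
      Path dst = new Path(fs.getHomeDirectory(), pathSuffix);
      fs.delete(dst, true);
    }
  }

Calling this from a finally block around the monitoring loop would then cover both the normal-completion path and the killApplication path.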
Path filters for multiple inputs
Hi,

May I know how I can apply path filters for multiple inputs? I mean, for each input of MultipleInputs I need to apply a different filter. Is it possible?

I tried to set my own PathFilter via FileInputFormat, but it is applied to all of the multiple inputs.

And also, how can I ignore the sub-directories under each input?

Thanks & Regards,
B Anil Kumar.
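FileInputFormat.setInputPathFilter() is indeed job-wide, so one commonly suggested workaround (sketched here with illustrative names and file extension, assuming the Hadoop 2.x mapreduce API) is a dedicated input-format subclass per input that overrides listStatus(); each subclass applies its own filter and can also drop sub-directories:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

  public class LogFilesOnlyInputFormat extends TextInputFormat {
    @Override
    protected List<FileStatus> listStatus(JobContext job) throws IOException {
      List<FileStatus> kept = new ArrayList<FileStatus>();
      for (FileStatus stat : super.listStatus(job)) {
        // keep plain files matching this input's own rule; skip sub-directories
        if (!stat.isDirectory() && stat.getPath().getName().endsWith(".log")) {
          kept.add(stat);
        }
      }
      return kept;
    }
  }

Each path registered with MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass) can then point at its own such subclass, which is what gives per-input filtering.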
JobHistoryEventHandler failed with AvroTypeException.
Hi all,

I am using Hadoop 2.3 for a YARN cluster. While running a job, I encountered the exception below in the MRAppMaster. Why is this error being logged?

2014-02-21 22:10:33,841 INFO [Thread-355] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state STOPPED; cause: org.apache.avro.AvroTypeException: Attempt to process a enum when a string was expected.
org.apache.avro.AvroTypeException: Attempt to process a enum when a string was expected.
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
    at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:217)
    at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:54)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:66)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:870)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:517)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1386)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:550)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:602)

Thanks & Regards,
Rohith Sharma K S
Datanodes going out of reach in hadoop
Hello,

I am working with a 5-node Hadoop cluster. The HDFS is on a shared NFS directory of 98TB. So when we view the NameNode UI, the following is displayed:

Node       Last Contact  Admin State  Configured Capacity (TB)  Used (TB)  Non DFS Used (TB)  Remaining (TB)  Used (%)  Remaining (%)  Blocks  Block Pool Used (TB)  Block Pool Used (%)  Failed Volumes
Datanode1  0             In Service   97.39                     1.83       38.04              57.52           1.88      59.06          80653   1.83                  1.88                 0
Datanode2  1             In Service   97.39                     1.18       38.69              57.52           1.21      59.06          54536   1.18                  1.21                 0
Datanode3  0             In Service   97.39                     1.61       38.26              57.52           1.65      59.06          66902   1.61                  1.65                 0
Datanode4  2             In Service   97.39                     0.65       39.22              57.52           0.67      59.06          32821   0.65                  0.67                 0
Datanode5  2             In Service   97.39                     0.58       39.29              57.52           0.6       59.06          29278   0.58                  0.6                  0

As can be seen, each datanode thinks that it has the entire 98TB to itself. And three of the datanodes (1, 2, 3) have comparatively more data. The balancing command doesn't help due to this situation.

And in recent times, I have come across a strange issue. The three datanodes with more data go out of reach of the namenode (at different instances). That is, the services on the datanode are running, but the "LAST CONTACT" column in the above table reports a high value, and after a while the NAMENODE reports the node as DEAD. Within 10 minutes or so, the datanode goes LIVE again.

I tried going through the logs, but couldn't find any error. I tried increasing the ulimit on these datanodes, but in vain.

Is there something that needs to be done to overcome this issue? Any configuration changes? Any help would be appreciated.

Thanks & Regards,
Yogini Gulkotwar | Data Scientist
Flutura Business Solutions Private Limited
BANGALORE
Re: Service Level Authorization
Thanks Alex, my path to the queue was a mistake I made when I was testing configurations and was unable to make the ACLs work. My main problem was the mapreduce.cluster.administrators parameter. I didn't know anything about this parameter; I have been looking for it in http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml but it is missing there.

Thanks for your help, it worked as soon as I set that property to my hadoop admin group.

2014-02-20 17:38 GMT+01:00 Alex Nastetsky:
> If your test1 queue is under the test queue, then you have to specify the path in the same way:
>
> yarn.scheduler.capacity.root.test.test1.acl_submit_applications (you are missing the "test")
>
> Also, if your "hadoop" user is a member of user group "hadoop", that is the default value of mapreduce.cluster.administrators in mapred-site.xml. Users of that group can submit jobs to and administer all queues.
>
> On Thu, Feb 20, 2014 at 11:28 AM, Juan Carlos wrote:
>
>> Yes, that is what I'm looking for, but I couldn't find this information for hadoop 2.2.0. I saw mapreduce.cluster.acls.enabled is now the parameter to use, but I don't know how to set my ACLs.
>> I'm using the capacity scheduler and I've created 3 new queues: test (which is under root, at the same level as default), and test1 and test2, which are under test. As I said, I enabled mapreduce.cluster.acls.enabled in mapred-site.xml and later added the parameter yarn.scheduler.capacity.root.test1.acl_submit_applications with value "jcfernandez ". If I submit a job to queue test1 with user hadoop, it allows it to run.
>> What is my mistake?
>>
>> 2014-02-20 16:41 GMT+01:00 Alex Nastetsky:
>>
>>> Juan,
>>>
>>> What kind of information are you looking for? The service level ACLs are for limiting which services can communicate under certain protocols, by username or user group.
>>>
>>> Perhaps you are looking for a client level ACL, something like the MapReduce ACLs? https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Job+Authorization
>>>
>>> Alex.
>>>
>>> 2014-02-20 4:58 GMT-05:00 Juan Carlos:
>>>
>>>> Where could I find some information about ACLs? I could only find what is available in http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html, which isn't very detailed.
>>>>
>>>> Regards
>>>>
>>>> Juan Carlos Fernández Rodríguez
>>>> Consultor Tecnológico
>>>> Telf: +34918105294
>>>> Móvil: +34639311788
>>>> CEDIANT
>>>> Centro para el Desarrollo, Investigación y Aplicación de Nuevas Tecnologías
>>>> HPC Business Solutions
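For completeness, the two pieces that resolved this look roughly as follows (a sketch: queue names and the submit-ACL value come from the thread, the admin group value is illustrative, and queue ACLs are only enforced when yarn.acl.enable is true in yarn-site.xml):

  <!-- capacity-scheduler.xml: full queue path; ACL format is "user1,user2 group1,group2" -->
  <property>
    <name>yarn.scheduler.capacity.root.test.test1.acl_submit_applications</name>
    <value>jcfernandez </value>
  </property>

  <!-- mapred-site.xml: members of this group act as cluster admins and can submit to and administer all queues -->
  <property>
    <name>mapreduce.cluster.administrators</name>
    <value> hadoop-admins</value>
  </property>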
En el caso de que el destinatario de este mensaje no consintiera la utilización del correo electrónico vía Internet, rogamos lo ponga en nuestro conocimiento de manera inmediata. * DISCLAIMER * This message is intended exclusively for the named person. It may contain confidential, propietary or legally privileged information. No confidentiality or privilege is waived or lost by any mistransmission. If you receive this message in error, please immediately delete it and all copies of it from your system, destroy any hard copies of it an notify the sender. Your must not, directly or indirectly, use, disclose, distribute, print, or copy any part of this message if you are not the intended recipient. Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorised to state them to be the views of 'CEDIANT'. Please note that internet e-mail neither guarantees the confidentiality nor the proper receipt of the message sent. If the addressee of this message does not consent to the use of internet e-mail, please communicate it to us >