Re: Unsubscribe
Unsubscribe
Re: New to this group.
Hi Krish, I completed an MSc project using Hadoop this summer, from installation through to programming with the Java API and then tuning. In all I did about 14 weeks solid, with limited Unix and server experience and an academic knowledge of Java from my Masters course. I got an A ;O) Along the way I installed Eclipse, got Hadoop to work with it and built a genetic sequence alignment tool. It was hard work but I had a blast. I ran it on a 32-node cluster and got some good speedups. I'm also interested in developing my skills further and this BigPetStore application seems like a good way to go. Following my course I'm a trainee DB admin for a global investment manager using Sybase. If you want to work on a collaborative project, I am sure I could share my Java skills and knowledge thus far if you were happy to share your knowledge too. Why not connect on LinkedIn ;O) Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Krish Donald gotomyp...@gmail.com Reply-To: user@hadoop.apache.org Date: Friday, 2 January 2015 19:43 To: user@hadoop.apache.org Subject: Re: New to this group. I would like to go towards the administration side, not the development side, as I don't know Java at all... On Fri, Jan 2, 2015 at 11:37 AM, Jay Vyas jayunit100.apa...@gmail.com wrote: Many demos out there are for the business community... For a demonstration of Hadoop at a finer-grained level, how it's deployed, packaged, installed and used, for a developer who wants to learn Hadoop the hard way, I'd suggest: 1 - getting Apache Bigtop stood up on VMs, and 2 - running the BigPetStore application, which is meant to demonstrate end-to-end building, testing and deployment of a Hadoop batch analytics system with MapReduce, Pig, and Mahout. This will also expose you to Puppet, Gradle, Vagrant, all in a big data app which solves real-world problems like jar dependencies and multiple ecosystem components. Since BPS generates its own data, you don't waste time worrying about external data sets, Twitter credentials, etc., and can test both on your laptop and on a 100-node cluster (similar to teragen but for the whole ecosystem). Since it features integration tests and is tested on Bigtop's Hadoop distribution (which is 100% pure Apache based), it's IMO the purest learning source, not blurred with company-specific downloads or branding. Disclaimer: of course I'm biased as I work on it... :) but we've been working hard to make Bigtop easily consumable as a gateway drug to big data processing, and if you have a solid Linux and Java background, I'm sure others would agree it's a great place to get immersed in the Hadoop ecosystem. On Jan 2, 2015, at 1:05 PM, Krish Donald gotomyp...@gmail.com wrote: I would like to work on some kind of case studies, like the couple I have seen on Hortonworks such as Twitter sentiment analysis, web log analysis etc. It would help if somebody could suggest other case studies which can be worked on and put on a resume later, as I don't have real-time project experience. On Fri, Jan 2, 2015 at 10:33 AM, Ted Yu yuzhih...@gmail.com wrote: You can search for open JIRAs which are related to admin.
Here is an example query: https://issues.apache.org/jira/browse/HADOOP-9642?jql=project%20%3D%20HADOOP%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22admin%22 FYI On Fri, Jan 2, 2015 at 10:24 AM, Krish Donald gotomyp...@gmail.com wrote: I have a fair understanding of the Hadoop ecosystem... I have set up a multinode cluster using VMs on my personal laptop for Hadoop 2.0. But beyond that I would like to work on some project to get a good hold on the subject. I basically would like to go into the Hadoop administration side, as my background is as an RDBMS database administrator. On Fri, Jan 2, 2015 at 10:11 AM, Wilm Schumacher wilm.schumac...@gmail.com wrote: Hi, the standard books may be a good start. I liked the following: the definitive guide: http://www.amazon.de/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 Hadoop in Action: http://www.manning.com/lam2/ Hadoop in Practice: http://www.manning.com/holmes2/ A list is here: http://wiki.apache.org/hadoop/Books Hope this helps. Best wishes, Wilm On 02.01.2015 at 19:02, Krish Donald wrote: Hi, I am new to this group and Hadoop. Please help me to learn Hadoop and suggest some self-study projects. Thanks Krish Donald
Re: Error when executing a WordCount Program
Hi, have you set the job jar class in your code? WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). Also you need to check the path for your input file: Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input These are pretty straightforward errors; resolve them and you should be good to go. Sent from my iPhone On 10 Sep 2014, at 14:19, Shahab Yunus shahab.yu...@gmail.com wrote: hdfs://latdevweb02:9000/home/hadoop/hadoop/input - is this a valid path on HDFS? Can you access this path outside of the program? For example using the hadoop fs -ls command? Also, were this path and the files in it created by a different user? The exception seems to say that it does not exist or the running user does not have permission to read it. Regards, Shahab On Wed, Sep 10, 2014 at 9:09 AM, YIMEN YIMGA Gael gael.yimen-yi...@sgcib.com wrote: Hello Hadoopers, Here is the error I’m facing when running the WordCount example program I wrote myself. Kindly find attached the file of my WordCount program. Below is the error. === -bash-4.1$ bin/hadoop jar WordCount.jar Entrée dans le programme MAIN !!! 14/09/10 15:00:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 14/09/10 15:00:24 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 14/09/10 15:00:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/09/10 15:00:24 WARN snappy.LoadSnappy: Snappy native library not loaded 14/09/10 15:00:24 INFO mapred.JobClient: Cleaning up the staging area hdfs://latdevweb02:9000/user/hadoop/.staging/job_201409101141_0001 14/09/10 15:00:24 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073) at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353) at fr.societegenerale.bigdata.lactool.WordCountDriver.main(WordCountDriver.java:50) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) -bash-4.1$ === Thanks in advance for your help.
Warm regards, GYY
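For reference, a minimal sketch of the two fixes suggested above, using the old mapred API that the stack trace shows. The class and path names are placeholders, not the poster's actual code; passing the driver class to JobConf is what sets the job jar and avoids the "No job jar file set" warning.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriverSketch {
  public static void main(String[] args) throws Exception {
    // Passing the class ships the jar containing it with the job.
    JobConf conf = new JobConf(WordCountDriverSketch.class);
    conf.setJobName("wordcount");
    // Input must already exist in HDFS; output must not exist yet.
    FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));
    JobClient.runJob(conf);
  }
}

Checking the input path before submitting, as Shahab suggests:

hadoop fs -ls /user/hadoop/input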
Re: total number of map tasks
Thanks for the update ;O) Regards, Chris MacKenzie http://www.chrismackenziephotography.co.uk/ Expert in all aspects of photography telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ weddings: www.wedding.chrismackenziephotography.co.uk http://www.wedding.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://twitter.com/#!/MacKenzieStudio http://www.facebook.com/pages/Chris-MacKenzie-Photography/145946284250 http://www.linkedin.com/in/chrismackenziephotography/ http://pinterest.com/ChrisMacKenzieP/ On 27/08/2014 17:36, Stijn De Weirdt stijn.dewei...@ugent.be wrote: hi all, someone PMed me suggesting I take a look at the input split setting, and indeed, the split size determines the number of tasks. stijn On 08/27/2014 06:23 PM, Chris MacKenzie wrote: It's my understanding that you don't get map tasks as such but containers. My experience is with version 2+, and if that's true, containers are based on memory tuning in mapred-site.xml. Otherwise I'd love to learn more. Sent from my iPhone On 27 Aug 2014, at 12:14, Stijn De Weirdt stijn.dewei...@ugent.be wrote: hi all, we are tuning YARN (or trying to) on our environment (shared filesystem, no HDFS) using terasort, and one of the main issues we are seeing is that an average map task takes 15 sec. some tuning guides and websites suggest that ideally map tasks run between 40 sec and 1 or 2 minutes. (however, it's also not very clear if the recommendations are still valid for YARN) in particular, we see way more map tasks than expected, and we are wondering how the number of map tasks per job run is determined. teragen created 64 output files, so we are only expecting 64 map tasks, each processing one input file. however, we see something like 3000 tasks. hints are much appreciated stijn
Re: total number of map tasks
It's my understanding that you don't get map tasks as such but containers. My experience is with version 2+, and if that's true, containers are based on memory tuning in mapred-site.xml. Otherwise I'd love to learn more. Sent from my iPhone On 27 Aug 2014, at 12:14, Stijn De Weirdt stijn.dewei...@ugent.be wrote: hi all, we are tuning YARN (or trying to) on our environment (shared filesystem, no HDFS) using terasort, and one of the main issues we are seeing is that an average map task takes 15 sec. some tuning guides and websites suggest that ideally map tasks run between 40 sec and 1 or 2 minutes. (however, it's also not very clear if the recommendations are still valid for YARN) in particular, we see way more map tasks than expected, and we are wondering how the number of map tasks per job run is determined. teragen created 64 output files, so we are only expecting 64 map tasks, each processing one input file. however, we see something like 3000 tasks. hints are much appreciated stijn
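For reference, a hedged sketch of how the split size (and therefore the map-task count) can be raised in Hadoop 2; the 256 MB figure is illustrative only. On a shared filesystem without HDFS the reported block size can be small, which is one plausible reason for thousands of splits.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeSketch {
  public static Job configure() throws Exception {
    Configuration conf = new Configuration();
    // Equivalent property: mapreduce.input.fileinputformat.split.minsize
    Job job = Job.getInstance(conf, "terasort");
    FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024); // ask for >= 256 MB splits
    return job;
  }
}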
Re: Hadoop YARN Cluster Setup Questions
Hi, The requirement is simply to have the slaves and masters files on the resource manager; they are used by the shell scripts that start the daemons :-) Sent from my iPhone On 23 Aug 2014, at 16:02, S.L simpleliving...@gmail.com wrote: Ok, I'll copy the slaves file to the other slave nodes as well. What about the masters file though? Sent from my HTC - Reply message - From: rab ra rab...@gmail.com To: user@hadoop.apache.org user@hadoop.apache.org Subject: Hadoop YARN Cluster Setup Questions Date: Sat, Aug 23, 2014 5:03 AM Hi, 1. Typically, we used to copy the slaves file to all the participating nodes, though I do not have a concrete theory to back this up. At least, this is what I was doing in Hadoop 1.2 and I am doing the same in Hadoop 2.x. 2. I think you should investigate the YARN GUI and see how many maps it has spawned. There is a high possibility that both maps are running on the same node in parallel. Since there are two splits, there would be two map processes, and one node is capable of handling more than one map. 3. The input file could be small and stored without replicas, and hence held in a single block on one node. These are a few hints which might help you. regards rab On Sat, Aug 23, 2014 at 12:26 PM, S.L simpleliving...@gmail.com wrote: Hi Folks, I was not able to find a clear answer to this. I know that on the master node we need to have a slaves file listing all the slaves, but do we need to have the slave nodes have a masters file listing the single name node (I am not using a secondary name node)? I only have the slaves file on the master node. The reason I ask this is because when I submit a Hadoop job, even though the input is being split into 2 parts, only one data node is assigned applications; the other two (I have three) are not being assigned any applications. Thanks in advance!
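For reference, a hedged sketch of the slaves file being discussed. The hostnames are placeholders; the file only needs to live on the machine where start-dfs.sh / start-yarn.sh are run, since those scripts ssh out to each listed worker.

# etc/hadoop/slaves (one worker hostname per line)
slave1.example.com
slave2.example.com
slave3.example.com

The Hadoop 1.x-style masters file listed the secondary namenode; with no secondary namenode in use, it is not required on the worker nodes.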
Re: Hadoop 2.2 Built-in Counters
Hi, This is the content of my shell script for running the job history server:
cd $HADOOP_PREFIX
hadoop fs -mkdir -p /mr-history/tmp
hadoop fs -chmod -R 1777 /mr-history/tmp
hadoop fs -mkdir -p /mr-history/done
hadoop fs -chmod -R 1777 /mr-history/done
sbin/mr-jobhistory-daemon.sh start historyserver
These configurable variables are in mapred-site.xml:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>137.195.143.129:10020</value>
  <description>Default port is 10020.</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>137.195.143.129:19888</value>
  <description>Default port is 19888.</description>
</property>
I start the history server on the same node as my resource manager. The counters are available while the job is running from: http://your-server:8088/proxy/application_1408007466921_0002/mapreduce/job/job_1408007466921_0002 Drill down through the application master to the job. If you don't have the history server running, the job data is not persistent. Hope this helps. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: ou senshaw sens...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 14 August 2014 07:14 To: user@hadoop.apache.org Subject: Hadoop 2.2 Built-in Counters Hi all, I'm trying to analyze my MapReduce job performance via built-in counters such as physical memory usage, heap memory usage... When the job is running, I can watch these counters via the Resource Manager website (namenode:8088). However, when the job is done, counter information is not available on the Resource Manager website any more. I know I can get them from the client output. I was wondering if there is any other place on the name node or data nodes to get the final counter values for a given job id? Thanks, Shaw
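Once the history server is up, a couple of hedged ways to pull counters for a finished job; the job id is a placeholder taken from the URL above, and the counter shown is one of the standard TaskCounter values.

mapred job -status job_1408007466921_0002
mapred job -counter job_1408007466921_0002 org.apache.hadoop.mapreduce.TaskCounter PHYSICAL_MEMORY_BYTES
# or, if the 19888 webapp address above is configured, the history server REST API:
curl http://137.195.143.129:19888/ws/v1/history/mapreduce/jobs/job_1408007466921_0002/counters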
Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?
Hi, I have been using Hadoop since Christmas, loosely, and from May for a Software Engineering MSc at Heriot Watt University in Edinburgh, Scotland. I have written a genetic sequence alignment algorithm. I have installed Hadoop in various places, including a 32-node cluster, and am using Eclipse Kepler SR2 as an IDE. My current Hadoop version is 2.4.1, which I downloaded as a tar from the Apache mirror servers. It's been a tough learning curve, but that has made the learning all the more valuable. I believe using the straight Hadoop version has given insights that proprietary builds wouldn't have. There are so many confusing issues that crop up, and it's easy to attach importance to fixing an error which masks another. With the proprietary versions it would be easy to attach blame when it's not that build's fault. Go with your heart but be prepared to work to solve the problems you encounter. Buy Tom White's book; it isn't perfect and is a couple of years out of date, but it gives you enough detail and structure to build an impression you can work from. The downloadable source code is a great help when trying to get started. Good luck. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Adaryl \Bob\ Wakefield, MBA adaryl.wakefi...@hotmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 14 August 2014 01:13 To: user@hadoop.apache.org Subject: Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode? He didn't ask for the best and nobody framed up their answer like that. He asked what people were using. Out of the 10 responses only four of them actually answered his question. I've been studying Hadoop for two months straight. Quite frankly, I wish more people would ask for community input on what does what and how. Adaryl Bob Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData From: Kilaru, Sambaiah mailto:sambaiah_kil...@intuit.com Sent: Wednesday, August 13, 2014 1:10 PM To: user@hadoop.apache.org Subject: Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode? Enough wars are going on about which is best. Choose one of them and try to learn; there is nothing that makes x better or y better. It is up to you. Thanks, Sam From: Sebastiano Di Paola sebastiano.dipa...@gmail.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org Date: Wednesday, August 13, 2014 at 6:28 PM To: user@hadoop.apache.org user@hadoop.apache.org Subject: Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode? Hi, I'm a newbie too and I'm not using any particular distribution. I just download the components I need / want to try for my deployment and use them. It's a slow process but it allows me to better understand what I'm doing under the hood. Regards, Seba On Tue, Aug 12, 2014 at 10:12 PM, mani kandan mankand...@gmail.com wrote: Which distribution are you people using? Cloudera vs Hortonworks vs BigInsights?
Re: Can anyone help me resolve this Error: unable to create new native thread
Hi Ravi, I resolved this. Many thanks. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Ravi Prakash ravi...@ymail.com Reply-To: user@hadoop.apache.org Date: Friday, 15 August 2014 01:31 To: user@hadoop.apache.org user@hadoop.apache.org Subject: Re: Can anyone help me resolve this Error: unable to create new native thread Hi Chris! When is this error caused? Which logs do you see this in? Are you sure you are setting the ulimit for the correct user? What application are you trying to run which is causing you to run up against this limit? HTH Ravi On Saturday, August 9, 2014 6:07 AM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi, I've scrabbled around looking for a fix for a while and have set the soft ulimit size to 13172. I'm using Hadoop 2.4.1 Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Re: ulimit for Hive
Hi Zhijie, ulimit covers both hard and soft limits. The hard limit can only be set by a sys admin; it exists to guard against a fork-bomb DoS attack. The sys admin can set the hard ulimit per user, i.e. hadoop_user. A user can add a line to their .profile file setting a soft ulimit up to the hard limit; you can Google how to do that. You can check the ulimits like so: ulimit -H -a // hard limits ulimit -S -a // soft limits The max value for the hard limit is unlimited. I currently have mine set to this as I was running out of processes (nproc). I don't know about restarting; I think so. I don't know about Hive. Warm regards. Chris telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://twitter.com/#!/MacKenzieStudio http://www.linkedin.com/in/chrismackenziephotography/ From: Zhijie Shen zs...@hortonworks.com Reply-To: user@hadoop.apache.org Date: Tuesday, 12 August 2014 18:33 To: user@hadoop.apache.org, u...@hive.apache.org Subject: Re: ulimit for Hive + Hive user mailing list. It should be a better place for your questions. On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan ana.gil...@gmail.com wrote: Hi, I've been reading a lot of posts about needing to set a high ulimit for file descriptors in Hadoop and I think it's probably the cause of a lot of the errors I've been having when trying to run queries on larger data sets in Hive. However, I'm really confused about how and where to set the limit, so I have a number of questions: 1. How high is it recommended to set the ulimit? 2. What is the difference between soft and hard limits? Which one needs to be set to the value from question 1? 3. For which user(s) do I set the ulimit? If I am running the Hive query with my login, do I set my own ulimit to the high value? 4. Do I need to set this limit for these users on all the machines in the cluster? (we have one master node and 6 slave nodes) 5. Do I need to restart anything after configuring the ulimit? Thanks in advance, Ana -- Zhijie Shen, Hortonworks Inc. http://hortonworks.com/
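A hedged sketch of the per-user approach described above; the user name and values are illustrative only and need a sys admin to apply, plus a fresh login to take effect on all cluster nodes.

# /etc/security/limits.conf (hypothetical entries for a dedicated hadoop user)
hadoop_user  soft  nofile  32768
hadoop_user  hard  nofile  65536
hadoop_user  soft  nproc   16384
hadoop_user  hard  nproc   32768

# verify after logging back in:
ulimit -Sn; ulimit -Hn   # open files, soft and hard
ulimit -Su; ulimit -Hu   # processes (nproc), soft and hard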
Can anyone help me resolve this Error: unable to create new native thread
Hi, I've scrabbled around looking for a fix for a while and have set the soft ulimit size to 13172. I'm using Hadoop 2.4.1 Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Re: Can anyone tell me the current typical memory specification, switch size and disk space
Thanks Adaryl, I’m currently looking at Tom White p298, published May 2012, which references a 2010 spec. Both Tom's and Eric's books were published in 2012, so the information in both will be a tad dated no doubt. What I need to know is the current: average processor spec, memory spec, disk storage spec, network speed. Can you help me out with that? Thanks in advance, Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ On 01/08/2014 17:28, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote: The book Hadoop Operations by Eric Sammer helped answer a lot of these questions for me. Adaryl Bob Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba -Original Message- From: Chris MacKenzie Sent: Friday, August 01, 2014 4:35 AM To: user@hadoop.apache.org Subject: Can anyone tell me the current typical memory specification, switch size and disk space Hi, I'd really appreciate it if someone could let me know the current preferred specification for a cluster setup: on average how many nodes, disk space, memory, switch size. A link to a paper or discussion would be much appreciated. Thanks in advance Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Can anyone tell me the current typical memory specification, switch size and disk space
Hi, I'd really appreciate it if someone could let me know the current preferred specification for a cluster set up. On average how many nodes Disk space Memory Switch size A link to a paper or discussion would be much appreciated. Thanks in advance Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Re: Cannot compile a basic PutMerge.java program
Hi, I can probably help you out with that. I don't want to sound patronising though. What is your IDE and have you included the Hadoop libraries in your jar? Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: R J rj201...@yahoo.com Reply-To: user@hadoop.apache.org Date: Monday, 28 July 2014 01:46 To: user@hadoop.apache.org user@hadoop.apache.org Subject: Cannot compile a basic PutMerge.java program Hi All, I am new to programming on Hadoop. I tried to compile the following program (an example program from a Hadoop book) on my Linux server where I have Hadoop installed. I get the errors:
$javac PutMerge.java
PutMerge.java:2: package org.apache.hadoop.conf does not exist import org.apache.hadoop.conf.Configuration; ^
PutMerge.java:3: package org.apache.hadoop.fs does not exist import org.apache.hadoop.fs.FSDataInputStream; ^
PutMerge.java:4: package org.apache.hadoop.fs does not exist import org.apache.hadoop.fs.FSDataOutputStream; ^
PutMerge.java:5: package org.apache.hadoop.fs does not exist import org.apache.hadoop.fs.FileStatus; ^
PutMerge.java:6: package org.apache.hadoop.fs does not exist import org.apache.hadoop.fs.FileSystem; ^
PutMerge.java:7: package org.apache.hadoop.fs does not exist import org.apache.hadoop.fs.Path; ^
I have $HADOOP_HOME set up: $echo $HADOOP_HOME /usr/lib/hadoop Could you please suggest how to compile this program? Thanks a lot. Shu
PutMerge.java=
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutMerge {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);
    FileSystem local = FileSystem.getLocal(conf);
    Path inputDir = new Path(args[0]);
    Path hdfsFile = new Path(args[1]);
    try {
      FileStatus[] inputFiles = local.listStatus(inputDir);
      FSDataOutputStream out = hdfs.create(hdfsFile);
      for (int i = 0; i < inputFiles.length; i++) {
        System.out.println(inputFiles[i].getPath().getName());
        FSDataInputStream in = local.open(inputFiles[i].getPath());
        byte buffer[] = new byte[256];
        int bytesRead = 0;
        while ((bytesRead = in.read(buffer)) > 0) {
          out.write(buffer, 0, bytesRead);
        }
        in.close();
      }
      out.close();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
=
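The compile errors above simply mean the Hadoop jars are not on the javac classpath. A hedged sketch of compiling and running from the command line, assuming the standard hadoop launcher script is on PATH (it lives under $HADOOP_HOME/bin); the local and HDFS paths are placeholders.

javac -cp "$(hadoop classpath)" PutMerge.java
jar cf putmerge.jar PutMerge*.class
hadoop jar putmerge.jar PutMerge /some/local/input/dir /user/hadoop/merged.txt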
Re: How to set up the conf folder
Hi Ravindra, Thanks for replying, it’s much appreciated. That’s always been the case with my setup: export HADOOP_PREFIX=/scratch/extra/cm469/hadoop-2.4.1 export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop I think my issue is that I have not set yarn-env.sh up correctly. TBH I didn’t know it existed. I plan to get round to trying that at some point in the near future. Many thanks, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Ravindra ravin.i...@gmail.com Reply-To: user@hadoop.apache.org Date: Monday, 28 July 2014 13:09 To: user@hadoop.apache.org Subject: Re: How to set up the conf folder Hi, Could you try putting this in .bash_profile export HADOOP_CONF_DIR=/scratch/extra/cm469/hadoop-2.4.1/etc/hadoop/ Regards, Ravindra On Wed, Jul 23, 2014 at 3:17 PM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi, Can anyone shed some light on this for me? Every time I attempt to set up the conf directory, I run into a whole load of errors and ssh issues which I don't see when my config files are in etc/hadoop. I want to understand how to use the conf directory. My ultimate goal is to use symbolic links. I have a running cluster-based version of hadoop-2.4.1. I start and stop the cluster from the RM with: ./hadoop-2.4.1/sbin/start-dfs.sh ./hadoop-2.4.1/sbin/start-yarn.sh My understanding is that to use the conf directory my settings should be as follows: Settings: bash-profile has $HADOOP_CONF_DIR = /scratch/extra/cm469/hadoop-2.4.1/etc/hadoop Hadoop-env.sh has export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"} conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml conf/yarn-site.xml conf/capacity-scheduler.xml etc/hadoop/hadoop-env.sh etc/hadoop/slaves etc/hadoop/masters Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Re: Is it a good idea to delete / move the default configuration xml file ?
Hi, thanks for that, much appreciated. I guess they are in the jar files then ;O) I was really surprised to see the default configs pulled in, especially considering I thought I was in full control. I did a file search on an installation, saw the files and jumped to the wrong conclusion. I feel like a real idiot sometimes, but there is so much conflicting information out there that later you realise the questions you asked seem nonsensical, although at the time they felt valid ;O) Thanks for your tolerance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ On 21/07/2014 09:46, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi All, I have just realised that my implementation of hadoop-2.4.1 is pulling in all the default.xml files. I have three copies of each in different directories, obviously at least one of those is on the class path. Anyway, with all the effort to set up a site, it seems strange to me that I would use settings I had no idea existed and that may not be how I would choose to set them up. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
How to set up the conf folder
Hi, Can anyone shed some light on this for me? Every time I attempt to set up the conf directory, I run into a whole load of errors and ssh issues which I don't see when my config files are in etc/hadoop. I want to understand how to use the conf directory. My ultimate goal is to use symbolic links. I have a running cluster-based version of hadoop-2.4.1. I start and stop the cluster from the RM with: ./hadoop-2.4.1/sbin/start-dfs.sh ./hadoop-2.4.1/sbin/start-yarn.sh My understanding is that to use the conf directory my settings should be as follows: Settings: bash-profile has $HADOOP_CONF_DIR = /scratch/extra/cm469/hadoop-2.4.1/etc/hadoop Hadoop-env.sh has export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"} conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml conf/yarn-site.xml conf/capacity-scheduler.xml etc/hadoop/hadoop-env.sh etc/hadoop/slaves etc/hadoop/masters Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Amended question - How to set up the conf folder
My $HADOOP_CONF_DIR = /scratch/extra/cm469/hadoop-2.4.1/etc/hadoop/conf Hi, Can anyone shed some light on this for me? Every time I attempt to set up the conf directory, I run into a whole load of errors and ssh issues which I don't see when my config files are in etc/hadoop. I want to understand how to use the conf directory. My ultimate goal is to use symbolic links. I have a running cluster-based version of hadoop-2.4.1. I start and stop the cluster from the RM with: ./hadoop-2.4.1/sbin/start-dfs.sh ./hadoop-2.4.1/sbin/start-yarn.sh My understanding is that to use the conf directory my settings should be as follows: Settings: bash-profile has $HADOOP_CONF_DIR = /scratch/extra/cm469/hadoop-2.4.1/etc/hadoop/conf Hadoop-env.sh has export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/conf"} conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml conf/yarn-site.xml conf/capacity-scheduler.xml etc/hadoop/hadoop-env.sh etc/hadoop/slaves etc/hadoop/masters Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
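One hedged way to achieve the symbolic-link goal described above. The paths reuse the poster's prefix and are otherwise illustrative; the key point is that HADOOP_CONF_DIR must be exported on every node (the start scripts ssh out to the workers), and the directory must contain a complete set of config files, not just the *-site.xml ones.

export HADOOP_PREFIX=/scratch/extra/cm469/hadoop-2.4.1
mkdir -p /scratch/extra/cm469/conf
cp "$HADOOP_PREFIX"/etc/hadoop/* /scratch/extra/cm469/conf/
mv "$HADOOP_PREFIX"/etc/hadoop "$HADOOP_PREFIX"/etc/hadoop.orig
ln -s /scratch/extra/cm469/conf "$HADOOP_PREFIX"/etc/hadoop
# in .bash_profile on every node:
export HADOOP_CONF_DIR=/scratch/extra/cm469/conf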
Re: Configuration set up questions - Container killed on request. Exit code is 143
Thanks Ozawa Regards, Chris MacKenzie http://www.chrismackenziephotography.co.uk/ Expert in all aspects of photography telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ weddings: www.wedding.chrismackenziephotography.co.uk http://www.wedding.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://twitter.com/#!/MacKenzieStudio http://www.facebook.com/pages/Chris-MacKenzie-Photography/145946284250 http://www.linkedin.com/in/chrismackenziephotography/ http://pinterest.com/ChrisMacKenzieP/ On 18/07/2014 18:07, Tsuyoshi OZAWA wrote: Hi Chris MacKenzie, How about trying as follows to identify the reason of your problem? 1. Making both yarn.nodemanager.pmem-check-enabled and yarn.nodemanager.vmem-check-enabled false 2. Making yarn.nodemanager.pmem-check-enabled true 3. Making yarn.nodemanager.pmem-check-enabled true and yarn.nodemanager.vmem-pmem-ratio a large value (e.g. 100) 4. Making yarn.nodemanager.pmem-check-enabled true and yarn.nodemanager.vmem-pmem-ratio the expected value (e.g. 2.1 or something) If there is a problem on 1, the reason may be a JVM configuration problem or another issue. If there is a problem on 2, the reason is a shortage of physical memory. Thanks, - Tsuyoshi On Fri, Jul 18, 2014 at 6:52 PM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi Guys, Thanks very much for getting back to me. Thanks Chris - the idea of splitting the data is a great suggestion. Yes Wangda, I was restarting after changing the configs. I’ve been checking the relationship between what I thought was in my config files and what Hadoop thought was in them. With: // Print out Config file settings for testing. for (Entry<String, String> entry : conf) { System.out.printf("%s=%s\n", entry.getKey(), entry.getValue()); } There were anomalies ;0( Now that my Hadoop reflects the values that are in my config files - I just get the message “Killed” without any explanation. Unfortunately, where I was applying changes incrementally and testing, I’ve applied all the changes at once. I’m now backing out the changes I made slowly to see where it starts to reflect what I expect. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Chris Mawata chris.maw...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 17 July 2014 16:15 To: user@hadoop.apache.org Subject: Re: Configuration set up questions - Container killed on request. Exit code is 143 Another thing to try is smaller input splits if your data can be broken up into smaller files that can be independently processed. That way you get more but smaller map tasks. You could also use more but smaller reducers. The many files will tax your NameNode more but you might get to use all your cores. On Jul 17, 2014 9:07 AM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi Chris, Thanks for getting back to me. I will set that value to 10 I have just tried this. https://support.gopivotal.com/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters Setting both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. Though after setting it I didn’t get the expected change.
As the output was still "2.1 GB of 2.1 GB virtual memory used. Killing container" Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Chris Mawata chris.maw...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 17 July 2014 13:36 To: Chris MacKenzie stu...@chrismackenziephotography.co.uk Cc: user@hadoop.apache.org Subject: Re: Configuration set up questions - Container killed on request. Exit code is 143 Hi Chris MacKenzie, I have a feeling (I am not familiar with the kind of work you are doing) that your application is memory intensive. 8 cores per node and only 12GB is tight. Try bumping up the yarn.nodemanager.vmem-pmem-ratio Chris Mawata On Wed, Jul 16, 2014 at 11:37 PM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi, Thanks Chris Mawata, I'm working through this myself, but wondered if anyone could point me in the right direction. I have attached my configs. I'm using Hadoop 2.4.1. My system is: a 32-node cluster, 8 processors per machine, 12 GB RAM, available disk
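Pulling the advice in this thread together, a hedged configuration sketch with illustrative values only: the container sizes go in mapred-site.xml with the heap (-Xmx) sitting comfortably below the container size, and the virtual-memory ratio (or check) is adjusted in yarn-site.xml, as Tsuyoshi's steps above suggest.

mapred-site.xml:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1152m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1536m</value>
</property>

yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
<!-- or disable the virtual memory check entirely: -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>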
Is it a good idea to delete / move the default configuration xml file ?
Hi All, I have just realised that my implementation of hadoop-2.4.1 is pulling in all the default.xml files. I have three copies of each in different directories, obviously at least one of those is on the class path. Anyway with all the effort to set up a site, it seems strange to me that I would use settings I had no idea existed and that may not be how I would choose to set them up. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
Re: Configuration set up questions - Container killed on request. Exit code is 143
Hi Guys, Thanks very much for getting back to me. Thanks Chris - the idea of splitting the data is a great suggestion. Yes Wangda, I was restarting after changing the configs. I’ve been checking the relationship between what I thought was in my config files and what Hadoop thought was in them. With: // Print out Config file settings for testing. for (Entry<String, String> entry : conf) { System.out.printf("%s=%s\n", entry.getKey(), entry.getValue()); } There were anomalies ;0( Now that my Hadoop reflects the values that are in my config files - I just get the message “Killed” without any explanation. Unfortunately, where I was applying changes incrementally and testing, I’ve applied all the changes at once. I’m now backing out the changes I made slowly to see where it starts to reflect what I expect. Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Chris Mawata chris.maw...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 17 July 2014 16:15 To: user@hadoop.apache.org Subject: Re: Configuration set up questions - Container killed on request. Exit code is 143 Another thing to try is smaller input splits if your data can be broken up into smaller files that can be independently processed. That way you get more but smaller map tasks. You could also use more but smaller reducers. The many files will tax your NameNode more but you might get to use all your cores. On Jul 17, 2014 9:07 AM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi Chris, Thanks for getting back to me. I will set that value to 10 I have just tried this. https://support.gopivotal.com/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters Setting both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. Though after setting it I didn’t get the expected change. As the output was still "2.1 GB of 2.1 GB virtual memory used. Killing container" Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Chris Mawata chris.maw...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 17 July 2014 13:36 To: Chris MacKenzie stu...@chrismackenziephotography.co.uk Cc: user@hadoop.apache.org Subject: Re: Configuration set up questions - Container killed on request. Exit code is 143 Hi Chris MacKenzie, I have a feeling (I am not familiar with the kind of work you are doing) that your application is memory intensive. 8 cores per node and only 12GB is tight. Try bumping up the yarn.nodemanager.vmem-pmem-ratio Chris Mawata On Wed, Jul 16, 2014 at 11:37 PM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi, Thanks Chris Mawata, I'm working through this myself, but wondered if anyone could point me in the right direction. I have attached my configs.
I'm using Hadoop 2.4.1. My system is: a 32-node cluster, 8 processors per machine, 12 GB RAM, available disk space per node 890 GB. This is my current error: mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id : attempt_1405538067846_0006_r_00_1, Status : FAILED Container [pid=25848,containerID=container_1405538067846_0006_01_04] is running beyond virtual memory limits. Current usage: 439.0 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1405538067846_0006_01_04 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 25853 25848 25848 25848 (java) 2262 193 2268090368 112050 /usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m -Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/application_1405538067846_0006/container_1405538067846_0006_01_04/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846_0006/container_1405538067846_0006_01_04 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056 attempt_1405538067846_0006_r_00_1 4 |- 25848 25423 25848 25848 (bash) 0 0 108613632 333 /bin/bash -c /usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m -Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/application_1405538067846_0006
Configuration set up questions - Container killed on request. Exit code is 143
Hi, Thanks Chris Mawata, I'm working through this myself, but wondered if anyone could point me in the right direction. I have attached my configs. I'm using Hadoop 2.4.1. My system is: a 32-node cluster, 8 processors per machine, 12 GB RAM, available disk space per node 890 GB. This is my current error: mapreduce.Job (Job.java:printTaskEvents(1441)) - Task Id : attempt_1405538067846_0006_r_00_1, Status : FAILED Container [pid=25848,containerID=container_1405538067846_0006_01_04] is running beyond virtual memory limits. Current usage: 439.0 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1405538067846_0006_01_04 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 25853 25848 25848 25848 (java) 2262 193 2268090368 112050 /usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m -Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/application_1405538067846_0006/container_1405538067846_0006_01_04/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846_0006/container_1405538067846_0006_01_04 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056 attempt_1405538067846_0006_r_00_1 4 |- 25848 25423 25848 25848 (bash) 0 0 108613632 333 /bin/bash -c /usr/java/latest//bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m -Djava.io.tmpdir=/tmp/hadoop-cm469/nm-local-dir/usercache/cm469/appcache/application_1405538067846_0006/container_1405538067846_0006_01_04/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846_0006/container_1405538067846_0006_01_04 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 137.195.143.103 59056 attempt_1405538067846_0006_r_00_1 4 1>/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846_0006/container_1405538067846_0006_01_04/stdout 2>/scratch/extra/cm469/hadoop-2.4.1/logs/userlogs/application_1405538067846_0006/container_1405538067846_0006_01_04/stderr Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 Regards, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/ From: Chris Mawata chris.maw...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, 17 July 2014 02:10 To: user@hadoop.apache.org Subject: Re: Can someone shed some light on this ? - java.io.IOException: Spill failed I would post the configuration files -- easier for someone to spot something wrong than to imagine what configuration would get you to that stacktrace. The part Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1405523201400_0006_m_00_0_spill_8.out would suggest you might not have hadoop.tmp.dir set (?) On Wed, Jul 16, 2014 at 1:02 PM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi, Is this a coding or a setup issue ?
I'm using Hadoop 2.4.1. My program is doing a concordance on 500,000 sequences of 400 chars. My cluster consists of 32 data nodes and two masters. The exact error is: Error: java.io.IOException: Spill failed at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1535) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1062) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) at par.gene.align.v3.concordance.ConcordanceMapper.map(ConcordanceMapper.java:96) at par.gene.align.v3.concordance.ConcordanceMapper.map(ConcordanceMapper.java:1) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556
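Following on from Chris Mawata's hint above about the "Could not find any valid local directory" cause, a hedged sketch of the relevant settings; the path is illustrative and must exist, be writable and have free space on every node. Increasing the sort buffer can also reduce the number of spills, though whether it helps here is an assumption.

core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/extra/cm469/hadoop-tmp</value>
</property>

mapred-site.xml (optional; the MapReduce local dir defaults under hadoop.tmp.dir):
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/scratch/extra/cm469/mapred-local</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value>
</property>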
ControlledJob.java:submit Job in state RUNNING instead of DEFINE - can someone shed some light on this error for me ;O)
Hi, I'm using ControlledJob and my code is: ControlledJob doConcordance = new ControlledJob( this.doParallelConcordance(), null); .... control.addJob(doConcordance); control.addJob(viableSubequenceMaxLength); control.addJob(viableSubSequences); control.addJob(actualCriticalSubsequence); control.addJob(generatePins); control.addJob(deriveLeftPinMaxLength); control.addJob(doAlignment); control.addJob(doConcatenation); When it comes to an end I have: jobcontrol.ControlledJob (ControlledJob.java:submit(338)) - Concordance Phase got an error while submitting java.lang.IllegalStateException: Job in state RUNNING instead of DEFINE Thanks in advance, Chris MacKenzie telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
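A hedged sketch of the pattern ControlledJob expects: it must wrap a Job that has not yet been submitted. If doParallelConcordance() already calls submit() or waitForCompletion(), JobControl tries to submit it a second time and raises exactly "Job in state RUNNING instead of DEFINE". Names below are placeholders, not the poster's actual code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class PipelineSketch {
  public static void main(String[] args) throws Exception {
    // Build the Job but do NOT submit it; JobControl does the submitting.
    Job concordanceJob = Job.getInstance(new Configuration(), "Concordance Phase");
    // ... set jar, mapper, reducer, input/output paths on concordanceJob here ...
    ControlledJob doConcordance = new ControlledJob(concordanceJob, null);

    JobControl control = new JobControl("alignment-pipeline");
    control.addJob(doConcordance);
    // add the dependent ControlledJobs here, each wrapping its own fresh Job

    new Thread(control).start();      // JobControl is a Runnable
    while (!control.allFinished()) {
      Thread.sleep(500);
    }
    control.stop();
  }
}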
Thank you And What advice would you give me on running my first Hadoop cluster based Job
Hi, Over the past two weeks, from a standing start, I've worked on a Hadoop-based parallel genetic sequence alignment algorithm as part of my university masters project. Thankfully that's now up and running; along the way I got some great help from members of this group and I deeply appreciate that strangers would take time out of their busy lives to shed a bit of light on what seemed at times an insurmountable task. On Monday I get to play with a 32-node system and the only advice I have so far is to benchmark my algorithm with 5 GB per node. I wonder: if you were starting out again on your first big Hadoop MapReduce job, what would you do differently? What advice would you give me starting out? Thanks again, I really appreciate your support. Best Chris Regards, Chris MacKenzie http://www.chrismackenziephotography.co.uk/ http://plus.google.com/+ChrismackenziephotographyCoUk/posts http://www.linkedin.com/in/chrismackenziephotography/
What is the correct way to get a string back from a mapper or reducer
Hi, I have the following code and am using Hadoop 2.4: In my driver: Configuration conf = new Configuration(); conf.set("sub", "help"); .. String s = conf.get("sub"); In my reducer: Configuration conf = context.getConfiguration(); conf.set("sub", "Test"); When I test the value in the driver, it isn't updated following the reduce. Best, Chris MacKenzie http://www.chrismackenziephotography.co.uk/
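For reference, a hedged sketch of why this happens and one common workaround: each task gets its own copy of the Configuration, so a conf.set() inside the reducer never flows back to the driver. Counters do flow back (numeric values only); for an actual string the usual pattern is to write a small side file to HDFS. The enum name below is a placeholder.

// Somewhere visible to both driver and reducer:
public enum SubStatus { FOUND }

// In the reducer:
context.getCounter(SubStatus.FOUND).increment(1);

// In the driver, after job.waitForCompletion(true):
long found = job.getCounters().findCounter(SubStatus.FOUND).getValue();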
job.setOutputFormatClass(NullOutputFormat.class);
Hi, What is the anticipated usage of the above with the new API? Is there another way to remove the empty part-r files? When using it with MultipleOutputs to remove empty part-r files, I have no output ;O) Regards, Chris MacKenzie http://www.chrismackenziephotography.co.uk/
Re: job.setOutputFormatClass(NullOutputFormat.class);
Hi Markus and Shahab, Thanks for getting back to me, I really appreciate it. LazyOutputFormat did the trick. I tried NullOutputFormat (job.setOutputFormatClass(NullOutputFormat.class);) before writing to the group but was getting an empty folder. I looked at LazyOutputFormat; in fact, my mos is written from: http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html Just couldn't see the wood for the trees ;O) Best, Chris
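For reference, a hedged sketch of the combination that worked here: LazyOutputFormat suppresses the empty default part-r-* files while MultipleOutputs writes the named outputs. The named output, key and value types are placeholders.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// In the driver:
// LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
// MultipleOutputs.addNamedOutput(job, "seq", TextOutputFormat.class, Text.class, IntWritable.class);
// In the reducer: create MultipleOutputs in setup(), call mos.write("seq", key, value) in reduce(),
// and mos.close() in cleanup().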
Re: Partitioning and setup errors
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class par.gene.align.concordance.ConcordanceReducer not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895) at org.apache.hadoop.mapreduce.task.JobContextImpl.getReducerClass(JobContextImpl.java:210) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:611) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.ClassNotFoundException: Class par.gene.align.concordance.ConcordanceReducer not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893) ... 8 more 14/06/29 13:17:37 INFO mapreduce.Job: map 0% reduce 100% 14/06/29 13:17:37 INFO mapreduce.Job: Job job_1403953980618_0009 failed with state FAILED due to: Task failed task_1403953980618_0009_r_00 Job failed as tasks failed. failedMaps:0 failedReduces:1 14/06/29 13:18:04 INFO mapreduce.Job: Counters: 7 Job Counters Failed reduce tasks=4 Launched reduce tasks=4 Total time spent by all maps in occupied slots (ms)=0 Total time spent by all reduces in occupied slots (ms)=7786 Total time spent by all reduce tasks (ms)=7786 Total vcore-seconds taken by all reduce tasks=7786 Total megabyte-seconds taken by all reduce tasks=7972864 14/06/29 13:18:05 INFO ipc.Client: Retrying connect to server: admins-MacBook-Pro.local/192.168.0.5:53193. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 14/06/29 13:18:06 INFO ipc.Client: Retrying connect to server: admins-MacBook-Pro.local/192.168.0.5:53193. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 14/06/29 13:18:07 INFO ipc.Client: Retrying connect to server: admins-MacBook-Pro.local/192.168.0.5:53193. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 14/06/29 13:18:07 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server End time = 1404044288031 Elapsed time = 161318 ms Finished /usr/local/hadoop-2.4.0/jar_files $ Warm regards. Chris From: Vinod Kumar Vavilapalli vino...@hortonworks.com Reply-To: user@hadoop.apache.org Date: Sunday, 29 June 2014 00:20 To: user@hadoop.apache.org Subject: Re: Partitioning and setup errors What is happening is the client is not able to pick up the right jar to push to the cluster. It looks in the class-path for the jar that contains the class ParallelGeneticAlignment. How are you packaging your code? How are you running your job - paste the command line? +Vinod On Jun 27, 2014, at 5:15 AM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote: Hi, I realise my previous question may have been a bit naïve and I also realise I am asking an awful lot here, any advice would be greatly appreciated. * I have been using Hadoop 2.4 in local mode and am sticking to the mapreduce.* side of the track.
* I am using a custom line reader to read each sequence into a Map.
* I have a partitioner class which is testing the key from the map class.
* I've tried debugging in Eclipse with a breakpoint in the partitioner class, but getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks) is not being called. Could there be any reason for that?

Because my map and reduce code works in local mode within Eclipse, I wondered if I might get the partitioner to work by changing to pseudo-distributed mode, exporting a runnable jar from Eclipse (Kepler). I have several faults, both on my own computer's pseudo-distributed setup and on the university cluster's pseudo-distributed setup, which I configured. I've googled and read extensively but am not seeing a solution to any of these issues.

I have this line:
14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).

My driver code is:

private void doParallelConcordance() throws Exception {
    Path inDir = new Path("input_sequences/10_sequences.txt");
    Path outDir = new Path("demo_output");
    Job job = Job.getInstance(new Configuration());
    job.setJarByClass(ParallelGeneticAlignment.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setMapperClass(ConcordanceMapper.class);
    job.setPartitionerClass(ConcordanceSequencePartitioner.class);
    job.setReducerClass(ConcordanceReducer.class);
    FileInputFormat.addInputPath(job, inDir);
    FileOutputFormat.setOutputPath(job, outDir);
    job.waitForCompletion(true);
}
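On Vinod's point about the jar: the "No job jar file set" warning means the submitted job carries no user jar, so the ClassNotFoundException on the cluster follows directly. A minimal sketch of the two usual remedies is below; the jar path and file name are hypothetical, not taken from Chris's setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JarSetupSketch {

    static Job newJob(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "concordance");
        // Option 1: setJarByClass works when this class is loaded from the
        // packaged jar, i.e. the job is launched with
        // "hadoop jar par-gene-align.jar <MainClass> ..." rather than from
        // Eclipse's unpacked classes directory.
        job.setJarByClass(JarSetupSketch.class);
        // Option 2: name the jar explicitly, which avoids the
        // "No job jar file set" warning regardless of how the JVM started.
        job.setJar("/home/chris/par-gene-align.jar"); // hypothetical path
        return job;
    }
}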
Re: Partitioning and setup errors
Hi Chris, I'm away from my books for the weekend. Is that call (extends Configurable implements Tool) a Hadoop 2 call? Would that mean I am better off sticking with Hadoop 1.x?

Warm regards, Chris

http://www.chrismackenziephotography.co.uk/
Expert in all aspects of photography
telephone: 0131 332 6967
email: stu...@chrismackenziephotography.co.uk
corporate: www.chrismackenziephotography.co.uk
http://plus.google.com/+ChrismackenziephotographyCoUk/posts
http://twitter.com/#!/MacKenzieStudio
http://www.facebook.com/pages/Chris-MacKenzie-Photography/145946284250
http://www.linkedin.com/in/chrismackenziephotography/
http://pinterest.com/ChrisMacKenzieP/

From: Chris Mawata chris.maw...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday, 27 June 2014 23:46
To: user@hadoop.apache.org
Subject: Re: Partitioning and setup errors

Probably my fault. I was looking for the "extends Configurable implements Tool" part. I will double check when I get home rather than send you on a wild goose chase. Cheers, Chris

On Jun 27, 2014 8:16 AM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote:

Hi, I realise my previous question may have been a bit naïve and I also realise I am asking an awful lot here; any advice would be greatly appreciated.
* I have been using Hadoop 2.4 in local mode and am sticking to the mapreduce.* side of the track.
* I am using a custom line reader to read each sequence into a Map.
* I have a partitioner class which is testing the key from the map class.
* I've tried debugging in Eclipse with a breakpoint in the partitioner class, but getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks) is not being called. Could there be any reason for that?

Because my map and reduce code works in local mode within Eclipse, I wondered if I might get the partitioner to work by changing to pseudo-distributed mode, exporting a runnable jar from Eclipse (Kepler). I have several faults, both on my own computer's pseudo-distributed setup and on the university cluster's pseudo-distributed setup, which I configured. I've googled and read extensively but am not seeing a solution to any of these issues.

I have this line:
14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
My driver code is:

private void doParallelConcordance() throws Exception {
    Path inDir = new Path("input_sequences/10_sequences.txt");
    Path outDir = new Path("demo_output");
    Job job = Job.getInstance(new Configuration());
    job.setJarByClass(ParallelGeneticAlignment.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setMapperClass(ConcordanceMapper.class);
    job.setPartitionerClass(ConcordanceSequencePartitioner.class);
    job.setReducerClass(ConcordanceReducer.class);
    FileInputFormat.addInputPath(job, inDir);
    FileOutputFormat.setOutputPath(job, outDir);
    job.waitForCompletion(true);
}

On the university server I am getting this error:
14/06/27 11:45:40 INFO mapreduce.Job: Task Id : attempt_1403860966764_0003_m_00_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class par.gene.align.concordance.ConcordanceMapper not found

On my machine the error is:
14/06/27 12:58:03 INFO mapreduce.Job: Task Id : attempt_1403864060032_0004_r_00_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class par.gene.align.concordance.ConcordanceReducer not found

On the university server I get total paths to process:
14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process : 1
14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1

On my machine I get total paths to process:
14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process : 0
14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0

Being new to this community, I thought it polite to introduce myself. I'm planning to return to software development via an MSc at Heriot-Watt University in Edinburgh. My MSc project is based on Foster's genetic sequence alignment. I have written a sequential version; my goal now is to port it to Hadoop.

Thanks in advance, Regards, Chris MacKenzie
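On the Tool question above: the pattern is normally written "extends Configured implements Tool" (Configured is the convenience base class for Configurable), and it exists in both Hadoop 1.x and 2.x, so it is not a reason to stay on 1.x. A minimal sketch of that driver shape, under the assumption of Text/IntWritable outputs and paths passed as arguments; class names other than the Hadoop ones are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConcordanceDriverSketch extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the configuration ToolRunner built from the
        // cluster's *-site.xml files and any -D options on the command line,
        // instead of a bare new Configuration().
        Job job = Job.getInstance(getConf(), "parallel concordance");
        job.setJarByClass(ConcordanceDriverSketch.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ConcordanceDriverSketch(), args));
    }
}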
Re: Partitioning and setup errors
Hi Chris, Thanks for your response. I deeply appreciate it. I don't know what you mean by that question. I use configuration:
* In the driver: Job job = Job.getInstance(new Configuration());
* In the CustomLineRecordReader: Configuration job = context.getConfiguration();

One of the biggest issues I have had is staying true to the mapreduce.* format.

Best wishes, Chris MacKenzie

From: Chris Mawata chris.maw...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday, 27 June 2014 14:11
To: user@hadoop.apache.org
Subject: Re: Partitioning and setup errors

The new Configuration() is suspicious. Are you setting configuration information manually? Chris

On Jun 27, 2014 5:16 AM, Chris MacKenzie stu...@chrismackenziephotography.co.uk wrote:

Hi, I realise my previous question may have been a bit naïve and I also realise I am asking an awful lot here; any advice would be greatly appreciated.
* I have been using Hadoop 2.4 in local mode and am sticking to the mapreduce.* side of the track.
* I am using a custom line reader to read each sequence into a Map.
* I have a partitioner class which is testing the key from the map class.
* I've tried debugging in Eclipse with a breakpoint in the partitioner class, but getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks) is not being called. Could there be any reason for that?

Because my map and reduce code works in local mode within Eclipse, I wondered if I might get the partitioner to work by changing to pseudo-distributed mode, exporting a runnable jar from Eclipse (Kepler). I have several faults, both on my own computer's pseudo-distributed setup and on the university cluster's pseudo-distributed setup, which I configured. I've googled and read extensively but am not seeing a solution to any of these issues.

I have this line:
14/06/27 11:45:27 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
My driver code is:

private void doParallelConcordance() throws Exception {
    Path inDir = new Path("input_sequences/10_sequences.txt");
    Path outDir = new Path("demo_output");
    Job job = Job.getInstance(new Configuration());
    job.setJarByClass(ParallelGeneticAlignment.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(CustomFileInputFormat.class);
    job.setMapperClass(ConcordanceMapper.class);
    job.setPartitionerClass(ConcordanceSequencePartitioner.class);
    job.setReducerClass(ConcordanceReducer.class);
    FileInputFormat.addInputPath(job, inDir);
    FileOutputFormat.setOutputPath(job, outDir);
    job.waitForCompletion(true);
}

On the university server I am getting this error:
14/06/27 11:45:40 INFO mapreduce.Job: Task Id : attempt_1403860966764_0003_m_00_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class par.gene.align.concordance.ConcordanceMapper not found

On my machine the error is:
14/06/27 12:58:03 INFO mapreduce.Job: Task Id : attempt_1403864060032_0004_r_00_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class par.gene.align.concordance.ConcordanceReducer not found

On the university server I get total paths to process:
14/06/27 11:45:27 INFO input.FileInputFormat: Total input paths to process : 1
14/06/27 11:45:28 INFO mapreduce.JobSubmitter: number of splits:1

On my machine I get total paths to process:
14/06/27 12:57:09 INFO input.FileInputFormat: Total input paths to process : 0
14/06/27 12:57:36 INFO mapreduce.JobSubmitter: number of splits:0

Being new to this community, I thought it polite to introduce myself. I'm planning to return to software development via an MSc at Heriot-Watt University in Edinburgh. My MSc project is based on Foster's genetic sequence alignment. I have written a sequential version; my goal now is to port it to Hadoop.

Thanks in advance, Regards, Chris MacKenzie
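One likely reason the breakpoint in getPartition is never hit: with the default single reduce task, MapReduce sends every record to partition 0 and does not consult a custom partitioner at all, and local mode defaults to one reducer. A sketch of a partitioner wired up with more than one reduce task follows; the keying scheme is invented for illustration and is not Chris's ConcordanceSequencePartitioner.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

public class SequencePartitionerSketch extends Partitioner<LongWritable, Text> {

    @Override
    public int getPartition(LongWritable mapKey, Text sequenceString, int numReduceTasks) {
        // Spread sequences across reducers by key; only meaningful when
        // numReduceTasks > 1.
        return (int) (mapKey.get() % numReduceTasks);
    }

    static void wireUp(Job job) {
        job.setPartitionerClass(SequencePartitionerSketch.class);
        job.setNumReduceTasks(4); // > 1, so getPartition is actually consulted
    }
}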
Splitting map and reduce
Hi, This is my first mail to this user group. I hope the email is well formed and helps me learn a great deal about Hadoop. I have to carry out sequence alignment using Hadoop with the aid of a critical subsequence. A potential critical subsequence is derived from the longest unique subsequence in a sequence; a valid critical subsequence must exist in one or more sequences. I have run a concordance with a line splitter and have been able to get "Map input records=10", which leads me to believe I can assess each sequence independently. I was hoping to get 10 reduce outputs but I only got one. Is there a way to split the reduce output in the same way that I can split the map input? Thanks in advance, Regards, Chris MacKenzie
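One way to get ten reduce output files instead of one is to run ten reduce tasks, since each reduce task writes its own part-r-000NN file; a partitioner then decides which sequence lands in which file. The sketch below assumes, purely for illustration, that the map output key carries a numeric sequence id prefix such as "7:ACGT...". Another route is MultipleOutputs, as discussed elsewhere in this thread.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

public class TenWayReduceSketch {

    public static class SequenceIdPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            // Hypothetical key layout: "<sequenceId>:<sequence text>"
            int sequenceId = Integer.parseInt(key.toString().split(":")[0]);
            return sequenceId % numReduceTasks;
        }
    }

    static void wireUp(Job job) {
        job.setNumReduceTasks(10);                         // ten part-r-* files
        job.setPartitionerClass(SequenceIdPartitioner.class);
    }
}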