Re: Multiple data centre in Hadoop
Thanks Robert. Is there a best practice or design that can address High Availability to some extent?

~Abhishek

On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans wrote:
> No it does not. Sorry
>
> On 4/11/12 1:44 PM, "Abhishek Pratap Singh" wrote:
>
> Hi All,
>
> Just wanted to know if Hadoop supports more than one data centre. This is
> basically for DR purposes and High Availability, where if one centre goes
> down the other can be brought up.
>
> Regards,
> Abhishek
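[Editor's note: a hedged sketch, not from the thread.] Since a single cluster cannot span data centres, the usual pattern for DR in this era is two independent clusters kept in sync by periodically copying data with DistCp, plus a manual or scripted failover of job submissions. A command sketch, where the NameNode hosts, port, and paths are placeholders:

```shell
# Placeholder addresses/paths: copy new or changed files from the primary
# cluster's HDFS to the standby cluster. Run from a node that can reach both;
# -update copies only files that differ from the destination.
hadoop distcp -update \
    hdfs://nn-primary.example.com:8020/data \
    hdfs://nn-standby.example.com:8020/data
```

This gives eventual consistency between sites only as often as the copy runs; it is not a substitute for true multi-DC support.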
Re: Multiple data centre in Hadoop
No it does not. Sorry

On 4/11/12 1:44 PM, "Abhishek Pratap Singh" wrote:

Hi All,

Just wanted to know if Hadoop supports more than one data centre. This is basically for DR purposes and High Availability, where if one centre goes down the other can be brought up.

Regards,
Abhishek
Hadoop map task initialization takes too long (3 minutes, 10 seconds to be exact)
Greetings people,

Well, lately, in any Hadoop flow I'm running, I encounter a 3-minute, 10-second delay for a certain map node (master working as slave). After that initialization delay, it goes back to normal and executes instantly.

For example, when running the QuasiMonteCarlo example:

Task Id: attempt_201204101957_0006_m_03_0
Start Time: 10/04 20:14:54
Finish Time: 10/04 20:18:05 (3mins, 10sec)
Node: /default-rack/master

Task log:

2012-04-10 20:18:04,470 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-04-10 20:18:04,646 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2012-04-10 20:18:04,647 WARN org.apache.hadoop.conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
2012-04-10 20:18:04,751 INFO org.apache.hadoop.mapreduce.util.ProcessTree: setsid exited with exit code 0
2012-04-10 20:18:04,754 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@79ee2c2c
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: soft limit at 83886080
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
2012-04-10 20:18:04,912 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
2012-04-10 20:18:04,939 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: Spilling map output
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
2012-04-10 20:18:04,940 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2012-04-10 20:18:04,972 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2012-04-10 20:18:04,975 INFO org.apache.hadoop.mapred.Task: Task:attempt_201204101957_0006_m_03_0 is done. And is in the process of commiting
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201204101957_0006_m_03_0' done.

The task tracker log is more telling:

2012-04-10 *20:14:54,615* INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201204101957_0006_m_03_0 which needs 1 slots
2012-04-10 20:14:54,685 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201204101957_0006_m_377512887 spawned.
2012-04-10 20:16:34,041 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
2012-04-10 *20:18:04,433* INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201204101957_0006_m_377512887 given task: attempt_201204101957_0006_m_03_0
2012-04-10 20:18:04,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204101957_0006_m_03_0 0.0%
2012-04-10 20:18:05,056 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204101957_0006_m_03_0 0.667% Generated 1000 samples. sort
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201204101957_0006_m_03_0 is done.
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201204101957_0006_m_03_0 was 28
2012-04-10 20:18:05,058 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-04-10 20:18:05,213 INFO org.apache.hadoop.mapreduce.util.ProcessTree: Sending signal to all members of process group -23030: SIGTERM. Exit code 1
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 28 bytes to reduce 0 from map: attempt_201204101957_0006_m_03_0 given 28/24
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker: Shuffled 1 maps (mapIds=attempt_201204101957_0006_m_03_0) to reduce 0 in 29s
2012-04-10 20:18:08,478 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 147.102.7.173:50060, dest: 147.102.7.175:57289, maps: 1, op: MAPRED_SHUFFLE, reduceID: 0, duration: 29
2012-04-10 20:18:10,217 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201204101957_0006_m_377512887 exited with exit code 0. Number of tasks it ran: 1

I suspect a network issue here, but I can ping and ssh with no problem.

Thank you in advance,
Nikos Stasinopoulos
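[Editor's note: a hedged suggestion, not from the thread.] A repeatable multi-minute stall between JVM spawn and task handoff is often a name-resolution (especially reverse-DNS) timeout rather than raw connectivity, and ping/ssh can still look fine in that case. A quick way to time forward and reverse lookups from each node; the hostname argument is a placeholder:

```python
import socket
import time

def time_lookups(host):
    """Time a forward (name -> IP) and reverse (IP -> name) lookup.
    A multi-second result on any cluster node is a red flag."""
    t0 = time.time()
    addr = socket.gethostbyname(host)     # forward lookup
    t1 = time.time()
    name = socket.gethostbyaddr(addr)[0]  # reverse lookup
    t2 = time.time()
    return addr, name, t1 - t0, t2 - t1

if __name__ == "__main__":
    # Run this on the master and each slave, with each cluster hostname.
    print(time_lookups("localhost"))
```

If either direction is slow, checking /etc/hosts entries on every node (so each node can resolve every other node both ways) is the usual fix.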
Re: Map Reduce Job Help
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

> From: hellooperator
> To: core-u...@hadoop.apache.org
> Sent: Wednesday, April 11, 2012 11:15 AM
> Subject: Map Reduce Job Help
>
> Hello,
>
> I'm just starting out with Hadoop and writing some Map Reduce jobs. I was
> looking for help on writing a MR job in python that allows me to take some
> emails and put them into HDFS so I can search on the text or attachments of
> the email?
>
> Thank you!
> --
> View this message in context:
> http://old.nabble.com/Map-Reduce-Job-Help-tp33670645p33670645.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
Map Reduce Job Help
Hello,

I'm just starting out with Hadoop and writing some Map Reduce jobs. I'm looking for help on writing an MR job in Python that lets me take some emails, put them into HDFS, and then search on the text or attachments of the emails.

Thank you!
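[Editor's note: a hedged sketch along the lines of the tutorial linked in the reply above.] With Hadoop Streaming, a Python mapper is just a script that reads records on stdin and writes tab-separated key/value lines to stdout. A minimal word-tokenizing mapper, adaptable to email text (loading the raw emails into HDFS with `hadoop fs -put` is a separate, prior step):

```python
#!/usr/bin/env python
# Hedged sketch of a Hadoop Streaming mapper: Streaming runs this script
# once per map task, feeding input records on stdin; each stdout line of
# the form "key\tvalue" becomes one map output pair.
import sys

def map_record(line):
    # Tokenize a line of (email) text into (word, 1) pairs,
    # lower-cased so later searches are case-insensitive.
    for word in line.strip().split():
        yield "%s\t1" % word.lower()

if __name__ == "__main__" and not sys.stdin.isatty():
    # The isatty() guard lets the script run under a pipe (as Streaming
    # does) without blocking when invoked interactively.
    for line in sys.stdin:
        for pair in map_record(line):
            print(pair)
```

It would be launched via the hadoop-streaming jar (the jar path is installation-specific), roughly: `hadoop jar .../hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input <hdfs-dir> -output <hdfs-dir>`.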
Re: How do I include the newer version of Commons-lang in my jar?
Have you tried setting 'mapreduce.user.classpath.first'? It allows user jars to be put in the classpath before the Hadoop jars.

-----Original Message-----
From: Sky USC
Reply-To: "common-user@hadoop.apache.org"
Date: Mon, 9 Apr 2012 15:46:52 -0500
To: "common-user@hadoop.apache.org"
Subject: RE: How do I include the newer version of Commons-lang in my jar?

> Thanks for the reply. I appreciate your helpfulness. I created jars by
> following the instructions at http://blog.mafr.de/2010/07/24/maven-hadoop-job/,
> so external jars are stored in the lib/ folder within the jar.
>
> Am I summarizing this correctly?
> 1. If the Hadoop version is 0.20.203 or lower, it is not possible for me to
> use an external jar such as commons-lang from Apache in my application. Any
> external jars packaged within my jar under the "lib" directory are not
> picked up. This seems like a huge limitation to me.
> 2. If the Hadoop version is 0.20.204 to 1.0.x, setting the
> "HADOOP_USER_CLASSPATH_FIRST=true" environment variable before launching
> "hadoop jar" might help. I tried this for version 0.20.205 but it didn't
> work.
> 3. If the Hadoop version is 2.x (formerly 0.23.x), this can be set via the
> API?
>
> Is there a working, testable jar that has these dependencies that I can
> try, to figure out whether it's my way of packaging the jar or something
> else?
>
> Thx
>
>> From: ha...@cloudera.com
>> Date: Mon, 9 Apr 2012 13:50:37 +0530
>> Subject: Re: How do I include the newer version of Commons-lang in my jar?
>> To: common-user@hadoop.apache.org
>>
>> The answer is a bit messy.
>>
>> Perhaps you can set the environment variable "export
>> HADOOP_USER_CLASSPATH_FIRST=true" before you do a "hadoop jar ..." to
>> launch your job. However, although this approach is present in
>> 0.20.204+ (0.20.205, and 1.0.x), I am not sure if it makes an impact on
>> the tasks as well. I don't see it changing anything but the driver
>> CP. I've not tested it - please let us know if it works in your
>> environment.
>>
>> In higher versions (2.x, formerly 0.23.x), this is doable from
>> within your job if you set "mapreduce.job.user.classpath.first" to
>> true inside your job, and ship your replacement jars along.
>>
>> Some versions would also let you set this via the
>> "JobConf/Job.setUserClassesTakesPrecedence(true/false)" API calls.
>>
>> On Mon, Apr 9, 2012 at 11:14 AM, Sky wrote:
>> > Hi.
>> >
>> > I am new to Hadoop and I am working on a project on AWS Elastic
>> > MapReduce.
>> >
>> > The problem I am facing is:
>> > * org.apache.commons.lang.time.DateUtils: parseDate() works OK, but
>> > parseDateStrictly() fails.
>> > I think parseDateStrictly might be new in commons-lang 2.5. I thought I
>> > included all dependencies; however, for some reason, at runtime my app
>> > is not picking up the newer commons-lang.
>> >
>> > Would love some help.
>> >
>> > Thx
>> > - sky
>>
>> --
>> Harsh J
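[Editor's note: a hedged command sketch of the 0.20.204+/1.0.x route discussed above; the jar and driver class names are placeholders, and as noted in the thread this may only affect the driver classpath, not the tasks.]

```shell
# Export the flag in the same shell session that submits the job,
# so the 'hadoop' launcher script sees it and puts user jars first.
export HADOOP_USER_CLASSPATH_FIRST=true
hadoop jar myapp-with-lib.jar com.example.MyDriver input/ output/
```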
Re: Hadoopp_ClassPath issue.
Dharin,

I believe the properties you are looking for are the following:

HADOOP_USER_CLASSPATH_FIRST: When defined, this puts the user-supplied classpath at the beginning of the global classpath. So you would do something like 'export HADOOP_USER_CLASSPATH_FIRST=true'. If you are on 2.0 (or 0.23), please refer to bin/hadoop-config.sh for more information; if you are on 1.0 (or 0.20), refer to the hadoop script.

Now, if you want to run an M/R job by passing your own jar and you want that jar to be used first, set the config parameter 'mapreduce.job.user.classpath.first', and the user-provided jar will be put before $HADOOP_CLASSPATH.

Hope this makes sense. Also, these will work on 1.0 (or 0.23) and above. Refer:
https://issues.apache.org/jira/browse/MAPREDUCE-3696 (for 2.0, 0.23)
https://issues.apache.org/jira/browse/MAPREDUCE-1938 (1.0, 0.20)

Thanks,
John George

-----Original Message-----
From: dmaniar
Reply-To: "common-user@hadoop.apache.org"
Date: Tue, 10 Apr 2012 21:09:10 -0700
To: "core-u...@hadoop.apache.org"
Subject: Hadoopp_ClassPath issue.

> Hi,
>
> I am new to Hadoop and not very familiar with its internal workings. I have
> some questions about HADOOP_CLASSPATH.
>
> We currently use a Hadoop cluster with 4 machines, and its HADOOP_CLASSPATH
> in hadoop-env.sh is as below:
> export HADOOP_CLASSPATH="/home/user/app/www/WEB-INF/classes:$HADOOP_CLASSPATH"
>
> Now, /home/user/app/www/WEB-INF/classes has a class called Application.class.
>
> From a remote machine I submit a map-reduce job to this cluster with a jar
> called MyJar.jar. [This has an Application.class too, but with some
> modifications.]
>
> When the TaskTracker spawns a child Java process for the Mapper, the
> classpath I see is as below, in that order (say Hadoop is installed at
> /home/user/hadoop/):
>
> /home/user/hadoop/jar1,
> /home/user/hadoop/jar2,
> ...
> /home/user/hadoop/jarN,
> /home/user/hadoop/lib/jar1,
> /home/user/hadoop/lib/jar2,
> /home/user/hadoop/lib/jarN,
> 1. /home/user/app/www/WEB-INF/classes,
> 2. ${mapred.local.dir}/taskTracker/{user}/jobcache/{jobid}/jars/MyJar.jar
> [note: this has the modified class that I need to use for my Map-Reduce job]
>
> It is clear from this classpath that I will end up using the
> Application.class from the classes folder, which gives me incorrect results.
>
> Now my question is: how do I reverse the order of 1 and 2?
>
> Some pointers that I found:
> 1) If MyJar.jar is not changing much, I can put it in a shared location and
> modify my hadoop-env.sh to:
> export HADOOP_CLASSPATH="/some/share/location/lib:/home/user/app/www/WEB-INF/classes:$HADOOP_CLASSPATH"
>
> 2) Get rid of /home/user/app/www/WEB-INF/classes from my hadoop-env.sh.
>
> 3) Is there any property that lets me add entries before the classpath?
>
> Any help is greatly appreciated.
>
> To summarize: if I already have HADOOP_CLASSPATH set in hadoop-env.sh, how
> do I add my application jar before this classpath?
>
> Again, I looked at DistributedCache.java [Hadoop source] and the code looks
> like:
>
> public static void addFileToClassPath(Path file, Configuration conf)
>     throws IOException {
>   String classpath = conf.get("mapred.job.classpath.files");
>   conf.set("mapred.job.classpath.files", classpath == null ? file
>       .toString() : classpath + System.getProperty("path.separator")
>       + file.toString());
>   ...
> }
>
> Basically, new files are added to the end of the existing classpath.
>
> Thanks,
> Dharin.
>
> --
> View this message in context:
> http://old.nabble.com/Hadoopp_ClassPath-issue.-tp33666009p33666009.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
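[Editor's note: a hedged config sketch of the per-job mechanism named in this thread; the property applies to 0.23/2.x per MAPREDUCE-3696, and 1.0-era clusters may instead need the HADOOP_USER_CLASSPATH_FIRST environment variable.]

```xml
<!-- Sketch: setting this in the submitted job's configuration asks the
     framework to place the user's job jar ahead of the system and
     HADOOP_CLASSPATH entries in the task classpath. -->
<property>
  <name>mapreduce.job.user.classpath.first</name>
  <value>true</value>
</property>
```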
Testing Map reduce code
Hi,

I am working on a Hadoop project where I want an automated build to run M/R test cases on a real Hadoop cluster. As of now it seems we can only unit test M/R through MiniDFSCluster/MiniMRCluster/MRUnit, and none of these run the test cases on an actual cluster. Is there any other framework, or any other way, to make test cases run on a Hadoop cluster?

Thanks in advance
--
https://github.com/zinnia-phatak-dev/Nectar