Re: The name of the current input file during a map
-mapred.input.file +map.input.file should work. Amogh

On 11/26/09 12:57 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Hello again, I'm using Hadoop 0.21 and its Context object, e.g.

    public void setup(Context context) {
        Configuration cfg = context.getConfiguration();
        System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
    }

This displays null, so maybe this fell out by mistake in the API change? Regards Saptarshi

On Thu, Nov 26, 2009 at 2:13 AM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Thank you. Regards Saptarshi

On Thu, Nov 26, 2009 at 2:10 AM, Amogh Vasekar am...@yahoo-inc.com wrote: conf.get("map.input.file") is what you need. Amogh

On 11/26/09 12:35 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Hello, I have a set of input files part-r-* which I will pass through another map (no reduce). The part-r-* files consist of keys and values, the keys being small, the values fairly large (MBs). I would like to index these, i.e. run a map whose output is key and /filename/, i.e. which part-r-* file the particular key belongs to, so that if I need them again I can just access that file. Q: In the map stage, how do I retrieve the name of the file being processed? I'd rather not use the MapFileOutputFormat. Hadoop 0.21. Regards Saptarshi
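For contrast, a minimal sketch (class name illustrative) of how this property is read with the old mapred API, where the framework does set it per task; whether 0.21's new API still sets it is exactly what this thread is questioning:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class InputFileAware extends MapReduceBase {
        private String inputFile;

        @Override
        public void configure(JobConf job) {
            // The old-API framework sets this per-task property before
            // calling configure(), so each map task sees the path of the
            // file backing its split.
            inputFile = job.get("map.input.file");
        }
    }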
Processing 10MB files in Hadoop
Hi list. I have small files containing data that has to be processed. A file can be small, even down to 10 MB (but it can also be 100-600 MB large), and contains at least 3 records to be processed. Processing one record can take 30 seconds to 2 minutes. My cluster has about 10 nodes. Each node has 16 cores. Can anybody give an idea about how to deal with these small files? It is not quite a common Hadoop task, I know. For example, how many map tasks should I set in this case?
Re: Processing 10MB files in Hadoop
On Thu, Nov 26, 2009 at 5:32 PM, Cubic cubicdes...@gmail.com wrote: Hi list. I have small files containing data that has to be processed. A file can be small, even down to 10 MB (but it can also be 100-600 MB large), and contains at least 3 records to be processed. Processing one record can take 30 seconds to 2 minutes. My cluster has about 10 nodes. Each node has 16 cores.

Sorry for deviating from the question, but I am curious to know: what does "core" here refer to?

Can anybody give an idea about how to deal with these small files? It is not quite a common Hadoop task, I know. For example, how many map tasks should I set in this case?

-- Regards, ~Sid~ I have never met a man so ignorant that I couldn't learn something from him
Good idea to run NameNode and JobTracker on same machine?
Do people normally combine these two processes onto one machine? Currently I have them on separate machines, but I am wondering whether they use that much CPU processing time; maybe I should combine them and create another DataNode.
Re: Good idea to run NameNode and JobTracker on same machine?
It depends on the size of your cluster. I think you can combine them if your cluster has fewer than 10 machines. Jeff Zhang

On Thu, Nov 26, 2009 at 6:26 AM, Raymond Jennings III raymondj...@yahoo.com wrote: Do people normally combine these two processes onto one machine? Currently I have them on separate machines, but I am wondering whether they use that much CPU processing time; maybe I should combine them and create another DataNode.
KeyValueTextInputFormat and Hadoop 0.20.1
Hi, I started my first experimental Hadoop project with Hadoop 0.20.1 and ran into the following problem:

    Job job = new Job(new Configuration(), "Myjob");
    job.setInputFormatClass(KeyValueTextInputFormat.class);

The last line throws the following error: The method setInputFormatClass(Class<? extends InputFormat>) in the type Job is not applicable for the arguments (Class<KeyValueTextInputFormat>). Job.setInputFormatClass expects a subclass of the new class org.apache.hadoop.mapreduce.InputFormat, but KeyValueTextInputFormat is only available as a subclass of the deprecated org.apache.hadoop.mapred.FileInputFormat. Is there a way to use KeyValueTextInputFormat with the new classes Job and Configuration? Thanks, Matthias
Re: Processing 10MB files in Hadoop
The number of mappers is determined by your InputFormat. In the common case: if a file is smaller than one block (64 MB by default), there is one mapper for that file; if a file is larger than one block, Hadoop will split it, and the number of mappers for that file will be ceiling((size of file) / (size of block)).

Hi. Do you mean I should set the number of map tasks to 1? I want to process this file not on a single node but over the entire cluster. I need a lot of processing power in order to finish the job in hours instead of days.
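To make the arithmetic concrete (assuming the default 64 MB block size): a 600 MB file yields ceiling(600/64) = 10 map tasks, each of which can run on a different node, while a 10 MB file yields exactly one — so a job over many 10 MB files runs one mapper per file no matter how many map slots the cluster has.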
Re: Processing 10MB files in Hadoop
Actually, you do not need to set the number of map tasks; the InputFormat will compute it for you according to your input data set. Jeff Zhang

On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com wrote: The number of mappers is determined by your InputFormat. In the common case: if a file is smaller than one block (64 MB by default), there is one mapper for that file; if a file is larger than one block, Hadoop will split it, and the number of mappers for that file will be ceiling((size of file) / (size of block)).

Hi. Do you mean I should set the number of map tasks to 1? I want to process this file not on a single node but over the entire cluster. I need a lot of processing power in order to finish the job in hours instead of days.
Re: Processing 10MB files in Hadoop
But the documentation DOES recommend setting it: http://wiki.apache.org/hadoop/HowManyMapsAndReduces PS: I am using streaming.

Jeff Zhang wrote: Actually, you do not need to set the number of map tasks; the InputFormat will compute it for you according to your input data set. Jeff Zhang

On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com wrote: The number of mappers is determined by your InputFormat. In the common case: if a file is smaller than one block (64 MB by default), there is one mapper for that file; if a file is larger than one block, Hadoop will split it, and the number of mappers for that file will be ceiling((size of file) / (size of block)).

Hi. Do you mean I should set the number of map tasks to 1? I want to process this file not on a single node but over the entire cluster. I need a lot of processing power in order to finish the job in hours instead of days.
Re: Processing 10MB files in Hadoop
Quote from the wiki doc: "The number of map tasks can also be increased manually using the JobConf's (http://wiki.apache.org/hadoop/JobConf) conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data." So the number of map tasks is determined by the InputFormat. But you can manually set the number of reduce tasks to improve performance, because the default number of reduce tasks is 1. Jeff Zhang

On Thu, Nov 26, 2009 at 7:58 AM, CubicDesign cubicdes...@gmail.com wrote: But the documentation DOES recommend setting it: http://wiki.apache.org/hadoop/HowManyMapsAndReduces PS: I am using streaming.

Jeff Zhang wrote: Actually, you do not need to set the number of map tasks; the InputFormat will compute it for you according to your input data set. Jeff Zhang

On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com wrote: The number of mappers is determined by your InputFormat. In the common case: if a file is smaller than one block (64 MB by default), there is one mapper for that file; if a file is larger than one block, Hadoop will split it, and the number of mappers for that file will be ceiling((size of file) / (size of block)).

Hi. Do you mean I should set the number of map tasks to 1? I want to process this file not on a single node but over the entire cluster. I need a lot of processing power in order to finish the job in hours instead of days.
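A minimal old-API sketch of the two knobs discussed here (the numbers are placeholders): the map count is only a hint that Hadoop will not take below the computed split count, while the reduce count is honored as given.

    import org.apache.hadoop.mapred.JobConf;

    public class TuningSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // A hint only: Hadoop will not go below the number of splits
            // the InputFormat computes from the input data.
            conf.setNumMapTasks(100);
            // Honored as given; the default of 1 reducer is often a bottleneck.
            conf.setNumReduceTasks(16);
        }
    }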
Re: KeyValueTextInputFormat and Hadoop 0.20.1
Sorry, but I can't find it in the version control system for release 0.20.1: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/ Do you have another distribution? Regards, Matthias

-----Original Message----- From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Thursday, November 26, 2009 4:35 PM To: common-user@hadoop.apache.org Subject: Re: KeyValueTextInputFormat and Hadoop 0.20.1

There's a KeyValueTextInputFormat under the package org.apache.hadoop.mapreduce.lib.input which is for the new Hadoop API. Jeff Zhang

On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer matthias.sche...@1und1.de wrote: Hi, I started my first experimental Hadoop project with Hadoop 0.20.1 and ran into the following problem:

    Job job = new Job(new Configuration(), "Myjob");
    job.setInputFormatClass(KeyValueTextInputFormat.class);

The last line throws the following error: The method setInputFormatClass(Class<? extends InputFormat>) in the type Job is not applicable for the arguments (Class<KeyValueTextInputFormat>). Job.setInputFormatClass expects a subclass of the new class org.apache.hadoop.mapreduce.InputFormat, but KeyValueTextInputFormat is only available as a subclass of the deprecated org.apache.hadoop.mapred.FileInputFormat. Is there a way to use KeyValueTextInputFormat with the new classes Job and Configuration? Thanks, Matthias
Re: The name of the current input file during a map
On Nov 25, 2009, at 11:27 PM, Saptarshi Guha wrote: I'm using Hadoop 0.21 and its context object

In the new API you can re-write that as: ((FileSplit) context.getInputSplit()).getPath() -- Owen
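To put Owen's one-liner in context, here is a minimal new-API sketch for the indexing job from the original post; the class name, key parsing and output layout are illustrative assumptions, not from the thread:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text fileName = new Text();

        @Override
        protected void setup(Context context) {
            // For FileInputFormat-based jobs the split is a FileSplit,
            // which carries the path of the file backing this map task.
            FileSplit split = (FileSplit) context.getInputSplit();
            fileName.set(split.getPath().getName());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (record key, file name) so the key can later be looked
            // up directly in the part-r-* file that holds it.
            String recordKey = value.toString().split("\t", 2)[0];
            context.write(new Text(recordKey), fileName);
        }
    }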
Re: KeyValueTextInputFormat and Hadoop 0.20.1
It's in trunk; maybe it has not been added in Hadoop 0.20.1.

On Thu, Nov 26, 2009 at 8:13 AM, Matthias Scherer matthias.sche...@1und1.de wrote: Sorry, but I can't find it in the version control system for release 0.20.1: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/ Do you have another distribution? Regards, Matthias

-----Original Message----- From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Thursday, November 26, 2009 4:35 PM To: common-user@hadoop.apache.org Subject: Re: KeyValueTextInputFormat and Hadoop 0.20.1

There's a KeyValueTextInputFormat under the package org.apache.hadoop.mapreduce.lib.input which is for the new Hadoop API. Jeff Zhang

On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer matthias.sche...@1und1.de wrote: Hi, I started my first experimental Hadoop project with Hadoop 0.20.1 and ran into the following problem:

    Job job = new Job(new Configuration(), "Myjob");
    job.setInputFormatClass(KeyValueTextInputFormat.class);

The last line throws the following error: The method setInputFormatClass(Class<? extends InputFormat>) in the type Job is not applicable for the arguments (Class<KeyValueTextInputFormat>). Job.setInputFormatClass expects a subclass of the new class org.apache.hadoop.mapreduce.InputFormat, but KeyValueTextInputFormat is only available as a subclass of the deprecated org.apache.hadoop.mapred.FileInputFormat. Is there a way to use KeyValueTextInputFormat with the new classes Job and Configuration? Thanks, Matthias
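For reference, a minimal driver sketch of what the new-API usage would look like on a build that ships org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat (trunk at the time of this thread — an assumption, not something verified against 0.20.1):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

    public class KvDriverSketch {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "Myjob");
            // Compiles against the new API because this version of the
            // class extends org.apache.hadoop.mapreduce.InputFormat.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            // ... set mapper, output types and paths as usual, then:
            // job.waitForCompletion(true);
        }
    }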
Re: Processing 10MB files in Hadoop
Are the record processing steps bound by a local machine resource - CPU, disk IO or other? What I often do when I have lots of small files to handle is use the NLineInputFormat, as data locality for the input files is a much lesser issue than short task run times in that case. Each line of my input file would be one of the small files, and then I would set the number of files per split to be some reasonable number. If the individual record processing is not bound by local resources, you may wish to try the MultithreadedMapRunner, which gives you a lot of flexibility about the number of map executions you run in parallel without needing to restart your cluster to change the tasks per tracker.

On Thu, Nov 26, 2009 at 8:05 AM, Jeff Zhang zjf...@gmail.com wrote: Quote from the wiki doc: "The number of map tasks can also be increased manually using the JobConf's (http://wiki.apache.org/hadoop/JobConf) conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data." So the number of map tasks is determined by the InputFormat. But you can manually set the number of reduce tasks to improve performance, because the default number of reduce tasks is 1. Jeff Zhang

On Thu, Nov 26, 2009 at 7:58 AM, CubicDesign cubicdes...@gmail.com wrote: But the documentation DOES recommend setting it: http://wiki.apache.org/hadoop/HowManyMapsAndReduces PS: I am using streaming.

Jeff Zhang wrote: Actually, you do not need to set the number of map tasks; the InputFormat will compute it for you according to your input data set. Jeff Zhang

On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com wrote: The number of mappers is determined by your InputFormat. In the common case: if a file is smaller than one block (64 MB by default), there is one mapper for this file; if a file is larger than one block, Hadoop will split it, and the number of mappers for this file will be ceiling((size of file) / (size of block)).

Hi. Do you mean I should set the number of map tasks to 1? I want to process this file not on a single node but over the entire cluster. I need a lot of processing power in order to finish the job in hours instead of days.

-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
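A minimal old-API sketch of the two suggestions above; the property names are the 0.20-era ones, and the counts are placeholders to tune for your workload:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class SmallFilesDriverSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // One split per N lines of the driver file; each line names one
            // small file for the mapper to open and process itself.
            conf.setInputFormat(NLineInputFormat.class);
            conf.setInt("mapred.line.input.format.linespermap", 3);
            // Run several map() invocations in parallel inside one task JVM
            // when the work is not bound by local CPU or disk.
            conf.setMapRunnerClass(MultithreadedMapRunner.class);
            conf.setInt("mapred.map.multithreadedrunner.threads", 8);
        }
    }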
Re: Processing 10MB files in Hadoop
Try CombineFileInputFormat. Thanks Yongqiang

On 11/26/09 4:02 AM, Cubic cubicdes...@gmail.com wrote: Hi list. I have small files containing data that has to be processed. A file can be small, even down to 10 MB (but it can also be 100-600 MB large), and contains at least 3 records to be processed. Processing one record can take 30 seconds to 2 minutes. My cluster has about 10 nodes. Each node has 16 cores. Can anybody give an idea about how to deal with these small files? It is not quite a common Hadoop task, I know. For example, how many map tasks should I set in this case?
Re: Good idea to run NameNode and JobTracker on same machine?
I think it is definitely not a good idea to combine these two in a production environment. Thanks Yongqiang

On 11/26/09 6:26 AM, Raymond Jennings III raymondj...@yahoo.com wrote: Do people normally combine these two processes onto one machine? Currently I have them on separate machines, but I am wondering whether they use that much CPU processing time; maybe I should combine them and create another DataNode.
Hadoop 0.20 map/reduce Failing for old API
Hi, We've recently upgraded to Hadoop 0.20. Writing to HDFS seems to be working fine, but the map/reduce jobs are failing with the following exception. Note, we have not moved to the new map/reduce API yet. In the client that launches the job, the only change I have made is to now load the three files core-site.xml, hdfs-site.xml and mapred-site.xml rather than the hadoop-site.xml. Any ideas?

INFO | jvm 1 | 2009/11/26 13:47:26 | 2009-11-26 13:47:26,328 INFO [FileInputFormat] Total input paths to process : 711
INFO | jvm 1 | 2009/11/26 13:47:28 | 2009-11-26 13:47:28,033 INFO [JobClient] Running job: job_200911241319_0003
INFO | jvm 1 | 2009/11/26 13:47:29 | 2009-11-26 13:47:29,036 INFO [JobClient] map 0% reduce 0%
INFO | jvm 1 | 2009/11/26 13:47:36 | 2009-11-26 13:47:36,068 INFO [JobClient] Task Id : attempt_200911241319_0003_m_03_0, Status : FAILED
INFO | jvm 1 | 2009/11/26 13:47:36 | java.io.IOException: Task process exit with nonzero status of 1.
INFO | jvm 1 | 2009/11/26 13:47:36 | at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO | jvm 1 | 2009/11/26 13:47:36 | 2009-11-26 13:47:36,094 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_03_0&filter=stdout
INFO | jvm 1 | 2009/11/26 13:47:36 | 2009-11-26 13:47:36,096 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_03_0&filter=stderr
INFO | jvm 1 | 2009/11/26 13:47:51 | 2009-11-26 13:47:51,162 INFO [JobClient] Task Id : attempt_200911241319_0003_m_00_0, Status : FAILED
INFO | jvm 1 | 2009/11/26 13:47:51 | java.io.IOException: Task process exit with nonzero status of 1.
INFO | jvm 1 | 2009/11/26 13:47:51 | at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO | jvm 1 | 2009/11/26 13:47:51 | 2009-11-26 13:47:51,166 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_00_0&filter=stdout
INFO | jvm 1 | 2009/11/26 13:47:51 | 2009-11-26 13:47:51,167 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_00_0&filter=stderr
INFO | jvm 1 | 2009/11/26 13:47:52 | 2009-11-26 13:47:52,173 INFO [JobClient] map 50% reduce 0%
INFO | jvm 1 | 2009/11/26 13:48:03 | 2009-11-26 13:48:03,219 INFO [JobClient] Task Id : attempt_200911241319_0003_m_01_0, Status : FAILED
INFO | jvm 1 | 2009/11/26 13:48:03 | Map output lost, rescheduling: getMapOutput(attempt_200911241319_0003_m_01_0,0) failed :
INFO | jvm 1 | 2009/11/26 13:48:03 | org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200911241319_0003/attempt_200911241319_0003_m_01_0/output/file.out.index in any of the configured local directories
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2886)
INFO | jvm 1 | 2009/11/26 13:48:03 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
INFO | jvm 1 | 2009/11/26 13:48:03 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.Server.handle(Server.java:324)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
Re: Processing 10MB files in Hadoop
Are the record processing steps bound by a local machine resource - CPU, disk IO or other?

Some disk I/O, but not much compared with the CPU. Basically it is CPU bound. This is why each machine has 16 cores.

What I often do when I have lots of small files to handle is use the NLineInputFormat,

Each file contains a complete/independent set of records. I cannot mix the data resulting from processing two different files.

Ok, I think I need to re-explain my problem :) While running jobs on these small files, the computation time was almost 5 times longer than expected. It looks like the job was affected by the number of map tasks that I have (100). I don't know which are the best parameters in my case (10 MB files). I have zero reduce tasks.
Re: Good idea to run NameNode and JobTracker on same machine?
I have a cluster of 4 machines plus one machine to run the NameNode and JobTracker. I have heard that 5 or 6 is the magic number; I will see when I add the next batch of machines. And it seems to be running fine. -John

On Nov 26, 2009, at 11:38 AM, Yongqiang He heyongqiang...@gmail.com wrote: I think it is definitely not a good idea to combine these two in a production environment. Thanks Yongqiang

On 11/26/09 6:26 AM, Raymond Jennings III raymondj...@yahoo.com wrote: Do people normally combine these two processes onto one machine? Currently I have them on separate machines, but I am wondering whether they use that much CPU processing time; maybe I should combine them and create another DataNode.
log files on the cluster?
Hi, it is probably described somewhere in the manuals, but: 1. Where are the log files, especially those that show my System.out.println() output and errors? 2. Do I need to log in to every machine on the cluster? Thank you, Mark
Re: log files on the cluster?
On Fri, Nov 27, 2009 at 6:28 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, it is probably described somewhere in the manuals, but: 1. Where are the log files, especially those that show my System.out.println() output and errors?

Look at the logs directory ...

2. Do I need to log in to every machine on the cluster?

Try the web UI, though I am not sure.

Thank you, Mark

-- Regards, ~Sid~ I have never met a man so ignorant that I couldn't learn something from him
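For reference, the usual 0.20-era locations (defaults; exact paths depend on the install, and these are not stated in the thread itself):

    $HADOOP_HOME/logs/                        daemon logs on each node
    $HADOOP_HOME/logs/userlogs/<attempt_id>/  per-task stdout, stderr and syslog
    http://<jobtracker-host>:50030/           JobTracker web UI (drill down to task logs)
    http://<tasktracker-host>:50060/          TaskTracker web UI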
Re: Hadoop 0.20 map/reduce Failing for old API
The exit status of 1 usually indicates configuration issues or incorrect command invocation in Hadoop 0.20 (incorrect params), if not a JVM crash. In your logs there is no indication of a crash, but some path or command could be the cause. Can you check if your lib paths/data paths are correct? If it is a memory-intensive task, you may also try values for mapred.child.java.opts / mapred.job.map.memory.mb. Thanks!

On 11/27/09 1:28 AM, Arv Mistry a...@kindsight.net wrote: Hi, We've recently upgraded to Hadoop 0.20. Writing to HDFS seems to be working fine, but the map/reduce jobs are failing with the following exception. Note, we have not moved to the new map/reduce API yet. In the client that launches the job, the only change I have made is to now load the three files core-site.xml, hdfs-site.xml and mapred-site.xml rather than the hadoop-site.xml. Any ideas?

INFO | jvm 1 | 2009/11/26 13:47:26 | 2009-11-26 13:47:26,328 INFO [FileInputFormat] Total input paths to process : 711
INFO | jvm 1 | 2009/11/26 13:47:28 | 2009-11-26 13:47:28,033 INFO [JobClient] Running job: job_200911241319_0003
INFO | jvm 1 | 2009/11/26 13:47:29 | 2009-11-26 13:47:29,036 INFO [JobClient] map 0% reduce 0%
INFO | jvm 1 | 2009/11/26 13:47:36 | 2009-11-26 13:47:36,068 INFO [JobClient] Task Id : attempt_200911241319_0003_m_03_0, Status : FAILED
INFO | jvm 1 | 2009/11/26 13:47:36 | java.io.IOException: Task process exit with nonzero status of 1.
INFO | jvm 1 | 2009/11/26 13:47:36 | at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO | jvm 1 | 2009/11/26 13:47:36 | 2009-11-26 13:47:36,094 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_03_0&filter=stdout
INFO | jvm 1 | 2009/11/26 13:47:36 | 2009-11-26 13:47:36,096 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_03_0&filter=stderr
INFO | jvm 1 | 2009/11/26 13:47:51 | 2009-11-26 13:47:51,162 INFO [JobClient] Task Id : attempt_200911241319_0003_m_00_0, Status : FAILED
INFO | jvm 1 | 2009/11/26 13:47:51 | java.io.IOException: Task process exit with nonzero status of 1.
INFO | jvm 1 | 2009/11/26 13:47:51 | at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO | jvm 1 | 2009/11/26 13:47:51 | 2009-11-26 13:47:51,166 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_00_0&filter=stdout
INFO | jvm 1 | 2009/11/26 13:47:51 | 2009-11-26 13:47:51,167 WARN [JobClient] Error reading task output http://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=true&taskid=attempt_200911241319_0003_m_00_0&filter=stderr
INFO | jvm 1 | 2009/11/26 13:47:52 | 2009-11-26 13:47:52,173 INFO [JobClient] map 50% reduce 0%
INFO | jvm 1 | 2009/11/26 13:48:03 | 2009-11-26 13:48:03,219 INFO [JobClient] Task Id : attempt_200911241319_0003_m_01_0, Status : FAILED
INFO | jvm 1 | 2009/11/26 13:48:03 | Map output lost, rescheduling: getMapOutput(attempt_200911241319_0003_m_01_0,0) failed :
INFO | jvm 1 | 2009/11/26 13:48:03 | org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200911241319_0003/attempt_200911241319_0003_m_01_0/output/file.out.index in any of the configured local directories
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2886)
INFO | jvm 1 | 2009/11/26 13:48:03 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
INFO | jvm 1 | 2009/11/26 13:48:03 | at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
INFO | jvm 1 | 2009/11/26 13:48:03 | at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
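As an illustration of the memory suggestion above (the value is a placeholder, not from the thread), the child JVM heap is raised in conf/mapred-site.xml like so:

    <property>
      <name>mapred.child.java.opts</name>
      <!-- Default in 0.20 is -Xmx200m; raise it for memory-intensive tasks -->
      <value>-Xmx512m</value>
    </property>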
Re: log files on the cluster?
Thank you, that pretty much does it; the logs on EC2 are in /mnt/hadoop/logs.

On Thu, Nov 26, 2009 at 10:43 PM, Siddu siddu.s...@gmail.com wrote: On Fri, Nov 27, 2009 at 6:28 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, it is probably described somewhere in the manuals, but: 1. Where are the log files, especially those that show my System.out.println() output and errors?

Look at the logs directory ...

2. Do I need to log in to every machine on the cluster?

Try the web UI, though I am not sure.

Thank you, Mark

-- Regards, ~Sid~ I have never met a man so ignorant that I couldn't learn something from him
Re: please help in setting hadoop
Hi, Just a thought, but you do not need to set up the temp directory in conf/hadoop-site.xml, especially if you are running basic examples. Give that a shot, maybe it will work out. Otherwise see if you can find additional info in the LOGS. Thank You Abhishek Agrawal SUNY- Buffalo (716-435-7122)

On Fri 11/27/09 12:20 AM, Krishna Kumar krishna.ku...@nechclst.in sent: Dear All, Can anybody please help me in getting rid of these error messages:

[ hadoop]# hadoop jar /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar wordcount test test-op
09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4
09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4
org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:730)
at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:222)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:194)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

I am running the hadoop cluster as root user on two server nodes: master and slave. My hadoop-site.xml file format is as follows:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/dfs/name</value>
    </property>

Further, the output of the ls command is as follows:

[ hadoop]# ls -l /home/hadoop/hadoop-root/
total 8
drwxr-xr-x 4 root root 4096 Nov 26 16:48 dfs
drwxr-xr-x 3 root root 4096 Nov 26 16:49 mapred
[ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/
total 4
drwxr-xr-x 2 root root 4096 Nov 26 16:49 local
[ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/local/
total 0

Thanks and Best Regards, Krishna Kumar Senior Storage Engineer Why do we have to die? If we had to die, and everything is gone after that, then nothing else matters on this earth - everything is temporary, at least relative to me.
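One hedged guess at a fix, since the exception names mapred.local.dir directly (an assumption, not confirmed in the thread): point that property at a directory that exists and is writable on every node, e.g. in conf/hadoop-site.xml:

    <property>
      <name>mapred.local.dir</name>
      <!-- Must exist and be writable by the daemon user on every node -->
      <value>/home/hadoop/hadoop-root/mapred/local</value>
    </property>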
Re: RE: please help in setting hadoop
Hi, There should be a folder called logs in $HADOOP_HOME. Also try going through http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29 - this is a pretty good tutorial. Abhishek Agrawal SUNY- Buffalo (716-435-7122)

On Fri 11/27/09 1:18 AM, Krishna Kumar krishna.ku...@nechclst.in sent: I have tried, but didn't get any success. By the way, can you please tell me the exact path of the log file which I have to refer to. Thanks and Best Regards, Krishna Kumar Senior Storage Engineer Why do we have to die? If we had to die, and everything is gone after that, then nothing else matters on this earth - everything is temporary, at least relative to me.

-----Original Message----- From: aa...@buffalo.edu [aa...@buffalo.edu] Sent: Friday, November 27, 2009 10:56 AM To: common-user@hadoop.apache.org Subject: Re: please help in setting hadoop

Hi, Just a thought, but you do not need to set up the temp directory in conf/hadoop-site.xml, especially if you are running basic examples. Give that a shot, maybe it will work out. Otherwise see if you can find additional info in the LOGS. Thank You Abhishek Agrawal SUNY- Buffalo (716-435-7122)

On Fri 11/27/09 12:20 AM, Krishna Kumar krishna.ku...@nechclst.in sent: Dear All, Can anybody please help me in getting rid of these error messages:

[ hadoop]# hadoop jar /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar wordcount test test-op
09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4
09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to process : 4
org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:730)
at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:222)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:194)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

I am running the hadoop cluster as root user on two server nodes: master and slave. My hadoop-site.xml file format is as follows:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/dfs/name</value>
    </property>

Further, the output of the ls command is as follows:

[ hadoop]# ls -l /home/hadoop/hadoop-root/
total 8
drwxr-xr-x 4 root root 4096 Nov 26 16:48 dfs
drwxr-xr-x 3 root root 4096 Nov 26 16:49 mapred
[ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/
total 4
drwxr-xr-x 2 root root 4096 Nov 26 16:49 local
[ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/local/
total 0

Thanks and Best Regards, Krishna Kumar Senior Storage Engineer
Re: Doubt in Hadoop
Do you run the map reduce job from the command line or an IDE? In map reduce mode (on a cluster), you should put the jar containing the map and reduce classes in your classpath. Jeff Zhang

On Fri, Nov 27, 2009 at 2:19 PM, aa...@buffalo.edu wrote: Hello Everybody, I have a doubt in Hadoop and was wondering if anybody has faced a similar problem. I have a package called test. Inside that I have classes called A.java, Map.java, Reduce.java. In A.java I have the main method where I am trying to initialize the jobConf object. I have written jobConf.setMapperClass(Map.class) and similarly for the reduce class as well. The code works correctly when I run it locally via jobConf.set("mapred.job.tracker", "local"), but I get an exception when I try to run this code on my cluster. The stack trace of the exception is as under. I cannot understand the problem. Any help would be appreciated.

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: test.Map
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Markowitz.covarMatrixMap
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
... 6 more
Caused by: java.lang.ClassNotFoundException: test.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
... 7 more

Thank You Abhishek Agrawal SUNY- Buffalo (716-435-7122)
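To make Jeff's advice concrete, a hedged sketch of the usual fix (a suggestion based on the standard 0.18/0.20 API, not something confirmed later in this thread): build the JobConf from a class inside the job jar, so the framework ships that jar to the task JVMs instead of requiring it on each node's classpath. It reuses the test.A, test.Map and test.Reduce classes described in the question:

    package test;

    import org.apache.hadoop.mapred.JobConf;

    public class A {
        public static void main(String[] args) {
            // Point the JobConf at the jar that holds this class, so the
            // framework ships it to every task JVM on the cluster.
            JobConf jobConf = new JobConf(A.class); // or: jobConf.setJarByClass(A.class)
            jobConf.setMapperClass(Map.class);      // test.Map from the question
            jobConf.setReducerClass(Reduce.class);  // test.Reduce from the question
            // ... input/output formats and paths as before, then submit the job.
        }
    }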
Re: Re: Doubt in Hadoop
Hi, I am running the job from the command line. The job runs fine in local mode, but something happens when I try to run it in distributed mode. Abhishek Agrawal SUNY- Buffalo (716-435-7122)

On Fri 11/27/09 2:31 AM, Jeff Zhang zjf...@gmail.com sent: Do you run the map reduce job from the command line or an IDE? In map reduce mode (on a cluster), you should put the jar containing the map and reduce classes in your classpath. Jeff Zhang

On Fri, Nov 27, 2009 at 2:19 PM, wrote: Hello Everybody, I have a doubt in Hadoop and was wondering if anybody has faced a similar problem. I have a package called test. Inside that I have classes called A.java, Map.java, Reduce.java. In A.java I have the main method where I am trying to initialize the jobConf object. I have written jobConf.setMapperClass(Map.class) and similarly for the reduce class as well. The code works correctly when I run it locally via jobConf.set("mapred.job.tracker", "local"), but I get an exception when I try to run this code on my cluster. The stack trace of the exception is as under. I cannot understand the problem. Any help would be appreciated.

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: test.Map
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Markowitz.covarMatrixMap
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
... 6 more
Caused by: java.lang.ClassNotFoundException: test.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
... 7 more

Thank You Abhishek Agrawal SUNY- Buffalo (716-435-7122)