Re: The name of the current input file during a map

2009-11-26 Thread Amogh Vasekar
-mapred.input.file
+map.input.file
Should work

Amogh
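With the old mapred API, a minimal sketch of reading that property in a mapper's configure() method (the FileNameMapper class name is illustrative, and it assumes an input format that delivers Text keys and values):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  private final Text fileName = new Text();

  public void configure(JobConf job) {
    // The old mapred API sets "map.input.file" in each task's localized JobConf.
    fileName.set(job.get("map.input.file", "unknown"));
  }

  public void map(Text key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Emit key -> file name, i.e. an index of which input file each key lives in.
    output.collect(key, fileName);
  }
}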

On 11/26/09 12:57 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote:

Hello again,
I'm using Hadoop 0.21 and its Context object, e.g.

 public void setup(Context context) {
   Configuration cfg = context.getConfiguration();
   System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
 }

This displays null, so maybe this fell out by mistake in the API change?
Regards
Saptarshi


On Thu, Nov 26, 2009 at 2:13 AM, Saptarshi Guha
saptarshi.g...@gmail.com wrote:
 Thank you.
 Regards
 Saptarshi

 On Thu, Nov 26, 2009 at 2:10 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
 conf.get("map.input.file") is what you need.

 Amogh


 On 11/26/09 12:35 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote:

 Hello,
 I have a set of input files part-r-* which I will pass through another
 map (no reduce).  The part-r-* files consist of key/value pairs, the keys
 being small and the values fairly large (MBs).

 I would like to index these, i.e. run a map whose output is the key and
 the /filename/, i.e. which part-r-* file the particular key belongs to, so
 that if I need them again I can just access that file.

 Q: In the map stage, how do I retrieve the name of the file being
 processed?  I'd rather not use MapFileOutputFormat.

 Hadoop 0.21

 Regards
 Saptarshi






Processing 10MB files in Hadoop

2009-11-26 Thread Cubic
Hi list.

I have small files containing data that has to be processed. A file
can be small, even down to 10MB (but it can also be 100-600MB large)
and contains at least 3 records to be processed.
Processing one record can take 30 seconds to 2 minutes. My cluster has
about 10 nodes. Each node has 16 cores.

Can anybody give an idea about how to deal with these small files? It
is not quite a common Hadoop task, I know. For example, how many map
tasks should I set in this case?


Re: Processing 10MB files in Hadoop

2009-11-26 Thread Siddu
On Thu, Nov 26, 2009 at 5:32 PM, Cubic cubicdes...@gmail.com wrote:

 Hi list.

 I have small files containing data that has to be processed. A file
 can be small, even down to 10MB (but it can me also 100-600MB large)
 and contains at least 3 records to be processed.
 Processing one record can take 30 seconds to 2 minutes. My cluster is
 about 10 nodes. Each node has 16 cores.

Sorry for deviating from the question, but I am curious to know what
"core" here refers to?


 Anybody can give an idea about how to deal with these small files? It
 is not quite a common Hadoop task; I know. For example, how many map
 tasks should I set in this case?




-- 
Regards,
~Sid~
I have never met a man so ignorant that i couldn't learn something from him


Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Raymond Jennings III
Do people normally combine these two processes onto one machine?  Currently I
have them on separate machines, but I am wondering whether they use that much
CPU processing time and whether I should combine them and create another DataNode.


  


Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Jeff Zhang
It depends on the size of your cluster. I think you can combine them
together if your cluster has less than 10 machines.


Jeff Zhang




On Thu, Nov 26, 2009 at 6:26 AM, Raymond Jennings III raymondj...@yahoo.com
 wrote:

 Do people normally combine these two processes onto one machine?  Currently
 I have them on separate machines but I am wondering they use that much CPU
 processing time and maybe I should combine them and create another DataNode.






KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Matthias Scherer
Hi,

I started my first experimental Hadoop project with Hadoop 0.20.1 and ran
into the following problem:

Job job = new Job(new Configuration(), "Myjob");
job.setInputFormatClass(KeyValueTextInputFormat.class);

The last line throws the following error: The method
setInputFormatClass(Class<? extends InputFormat>) in the type Job is not
applicable for the arguments (Class<KeyValueTextInputFormat>)

Job.setInputFormatClass expects a subclass of the new class
org.apache.hadoop.mapreduce.InputFormat. But KeyValueTextInputFormat is
only available as a subclass of the deprecated
org.apache.hadoop.mapred.FileInputFormat.

Is there a way to use KeyValueTextInputFormat with the new classes Job
and Configuration?

Thanks,
Matthias


Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign



The number of mappers is determined by your InputFormat.

In the common case, if a file is smaller than one block size (which is 64M by
default), there is one mapper for that file. If a file is larger than one block
size, Hadoop will split the file, and the number of mappers for it will be
ceiling( (size of file) / (size of block) )

  

Hi

Do you mean I should set the number of map tasks to 1?
I want to process this file not on a single node but over the entire
cluster. I need a lot of processing power in order to finish the job in
hours instead of days.


Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jeff Zhang
Actually, you do not need to set the number of map tasks; the InputFormat
will compute it for you according to your input data set.
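As a rough illustration of the split arithmetic behind that (plain Java, not Hadoop code, assuming the default 64MB block size and plain FileInputFormat splitting):

public class SplitCountSketch {
  public static void main(String[] args) {
    long blockSize = 64L << 20;                        // default dfs.block.size, 64MB
    long[] fileSizes = {10L << 20, 600L << 20};        // a 10MB and a 600MB file
    for (long size : fileSizes) {
      long maps = (size + blockSize - 1) / blockSize;  // ceiling(size / blockSize)
      System.out.println((size >> 20) + "MB -> " + maps + " map task(s)");
    }
  }
}

So a 10MB file still becomes exactly one map task; only an input format that packs several files into one split changes that.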

Jeff Zhang


On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com wrote:


  The number of mapper is determined by your InputFormat.

 In common case, if file is smaller than one block size (which is 64M by
 default), one mapper for this file. if file is larger than one block size,
 hadoop will split this large file, and the number of mapper for this file
 will be ceiling ( (size of file)/(size of block) )



 Hi

 Do you mean, I should set the number of map tasks to 1 
 I want to process this file not in a single node but over the entire
 cluster. I need a lot of processing power in order to finish the job in
 hours instead of days.



Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign
But the documentation DOES recommend setting it:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces




PS: I am using streaming



Jeff Zhang wrote:

Actually, you do not need to set the number of map task, the InputFormat
will compute it for you according your input data set.

Jeff Zhang


On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com wrote:

  

 The number of mapper is determined by your InputFormat.


In common case, if file is smaller than one block size (which is 64M by
default), one mapper for this file. if file is larger than one block size,
hadoop will split this large file, and the number of mapper for this file
will be ceiling ( (size of file)/(size of block) )



  

Hi

Do you mean, I should set the number of map tasks to 1 
I want to process this file not in a single node but over the entire
cluster. I need a lot of processing power in order to finish the job in
hours instead of days.




  


Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jeff Zhang
Quote from the wiki doc:

*The number of map tasks can also be increased manually using the
JobConf (http://wiki.apache.org/hadoop/JobConf)'s
conf.setNumMapTasks(int num). This can be used to increase the number of map
tasks, but will not set the number below that which Hadoop determines via
splitting the input data.*

So the number of map tasks is determined by the InputFormat.
But you can manually set the number of reduce tasks to improve
performance, because the default number of reduce tasks is 1.
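A minimal old-API sketch of both knobs (the driver class name is illustrative; the map count acts only as a hint, as the wiki text says):

import org.apache.hadoop.mapred.JobConf;

public class TuningSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf(TuningSketch.class);
    // Hint only: Hadoop will not go below the number of splits the InputFormat produces.
    conf.setNumMapTasks(100);
    // Honoured directly; the default is 1, which is often the real bottleneck.
    conf.setNumReduceTasks(16);
    // ... set input/output formats, mapper, reducer, and submit as usual ...
  }
}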


Jeff Zhang

On Thu, Nov 26, 2009 at 7:58 AM, CubicDesign cubicdes...@gmail.com wrote:

 But the documentation DO recommend to set it:
 http://wiki.apache.org/hadoop/HowManyMapsAndReduces



 PS: I am using streaming




 Jeff Zhang wrote:

 Actually, you do not need to set the number of map task, the InputFormat
 will compute it for you according your input data set.

 Jeff Zhang


 On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com
 wrote:



  The number of mapper is determined by your InputFormat.


 In common case, if file is smaller than one block size (which is 64M by
 default), one mapper for this file. if file is larger than one block
 size,
 hadoop will split this large file, and the number of mapper for this
 file
 will be ceiling ( (size of file)/(size of block) )





 Hi

 Do you mean, I should set the number of map tasks to 1 
 I want to process this file not in a single node but over the entire
 cluster. I need a lot of processing power in order to finish the job in
 hours instead of days.









Re: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Matthias Scherer
Sorry, but I can't find it in the version control system for release 0.20.1: 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/

Do you have another distribution?

Regards,
Matthias
 

 -Original Message-
 From: Jeff Zhang [mailto:zjf...@gmail.com]
 Sent: Thursday, November 26, 2009 16:35
 To: common-user@hadoop.apache.org
 Subject: Re: KeyValueTextInputFormat and Hadoop 0.20.1
 
 There's a KeyValueTextInputFormat under the package
 org.apache.hadoop.mapreduce.lib.input,
 which is for the new Hadoop API.
 
 
 Jeff Zhang
 
 
 On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer 
 matthias.sche...@1und1.de
  wrote:
 
  Hi,
 
  I started my first experimental Hadoop project with Hadoop 
 0.20.1 an 
  run in the following problem:
 
  Job job = new Job(new Configuration(),Myjob); 
  job.setInputFormatClass(KeyValueTextInputFormat.class);
 
  The last line throws the following error: The method 
  setInputFormatClass(Class? extends InputFormat) in the 
 type Job is 
  not applicable for the arguments (ClassKeyValueTextInputFormat)
 
  Job.setInputFormatClass expects a subclass of the new class 
  org.apache.hadoop.mapreduce.InputFormat. But 
 KeyValueTextInputFormat 
  is only available as subclass of the deprecated 
  org.apache.hadoop.mapred.FileInputFormat.
 
  Is there a way to use KeyValueTextInputFormat with the new 
 classes Job 
  and Configuration?
 
  Thanks,
  Matthias
 
 


Re: The name of the current input file during a map

2009-11-26 Thread Owen O'Malley


On Nov 25, 2009, at 11:27 PM, Saptarshi Guha wrote:


I'm using Hadoop 0.21 and its context object


In the new API you can re-write that as:

((FileSplit) context.getInputSplit()).getPath()

-- Owen
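A minimal new-API sketch along those lines, grabbing the file name once in setup() (the IndexingMapper name is illustrative, and it assumes an input format that yields Text keys and values from the part-r-* files):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class IndexingMapper extends Mapper<Text, Text, Text, Text> {

  private Text fileName;

  @Override
  protected void setup(Context context) {
    // For file-based input formats the split is a FileSplit, whose path
    // is the part-r-* file this map task is reading.
    FileSplit split = (FileSplit) context.getInputSplit();
    fileName = new Text(split.getPath().getName());
  }

  @Override
  protected void map(Text key, Text value, Context context)
      throws IOException, InterruptedException {
    // Emit key -> file name to build the index described earlier in the thread.
    context.write(key, fileName);
  }
}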


Re: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-26 Thread Jeff Zhang
It's in trunk; maybe it has not been added in Hadoop 0.20.1 yet.
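Until that ships, one possible workaround on 0.20.1 is to keep this particular job on the old mapred API, where the class does exist. A minimal sketch (driver class name and separator are illustrative; the identity mapper/reducer are used by default):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class OldApiJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OldApiJob.class);
    conf.setJobName("Myjob");
    // The old-API input format splits each line on the first separator (tab by default).
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.set("key.value.separator.in.input.line", "\t");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);   // identity mapper/reducer by default
  }
}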



On Thu, Nov 26, 2009 at 8:13 AM, Matthias Scherer matthias.sche...@1und1.de
 wrote:

 Sorry, but I can't find it in the version control system for release
 0.20.1:
 http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/mapreduce/lib/input/

 Du you have another distribution?

 Regards,
 Matthias


  -Original Message-
  From: Jeff Zhang [mailto:zjf...@gmail.com]
  Sent: Thursday, November 26, 2009 16:35
  To: common-user@hadoop.apache.org
  Subject: Re: KeyValueTextInputFormat and Hadoop 0.20.1
 
  There's a KeyValueInputFormat under package
  org.apache.hadoop.mapreduce.lib.input
  which is for hadoop new API
 
 
  Jeff Zhang
 
 
  On Thu, Nov 26, 2009 at 7:10 AM, Matthias Scherer
  matthias.sche...@1und1.de
   wrote:
 
   Hi,
  
   I started my first experimental Hadoop project with Hadoop
  0.20.1 an
   run in the following problem:
  
   Job job = new Job(new Configuration(),Myjob);
   job.setInputFormatClass(KeyValueTextInputFormat.class);
  
   The last line throws the following error: The method
   setInputFormatClass(Class? extends InputFormat) in the
  type Job is
   not applicable for the arguments (ClassKeyValueTextInputFormat)
  
   Job.setInputFormatClass expects a subclass of the new class
   org.apache.hadoop.mapreduce.InputFormat. But
  KeyValueTextInputFormat
   is only available as subclass of the deprecated
   org.apache.hadoop.mapred.FileInputFormat.
  
   Is there a way to use KeyValueTextInputFormat with the new
  classes Job
   and Configuration?
  
   Thanks,
   Matthias
  
 



Re: Processing 10MB files in Hadoop

2009-11-26 Thread Jason Venner
Are the record processing steps bound by a local machine resource - CPU,
disk I/O, or other?

What I often do when I have lots of small files to handle is use
NLineInputFormat, as data locality for the input files is a much lesser
issue than short task run times in that case.
Each line of my input file would be one of the small files, and then I would
set the number of files per split to be some reasonable number.
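A rough sketch of that setup with the old API (the list file name and the lines-per-map value are made up for illustration):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class SmallFileDriverSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SmallFileDriverSketch.class);
    // The job input is a text file listing one small data file per line;
    // each map task gets a few of those lines and opens the files itself.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 5);
    FileInputFormat.setInputPaths(conf, new Path("file-list.txt"));
    // ... set the mapper, output types, and submit as usual ...
  }
}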

If the individual record processing is not bound by local resources you may
wish to try the MultithreadedMapRunner, which gives you a lot of flexibility
about the number of map executions you run in parallel without needing to
restart your cluster to change the tasks per tracker.


On Thu, Nov 26, 2009 at 8:05 AM, Jeff Zhang zjf...@gmail.com wrote:

 Quote from the wiki doc

 *The number of map tasks can also be increased manually using the
 JobConfhttp://wiki.apache.org/hadoop/JobConf's
 conf.setNumMapTasks(int num). This can be used to increase the number of
 map
 tasks, but will not set the number below that which Hadoop determines via
 splitting the input data.*

 So the number of map task is determited by InputFormat.
 But you can manually set the number of reducer task to improve the
 performance, because the default number of reducer task is 1


 Jeff Zhang

 On Thu, Nov 26, 2009 at 7:58 AM, CubicDesign cubicdes...@gmail.com
 wrote:

  But the documentation DO recommend to set it:
  http://wiki.apache.org/hadoop/HowManyMapsAndReduces
 
 
 
  PS: I am using streaming
 
 
 
 
  Jeff Zhang wrote:
 
  Actually, you do not need to set the number of map task, the InputFormat
  will compute it for you according your input data set.
 
  Jeff Zhang
 
 
  On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign cubicdes...@gmail.com
  wrote:
 
 
 
   The number of mapper is determined by your InputFormat.
 
 
  In common case, if file is smaller than one block size (which is 64M
 by
  default), one mapper for this file. if file is larger than one block
  size,
  hadoop will split this large file, and the number of mapper for this
  file
  will be ceiling ( (size of file)/(size of block) )
 
 
 
 
 
  Hi
 
  Do you mean, I should set the number of map tasks to 1 
  I want to process this file not in a single node but over the entire
  cluster. I need a lot of processing power in order to finish the job in
  hours instead of days.
 
 
 
 
 
 
 




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Re: Processing 10MB files in Hadoop

2009-11-26 Thread Yongqiang He
Try CombineFileInputFormat.

Thanks
Yongqiang
On 11/26/09 4:02 AM, Cubic cubicdes...@gmail.com wrote:

 Hi list.
 
 I have small files containing data that has to be processed. A file
 can be small, even down to 10MB (but it can me also 100-600MB large)
 and contains at least 3 records to be processed.
 Processing one record can take 30 seconds to 2 minutes. My cluster is
 about 10 nodes. Each node has 16 cores.
 
 Anybody can give an idea about how to deal with these small files? It
 is not quite a common Hadoop task; I know. For example, how many map
 tasks should I set in this case?
 
 




Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread Yongqiang He
I think it is definitely not a good idea to combine these two in a production
environment.

Thanks
Yongqiang
On 11/26/09 6:26 AM, Raymond Jennings III raymondj...@yahoo.com wrote:

 Do people normally combine these two processes onto one machine?  Currently I
 have them on separate machines but I am wondering they use that much CPU
 processing time and maybe I should combine them and create another DataNode.
 
 
   
 
 




Hadoop 0.20 map/reduce Failing for old API

2009-11-26 Thread Arv Mistry
Hi,

We've recently upgraded to Hadoop 0.20. Writing to HDFS seems to be
working fine, but the map/reduce jobs are failing with the following
exception. Note, we have not moved to the new map/reduce API yet. In the
client that launches the job, the only change I have made is to now load
the three files core-site.xml, hdfs-site.xml and mapred-site.xml rather
than hadoop-site.xml. Any ideas?
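For reference, a minimal sketch of loading the three files explicitly on the client side (the /etc/hadoop/conf paths are an assumption; normally Configuration picks up core-site.xml, and JobConf adds mapred-site.xml, from the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ClientConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // 0.20 splits the old hadoop-site.xml into three files; add them explicitly
    // if they are not already on the client classpath.
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
    JobConf job = new JobConf(conf, ClientConfSketch.class);
    System.out.println("fs.default.name = " + job.get("fs.default.name"));
  }
}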

INFO   | jvm 1| 2009/11/26 13:47:26 | 2009-11-26 13:47:26,328 INFO
[FileInputFormat] Total input paths to process : 711
INFO   | jvm 1| 2009/11/26 13:47:28 | 2009-11-26 13:47:28,033 INFO
[JobClient] Running job: job_200911241319_0003
INFO   | jvm 1| 2009/11/26 13:47:29 | 2009-11-26 13:47:29,036 INFO
[JobClient]  map 0% reduce 0%
INFO   | jvm 1| 2009/11/26 13:47:36 | 2009-11-26 13:47:36,068 INFO
[JobClient] Task Id : attempt_200911241319_0003_m_03_0, Status :
FAILED
INFO   | jvm 1| 2009/11/26 13:47:36 | java.io.IOException: Task
process exit with nonzero status of 1.
INFO   | jvm 1| 2009/11/26 13:47:36 |   at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO   | jvm 1| 2009/11/26 13:47:36 | 
INFO   | jvm 1| 2009/11/26 13:47:36 | 2009-11-26 13:47:36,094 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_03_0filter=stdout
INFO   | jvm 1| 2009/11/26 13:47:36 | 2009-11-26 13:47:36,096 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_03_0filter=stderr
INFO   | jvm 1| 2009/11/26 13:47:51 | 2009-11-26 13:47:51,162 INFO
[JobClient] Task Id : attempt_200911241319_0003_m_00_0, Status :
FAILED
INFO   | jvm 1| 2009/11/26 13:47:51 | java.io.IOException: Task
process exit with nonzero status of 1.
INFO   | jvm 1| 2009/11/26 13:47:51 |   at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO   | jvm 1| 2009/11/26 13:47:51 | 
INFO   | jvm 1| 2009/11/26 13:47:51 | 2009-11-26 13:47:51,166 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_00_0filter=stdout
INFO   | jvm 1| 2009/11/26 13:47:51 | 2009-11-26 13:47:51,167 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_00_0filter=stderr
INFO   | jvm 1| 2009/11/26 13:47:52 | 2009-11-26 13:47:52,173 INFO
[JobClient]  map 50% reduce 0%
INFO   | jvm 1| 2009/11/26 13:48:03 | 2009-11-26 13:48:03,219 INFO
[JobClient] Task Id : attempt_200911241319_0003_m_01_0, Status :
FAILED
INFO   | jvm 1| 2009/11/26 13:48:03 | Map output lost, rescheduling:
getMapOutput(attempt_200911241319_0003_m_01_0,0) failed :
INFO   | jvm 1| 2009/11/26 13:48:03 |
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200911241319_0003/attempt_200911241319_0003_m_0
1_0/output/file.out.index in any of the configured local directories
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathT
oRead(LocalDirAllocator.java:389)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAlloca
tor.java:138)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.
java:2886)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2
16)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler
Collection.java:230)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.Server.handle(Server.java:324)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)

Re: Processing 10MB files in Hadoop

2009-11-26 Thread CubicDesign




Are the record processing steps bound by a local machine resource - cpu,
disk io or other?
  
Some disk I/O. Not so much compared with the CPU. Basically it is CPU
bound. This is why each machine has 16 cores.

What I often do when I have lots of small files to handle is use the
NlineInputFormat,
Each file contains a complete/independent set of records. I cannot mix
the data resulting from processing two different files.



-
Ok. I think I need to re-explain my problem :)
While running jobs on these small files, the computation time was almost 
5 times longer than expected. It looks like the job was affected by the 
number of map tasks that I have (100). I don't know what the best
parameters are in my case (10MB files).


I have zero reduce tasks.




Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-26 Thread John Martyniak
I have a cluster of 4 machines plus one machine to run nn & jt.  I
have heard that 5 or 6 is the magic #.  I will see when I add the next
batch of machines.


And it seems to be running fine.

-John

On Nov 26, 2009, at 11:38 AM, Yongqiang He heyongqiang...@gmail.com  
wrote:


I think it is definitely not a good idea to combine these two in  
production

environment.

Thanks
Yongqiang
On 11/26/09 6:26 AM, Raymond Jennings III raymondj...@yahoo.com  
wrote:


Do people normally combine these two processes onto one machine?   
Currently I
have them on separate machines but I am wondering they use that  
much CPU
processing time and maybe I should combine them and create another  
DataNode.











log files on the cluster?

2009-11-26 Thread Mark Kerzner
Hi,

it is probably described somewhere in the manuals, but


   1. Where are the log files, especially those that show my
   System.out.println() and errors; and
   2. Do I need to log in to every machine on the cluster?

Thank you,
Mark


Re: log files on the cluster?

2009-11-26 Thread Siddu
On Fri, Nov 27, 2009 at 6:28 AM, Mark Kerzner markkerz...@gmail.com wrote:

 Hi,

 it is probably described somewhere in the manuals, but


   1. Where are the log files, especially those that show my
   System.out.println() and errors; and

Look at the logs directory ...

   2. Do I need to log in to every machine on the cluster?

Try the web UI interface, though I am not sure.

 Thank you,
 Mark




-- 
Regards,
~Sid~
I have never met a man so ignorant that i couldn't learn something from him


Re: Hadoop 0.20 map/reduce Failing for old API

2009-11-26 Thread Rekha Joshi
The exit status of 1 usually indicates configuration issues or incorrect command
invocation in Hadoop 0.20 (incorrect params), if not a JVM crash.
In your logs there is no indication of a crash, but some paths/commands can be the
cause. Can you check if your lib paths/data paths are correct?

If it is a memory-intensive task, you may also try values for
mapred.child.java.opts / mapred.job.map.memory.mb. Thanks!
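A small sketch of trying those two settings on the job configuration (the values are placeholders, and mapred.job.map.memory.mb is only enforced where memory-based scheduling is configured):

import org.apache.hadoop.mapred.JobConf;

public class MemoryTuningSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf(MemoryTuningSketch.class);
    // JVM options for each spawned task (placeholder heap size).
    conf.set("mapred.child.java.opts", "-Xmx512m");
    // Per-map-task memory limit in MB (placeholder value).
    conf.set("mapred.job.map.memory.mb", "1024");
  }
}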

On 11/27/09 1:28 AM, Arv Mistry a...@kindsight.net wrote:

Hi,

We've recently upgraded to hadoop 0.20. Writing to HDFS seems to be
working fine, but the map/reduce jobs are failing with the following
exception. Note, we have not moved to the new map/reduce API yet. In the
client that launches the job, the only change I have made is to now load
the three files; core-site.xml, hdfs-site.xml and mapred-site.xml rather
than the hadoop-site.xml. Any ideas?

INFO   | jvm 1| 2009/11/26 13:47:26 | 2009-11-26 13:47:26,328 INFO
[FileInputFormat] Total input paths to process : 711
INFO   | jvm 1| 2009/11/26 13:47:28 | 2009-11-26 13:47:28,033 INFO
[JobClient] Running job: job_200911241319_0003
INFO   | jvm 1| 2009/11/26 13:47:29 | 2009-11-26 13:47:29,036 INFO
[JobClient]  map 0% reduce 0%
INFO   | jvm 1| 2009/11/26 13:47:36 | 2009-11-26 13:47:36,068 INFO
[JobClient] Task Id : attempt_200911241319_0003_m_03_0, Status :
FAILED
INFO   | jvm 1| 2009/11/26 13:47:36 | java.io.IOException: Task
process exit with nonzero status of 1.
INFO   | jvm 1| 2009/11/26 13:47:36 |   at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO   | jvm 1| 2009/11/26 13:47:36 |
INFO   | jvm 1| 2009/11/26 13:47:36 | 2009-11-26 13:47:36,094 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_03_0filter=stdout
INFO   | jvm 1| 2009/11/26 13:47:36 | 2009-11-26 13:47:36,096 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_03_0filter=stderr
INFO   | jvm 1| 2009/11/26 13:47:51 | 2009-11-26 13:47:51,162 INFO
[JobClient] Task Id : attempt_200911241319_0003_m_00_0, Status :
FAILED
INFO   | jvm 1| 2009/11/26 13:47:51 | java.io.IOException: Task
process exit with nonzero status of 1.
INFO   | jvm 1| 2009/11/26 13:47:51 |   at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO   | jvm 1| 2009/11/26 13:47:51 |
INFO   | jvm 1| 2009/11/26 13:47:51 | 2009-11-26 13:47:51,166 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_00_0filter=stdout
INFO   | jvm 1| 2009/11/26 13:47:51 | 2009-11-26 13:47:51,167 WARN
[JobClient] Error reading task
outputhttp://dev-cs1.ca.kindsight.net:50060/tasklog?plaintext=truetaski
d=attempt_200911241319_0003_m_00_0filter=stderr
INFO   | jvm 1| 2009/11/26 13:47:52 | 2009-11-26 13:47:52,173 INFO
[JobClient]  map 50% reduce 0%
INFO   | jvm 1| 2009/11/26 13:48:03 | 2009-11-26 13:48:03,219 INFO
[JobClient] Task Id : attempt_200911241319_0003_m_01_0, Status :
FAILED
INFO   | jvm 1| 2009/11/26 13:48:03 | Map output lost, rescheduling:
getMapOutput(attempt_200911241319_0003_m_01_0,0) failed :
INFO   | jvm 1| 2009/11/26 13:48:03 |
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200911241319_0003/attempt_200911241319_0003_m_0
1_0/output/file.out.index in any of the configured local directories
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathT
oRead(LocalDirAllocator.java:389)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAlloca
tor.java:138)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.
java:2886)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2
16)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
INFO   | jvm 1| 2009/11/26 13:48:03 |   at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
INFO   | jvm 1| 

Re: log files on the cluster?

2009-11-26 Thread Mark Kerzner
Thank you, that pretty much does it, the logs on EC2 are in /mnt/hadoop/logs

On Thu, Nov 26, 2009 at 10:43 PM, Siddu siddu.s...@gmail.com wrote:

 On Fri, Nov 27, 2009 at 6:28 AM, Mark Kerzner markkerz...@gmail.com
 wrote:

  Hi,
 
  it is probably described somewhere in the manuals, but
 
 
1. Where are the log files, especially those that show my
System.out.println() and errors; and
 
 Look at the logs directory ...

2. Do I need to log in to every machine on the cluster?
 
 Try the WEB UI interface though i am not sure

  Thank you,
  Mark
 



 --
 Regards,
 ~Sid~
 I have never met a man so ignorant that i couldn't learn something from him



Re: please help in setting hadoop

2009-11-26 Thread aa225
Hi,
Just a thought, but you do not need to set up the temp directory in
conf/hadoop-site.xml, especially if you are running basic examples. Give that a
shot; maybe it will work out. Otherwise see if you can find additional info in
the logs.

Thank You

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/27/09 12:20 AM , Krishna Kumar krishna.ku...@nechclst.in sent:
 Dear All,
 Can anybody please help me get past these error messages:
 [ hadoop]# hadoop jar
 /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar
 wordcount
 test test-op
 
 09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to
 process : 4
 
 09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to
 process : 4
 
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid
 local directories in property: mapred.local.dir
 
 at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:730)
 at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:222)
 at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:194)
 at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1557)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
 I am running the hadoop cluster as root user on two server nodes: master
 and slave.  My hadoop-site.xml contains the following properties:
 
 fs.default.name = hdfs://master:54310
 dfs.permissions = false
 dfs.name.dir = /home/hadoop/dfs/name
 
 Further, the output of the ls command is as follows:
 
 [ hadoop]# ls -l /home/hadoop/hadoop-root/
 total 8
 drwxr-xr-x 4 root root 4096 Nov 26 16:48 dfs
 drwxr-xr-x 3 root root 4096 Nov 26 16:49 mapred
 [ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/
 total 4
 drwxr-xr-x 2 root root 4096 Nov 26 16:49 local
 [ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/local/
 total 0
 Thanks and Best Regards,
 
 Krishna Kumar
 
 Senior Storage Engineer 
 
 Why do we have to die? If we had to die, and everything is gone after
 that, then nothing else matters on this earth - everything is
 temporary,
 at least relative to me.
 DISCLAIMER: 
 ---
 
 The contents of this e-mail and any attachment(s) are confidential
 and
 intended 
 for the named recipient(s) only.  
 It shall not attach any liability on the originator or NECHCL or its 
 affiliates. Any views or opinions presented in  
 this email are solely those of the author and may not necessarily
 reflect the 
 opinions of NECHCL or its affiliates.  
 Any form of reproduction, dissemination, copying, disclosure,
 modification, 
 distribution and / or publication of  
 this message without the prior written consent of the author of this
 e-mail is 
 strictly prohibited. If you have  
 received this email in error please delete it and notify the sender 
 immediately. . 
 ---
 
 
 



Re: RE: please help in setting hadoop

2009-11-26 Thread aa225
Hi,
   There should be a folder called logs in $HADOOP_HOME. Also try going through
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29.

This is a pretty good tutorial.

Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/27/09  1:18 AM , Krishna Kumar krishna.ku...@nechclst.in sent:
 I have tried, but didn't get any success. Btw, can you please tell me the exact
 path of the log file which I have to refer to?
 
 
 Thanks and Best Regards,
 
 Krishna Kumar
 
 Senior Storage Engineer 
 
 Why do we have to die? If we had to die, and everything is gone after that,
 then nothing else matters on this earth - everything is temporary, at least
 relative to me.
 
 
 
 
  -Original Message-
  From: aa...@buffalo.edu [aa...@buffalo.edu]
  Sent: Friday, November 27, 2009 10:56 AM
  To: common-user@hadoop.apache.org
  Subject: Re: please help in setting hadoop
  
  Hi,
  Just a thought, but you do not need to set up the temp directory in
  conf/hadoop-site.xml, especially if you are running basic examples. Give
  that a shot; maybe it will work out. Otherwise see if you can find
  additional info in the logs.
  
  Thank You
  
  Abhishek Agrawal
  
  SUNY- Buffalo
  (716-435-7122)
  
  On Fri 11/27/09 12:20 AM , Krishna Kumar krishna.ku...@nechclst.in sent:
   Dear All,
   Can anybody please help me get past these error messages:
   [ hadoop]# hadoop jar
   /usr/lib/hadoop/hadoop-0.18.3-14.cloudera.CH0_3-examples.jar wordcount
   test test-op
  
   09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to
   process : 4
  
   09/11/26 17:15:45 INFO mapred.FileInputFormat: Total input paths to
   process : 4
  
   org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid
   local directories in property: mapred.local.dir
   at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:730)
   at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:222)
   at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:194)
   at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1557)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
  
   I am running the hadoop cluster as root user on two server nodes: master
   and slave.  My hadoop-site.xml contains the following properties:
  
   fs.default.name = hdfs://master:54310
   dfs.permissions = false
   dfs.name.dir = /home/hadoop/dfs/name
  
   Further, the output of the ls command is as follows:
  
   [ hadoop]# ls -l /home/hadoop/hadoop-root/
   total 8
   drwxr-xr-x 4 root root 4096 Nov 26 16:48 dfs
   drwxr-xr-x 3 root root 4096 Nov 26 16:49 mapred
   [ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/
   total 4
   drwxr-xr-x 2 root root 4096 Nov 26 16:49 local
   [ hadoop]# ls -l /home/hadoop/hadoop-root/mapred/local/
   total 0
  
   Thanks and Best Regards,
  
   Krishna Kumar
  
   Senior Storage Engineer
  
   Why do we have to die? If we had to die, and everything is gone after
   that, then nothing else matters on this earth - everything is temporary,
   at least relative to me.
  
   DISCLAIMER:
   ---
   The contents of this e-mail and any attachment(s) are confidential and
   intended for the named recipient(s) only.
   It shall not attach any liability on the originator or NECHCL or its
   affiliates. Any views or opinions presented in this email are solely
   those of the author and may not necessarily reflect the opinions of
   NECHCL or its affiliates.
   Any form of reproduction, dissemination, copying, disclosure,
   modification, distribution and / or publication of this message without
   the prior written consent of the author of this e-mail is strictly
   prohibited. If you have received this email in error please delete it
   and notify the sender immediately.
   ---
  
 DISCLAIMER: 
 
 ---
  
 The contents of this e-mail and any attachment(s) are confidential and
 
 intended 
 
 for the named recipient(s) only.  
 
 It 

Re: Doubt in Hadoop

2009-11-26 Thread Jeff Zhang
Do you run the map reduce job from the command line or an IDE?  In map reduce mode,
you should put the jar containing the map and reduce classes in your classpath.
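For example, a minimal sketch of the driver along those lines, using the classes described in this thread (input/output paths are taken from args, and the jar name in the run command below is an assumption):

package test;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class A {
  public static void main(String[] args) throws Exception {
    // Passing the driver class makes Hadoop locate the jar that contains it
    // (and therefore test.Map and test.Reduce) and ship that jar to the task
    // JVMs, which is what the ClassNotFoundException on the cluster points at.
    JobConf jobConf = new JobConf(A.class);
    jobConf.setJobName("test-job");
    jobConf.setMapperClass(Map.class);
    jobConf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
    FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
    JobClient.runJob(jobConf);
  }
}

Then submit it as: hadoop jar test.jar test.A <input> <output> (jar name assumed), so the same jar is on the client classpath as well.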


Jeff Zhang



On Fri, Nov 27, 2009 at 2:19 PM, aa...@buffalo.edu wrote:

 Hello Everybody,
I have a doubt in Hadoop and was wondering if anybody has faced a
 similar problem. I have a package called test. Inside that I have classes
 called A.java, Map.java, and Reduce.java. In A.java I have the main method
 where I am trying to initialize the JobConf object. I have written
 jobConf.setMapperClass(Map.class) and similarly for the reduce class as
 well. The code works correctly when I run the code locally via
 jobConf.set("mapred.job.tracker", "local"), but I get an exception when I
 try to run this code on my cluster. The stack trace of the exception is as
 under. I cannot understand the problem. Any help would be appreciated.

 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException: test.Map
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
 Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
 Markowitz.covarMatrixMap
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
... 6 more
 Caused by: java.lang.ClassNotFoundException: test.Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
... 7 more

 Thank You


 Abhishek Agrawal

 SUNY- Buffalo
 (716-435-7122)






Re: Re: Doubt in Hadoop

2009-11-26 Thread aa225
Hi,
   I am running the job from the command line. The job runs fine in local mode,
but something happens when I try to run the job in distributed mode.


Abhishek Agrawal

SUNY- Buffalo
(716-435-7122)

On Fri 11/27/09  2:31 AM , Jeff Zhang zjf...@gmail.com sent:
 Do you run the map reduce job in command line or IDE?  in map reduce
 mode, you should put the jar containing the map and reduce class in
 your classpath
 Jeff Zhang
 On Fri, Nov 27, 2009 at 2:19 PM,   wrote:
 Hello Everybody,
                I have a doubt in Haddop and was wondering if
 anybody has faced a
 similar problem. I have a package called test. Inside that I have
 class called
 A.java, Map.java, Reduce.java. In A.java I have the main method
 where I am trying
 to initialize the jobConf object. I have written
 jobConf.setMapperClass(Map.class) and similarly for the reduce class
 as well. The
 code works correctly when I run the code locally via
 jobConf.set(mapred.job.tracker,local) but I get an exception
 when I try to
 run this code on my cluster. The stack trace of the exception is as
 under. I
 cannot understand the problem. Any help would be appreciated.
 java.lang.RuntimeException: java.lang.RuntimeException:
 java.lang.ClassNotFoundException: test.Map
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
        at
 org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:690)
        at
 org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at
 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
 Caused by: java.lang.RuntimeException:
 java.lang.ClassNotFoundException:
 Markowitz.covarMatrixMap
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:744)
        ... 6 more
 Caused by: java.lang.ClassNotFoundException: test.Map
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native
 Method)
        at
 java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at
 sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at
 java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at
 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
        at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
        ... 7 more
 Thank You
 Abhishek Agrawal
 SUNY- Buffalo
 (716-435-7122)