Re: maprd vs mapreduce api

2011-08-05 Thread Stevens, Keith D.
The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the 
identity function.  So with the new API's Job object you should be able to 
just do 

job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

without having to implement your own no-op classes.

I recommend reading the javadoc for the differences between the old API and 
the new API. For example, 
http://hadoop.apache.org/common/docs/r0.20.2/api/index.html describes the 
different functionality of Mapper in the new API and its dual use as the 
identity mapper.
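
On the ClassCastException in the quoted message below: TextInputFormat hands the mapper LongWritable keys (byte offsets) and Text values, while NOOPMapper declares Text keys and IntWritable values. The sketch below tries to show why the base-class identity mapper tolerates this and a typed subclass does not. It does not use Hadoop at all; Mapper, NoopMapper, and feed are hypothetical stand-ins written only for illustration, with Long and String standing in for LongWritable and Text.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in classes only: this sketch does NOT use Hadoop. "Mapper" here is a
// hypothetical simplification of org.apache.hadoop.mapreduce.Mapper.
class IdentityMapperSketch {

    // Like the new-API Mapper, the default map() passes each record through.
    static class Mapper<KI, VI> {
        protected void map(KI key, VI value, List<Object> out) {
            out.add(key);
            out.add(value);
        }

        // The framework supplies whatever types the input format produces.
        // Generics are erased at runtime, so nothing is checked here.
        @SuppressWarnings("unchecked")
        final void run(Object key, Object value, List<Object> out) {
            map((KI) key, (VI) value, out);
        }
    }

    // Mirrors NOOPMapper: declares String keys (standing in for Text) even
    // though the input will carry Long keys (standing in for LongWritable).
    static class NoopMapper extends Mapper<String, Integer> {
        @Override
        protected void map(String key, Integer value, List<Object> out) {
            out.add(key);
            out.add(value);
        }
    }

    // Feeds one record shaped like TextInputFormat output: (offset, line).
    static String feed(Mapper<?, ?> mapper) {
        List<Object> out = new ArrayList<>();
        try {
            mapper.run(0L, "first line", out);
            return "ok " + out;
        } catch (ClassCastException e) {
            // The compiler-generated bridge method in the subclass casts the
            // key to its declared type, which fails for a Long key.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(feed(new Mapper<Long, String>()));
        System.out.println(feed(new NoopMapper()));
    }
}
```

If this reading is right, either using the base Mapper class directly or declaring the custom mapper's input types to match what the input format produces should avoid the cast failure. With the identity mapper the map output types follow the input types, so the job's declared output key and value classes would need to match as well.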

Cheers,
--Keith

On Aug 5, 2011, at 1:15 PM, garpinc wrote:

 
 I was following this tutorial on version 0.19.1
 
 http://v-lad.org/Tutorials/Hadoop/23%20-%20create%20the%20project.html
 
 However, I wanted to use the latest version of the API, 0.20.2.
 
 The original code in the tutorial had the following lines:
 conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
 conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
 
 Both Identity classes are deprecated, so the solution seemed to be to create
 a mapper and reducer as follows:
 public static class NOOPMapper
     extends Mapper<Text, IntWritable, Text, IntWritable> {
 
   public void map(Text key, IntWritable value, Context context)
       throws IOException, InterruptedException {
     context.write(key, value);
   }
 }
 
 public static class NOOPReducer
     extends Reducer<Text, IntWritable, Text, IntWritable> {
   private IntWritable result = new IntWritable();
 
   public void reduce(Text key, Iterable<IntWritable> values, Context context)
       throws IOException, InterruptedException {
     context.write(key, result);
   }
 }
 
 
 And then with code:
   Configuration conf = new Configuration();
   Job job = new Job(conf, "testdriver");
 
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);
 
   job.setInputFormatClass(TextInputFormat.class);
   job.setOutputFormatClass(TextOutputFormat.class);
 
   FileInputFormat.addInputPath(job, new Path("In"));
   FileOutputFormat.setOutputPath(job, new Path("Out"));
 
   job.setMapperClass(NOOPMapper.class);
   job.setReducerClass(NOOPReducer.class);
 
   job.waitForCompletion(true);
 
 
 However, I get this message:
 java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
   at TestDriver$NOOPMapper.map(TestDriver.java:1)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 11/08/01 16:41:01 INFO mapred.JobClient:  map 0% reduce 0%
 11/08/01 16:41:01 INFO mapred.JobClient: Job complete: job_local_0001
 11/08/01 16:41:01 INFO mapred.JobClient: Counters: 0
 
 
 
 Can anyone tell me what I need for this to work?
 
 Attached is full code..
 http://old.nabble.com/file/p32174859/TestDriver.java TestDriver.java 
 -- 
 View this message in context: 
 http://old.nabble.com/maprd-vs-mapreduce-api-tp32174859p32174859.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.
 



Mappers fail to initialize and are killed after 600 seconds

2011-08-01 Thread Stevens, Keith D.
Hi all,

I'm running a simple mapreduce job that connects to an hbase table, reads each 
row, counts some co-occurrence frequencies, and writes everything out to hdfs 
at the end.  Everything seems to be going smoothly until the last 5, out of 
108, tasks run.  The last 5 tasks seem to be stuck initializing.  As far as I 
can tell, setup is never called, and eventually, after 600 seconds, the task is 
killed.  The task jumps around different nodes to try and run but regardless of 
the node, it fails to initialize and is killed.

My first guess is that it's trying to connect to an hbase region server and 
failing, but I don't see anything like this in the task tracker logs.  Here 
are the log lines related to one of the failed tasks from a task tracker's 
log:

2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_27_0 task's state:UNASSIGNED
2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_27_0 which needs 1 slots
2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_27_0 which needs 1 slots
2011-08-01 12:01:12,243 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1189914759 given task: attempt_201107281508_0028_m_27_0
2011-08-01 12:11:09,462 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_27_0: Task attempt_201107281508_0028_m_27_0 failed to report status for 600 seconds. Killing!
2011-08-01 12:11:09,467 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201107281508_0028_m_27_0
2011-08-01 12:11:14,488 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_27_0 done; removing files.
2011-08-01 12:11:14,489 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201107281508_0028_m_27_0 not found in cache
2011-08-01 12:11:14,495 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_27_0 task's state:FAILED_UNCLEAN
2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_27_0 which needs 1 slots
2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_27_0 which needs 1 slots
2011-08-01 12:11:15,045 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1869983962 given task: attempt_201107281508_0028_m_27_0
2011-08-01 12:11:15,346 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_27_0 0.0%
2011-08-01 12:11:15,348 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_27_0 0.0% cleanup
2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201107281508_0028_m_27_0 is done.
2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201107281508_0028_m_27_0  was -1
2011-08-01 12:11:15,354 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_27_0 done; removing files.
2011-08-01 12:11:17,495 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_27_0 done; removing files.
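
As a quick check on the timing in the log above, the following sketch computes the gaps between the launch line, the "given task" line, and the "Killing!" line. It is not part of the job; the class and method names are hypothetical, and the three timestamps are copied from the log lines. The 600 seconds in the kill message matches the usual default task timeout (mapred.task.timeout, 600000 ms), assuming it was not overridden on this cluster.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only.
class TaskTimeoutCheck {
    // Matches the TaskTracker log timestamp layout, e.g. 2011-08-01 12:01:08,889
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long millisBetween(String start, String end) {
        return Duration.between(
            LocalDateTime.parse(start, LOG_TIME),
            LocalDateTime.parse(end, LOG_TIME)).toMillis();
    }

    public static void main(String[] args) {
        String launched = "2011-08-01 12:01:08,889"; // "Trying to launch" line
        String jvmGiven = "2011-08-01 12:01:12,243"; // "JVM with ID ... given task" line
        String killed   = "2011-08-01 12:11:09,462"; // "failed to report status" line

        System.out.println(millisBetween(launched, killed)); // 600573
        System.out.println(millisBetween(jvmGiven, killed)); // 597219
    }
}
```

The kill arrives roughly 600.6 seconds after the launch and about 597 seconds after the JVM received the task, which would be consistent with the attempt never reporting any status at all rather than stalling partway through.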

And here are the syslog lines:
In my job, I set the status when I enter and exit setup, and I set counters in 
map.  None of these are triggered for this task.  Nothing is written to stderr 
or stdout, and the syslogs for the task have nothing beyond the zookeeper 
client connection lines.

Any thoughts as to what might be causing this issue?  Is there another log that 
indicates which region server this task is trying to connect to?

Thanks!
--Keith Stevens

Re: Mappers fail to initialize and are killed after 600 seconds

2011-08-01 Thread Stevens, Keith D.
In short, there are no userlogs.  stderr and stdout are both empty.  I copied 
the output from syslog to the following pastebin: http://pastebin.com/0XXE9Jze. 
 The first 22 lines look to be exactly the same as the syslogs for other, 
non-dying, tasks.   The main departure is on line 23 where the loader can't 
seem to load native-hadoop libraries, and this happens about 10 minutes after 
starting up.

--Keith

On Aug 1, 2011, at 1:00 PM, Harsh J wrote:

 Are there no userlogs from the failed tasks? TaskTracker logs won't
 carry user-code (task) logs. Could you paste those syslog lines (from
 the task) to pastebin/etc. since the lists may not be accepting
 attachments?
 
 -- 
 Harsh J