Hi,

I've decided to refactor some of my Hadoop jobs to use
MultithreadedMapper.class, but I'm puzzled by some unexpected error
messages at run time.
Here are the relevant settings for my Hadoop cluster:

mapred.tasktracker.map.tasks.maximum = 1
mapred.tasktracker.reduce.tasks.maximum = 1
mapred.job.reuse.jvm.num.tasks = -1
mapred.map.multithreadedrunner.threads = 4

My understanding is that the threads run the map task inside a single
JVM (correct me if this is wrong), and I'd like to know exactly how they
are used. Suppose I have a sample Mapper class like this:

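class Mapper ... {

  MyObject A;          // instance field
  static MyObject B;   // static field

  protected void setup(Context context) {
    Configuration conf = context.getConfiguration();
    A.initialize(conf);
    B.initialize(conf);
  }

  protected void map(...) {...}

  protected void cleanup(Context context) {...}
}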

Does each thread run all three methods, setup(), map() and cleanup()?

-OR-

Are setup() and cleanup() run once per task (and thus once per JVM,
given my settings), so that map() is the only method that runs in
multiple threads?
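To make the two alternatives concrete, here is a plain-Java sketch with no Hadoop involved. It does not claim to show what MultithreadedMapper actually does; LifecycleModels, Worker and the integer "records" are hypothetical stand-ins. Model 1 gives every thread its own object running setup(), map() and cleanup(). Model 2 runs setup() and cleanup() once and lets only map() run concurrently. The counters show how many times each method would be called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the two execution models in the question (no Hadoop).
public class LifecycleModels {

    // Stand-in for a Mapper: it only counts how often each method is called.
    static class Worker {
        final AtomicInteger setups, maps, cleanups;
        Worker(AtomicInteger s, AtomicInteger m, AtomicInteger c) { setups = s; maps = m; cleanups = c; }
        void setup() { setups.incrementAndGet(); }
        void map(int record) { maps.incrementAndGet(); }
        void cleanup() { cleanups.incrementAndGet(); }
    }

    // Model 1: every thread owns a Worker and runs setup(), map()*, cleanup().
    static int[] perThreadLifecycle(int threads, int records) throws InterruptedException {
        AtomicInteger s = new AtomicInteger(), m = new AtomicInteger(), c = new AtomicInteger();
        AtomicInteger next = new AtomicInteger(); // shared cursor over the input records
        List<Thread> pool = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> {
                Worker w = new Worker(s, m, c);   // one object per thread
                w.setup();
                int r;
                while ((r = next.getAndIncrement()) < records) w.map(r);
                w.cleanup();
            });
            pool.add(th);
            th.start();
        }
        for (Thread th : pool) th.join();
        return new int[] {s.get(), m.get(), c.get()};
    }

    // Model 2: one Worker; setup() and cleanup() run once, only map() is concurrent.
    static int[] sharedSetupCleanup(int threads, int records) throws InterruptedException {
        AtomicInteger s = new AtomicInteger(), m = new AtomicInteger(), c = new AtomicInteger();
        AtomicInteger next = new AtomicInteger();
        Worker w = new Worker(s, m, c);           // one object shared by all threads
        w.setup();
        List<Thread> pool = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> {
                int r;
                while ((r = next.getAndIncrement()) < records) w.map(r);
            });
            pool.add(th);
            th.start();
        }
        for (Thread th : pool) th.join();
        w.cleanup();
        return new int[] {s.get(), m.get(), c.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] a = perThreadLifecycle(4, 100);
        int[] b = sharedSetupCleanup(4, 100);
        System.out.println("model 1: setup=" + a[0] + " map=" + a[1] + " cleanup=" + a[2]);
        System.out.println("model 2: setup=" + b[0] + " map=" + b[1] + " cleanup=" + b[2]);
    }
}
```

With 4 threads and 100 records, model 1 calls setup() and cleanup() 4 times each, and model 2 calls them once each. In both models, map() is called 100 times in total.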
Also, are the objects A and B shared among the threads, or does each
thread have its own copy of them? My initial guess was that each
thread would have a separate copy of A, while B would be shared among
the 4 threads running on the same box because it is declared static.
However, that assumption appears to be wrong: A seems to be shared.
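For reference, here is a plain-Java sketch of what the language itself guarantees about the two fields, independent of Hadoop. An instance field exists once per object, and a static field exists once per loaded class. So whether A is shared depends only on whether the threads use separate mapper objects or the same one, and that is exactly the open question about MultithreadedMapper. FieldSharing, MyMapper and observe() are hypothetical names. MyObject is an empty placeholder for the class in the example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: instance vs static field visibility across threads (no Hadoop).
public class FieldSharing {

    static class MyObject {}   // placeholder; equality is object identity

    static class MyMapper {
        MyObject a = new MyObject();          // instance field: one per MyMapper object
        static MyObject B = new MyObject();   // static field: one per loaded class
    }

    // Returns {distinct A objects seen, distinct B objects seen} across all threads.
    static int[] observe(int threads, boolean shareMapper) throws InterruptedException {
        Set<MyObject> as = Collections.newSetFromMap(new ConcurrentHashMap<>());
        Set<MyObject> bs = Collections.newSetFromMap(new ConcurrentHashMap<>());
        MyMapper shared = new MyMapper();
        List<Thread> pool = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> {
                // Either every thread reuses one object, or each builds its own.
                MyMapper m = shareMapper ? shared : new MyMapper();
                as.add(m.a);
                bs.add(MyMapper.B);
            });
            pool.add(th);
            th.start();
        }
        for (Thread th : pool) th.join();
        return new int[] {as.size(), bs.size()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] perThread = observe(4, false);
        int[] oneShared = observe(4, true);
        System.out.println("one mapper per thread: A copies=" + perThread[0] + ", B copies=" + perThread[1]);
        System.out.println("one shared mapper:     A copies=" + oneShared[0] + ", B copies=" + oneShared[1]);
    }
}
```

With 4 threads, separate mapper objects give 4 distinct A objects and 1 B. A single shared mapper object gives 1 A and 1 B.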

Thanks,
Jim
