Hi,

I've decided to refactor some of my Hadoop jobs to use MultithreadedMapper.class, but I was puzzled by some unexpected error messages at run time. Here are the relevant settings on my Hadoop cluster:

    mapred.tasktracker.map.tasks.maximum = 1
    mapred.tasktracker.reduce.tasks.maximum = 1
    mapred.job.reuse.jvm.num.tasks = -1
    mapred.map.multithreadedrunner.threads = 4

I'd like to know how threads are used to run the map task in a single JVM (correct me if this is wrong). Suppose I've got a sample Mapper class such as:

    class Mapper ... {
        MyObject A;
        static MyObject B;

        setup() {
            Configuration conf = context.getConfiguration();
            A.initialize(conf);
            B.initialize(conf);
        }

        map() {...}

        cleanup() {...}
    }

Does each thread run all three of the setup(), map(), and cleanup() methods?

-OR-

Are setup() and cleanup() run once per task (and thus once per JVM, according to my settings), so that map() is the only multithreaded function?

Also, are the objects A and B shared among the different threads, or does each thread have its own copy of them? My initial guess was that each thread would have a separate copy of A, and that B would be shared among the 4 threads running on the same box since it is declared static. However, this assumption appears to be incorrect: A seems to be shared as well.

Thanks,
Jim
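
P.S. To make the two possibilities concrete, here is a minimal plain-Java sketch with no Hadoop involved. All names in it (SharingSketch, SketchMapper, distinctA) are hypothetical stand-ins I made up for illustration, and it does not claim to show what MultithreadedMapper actually does. It only counts how many distinct A objects the worker threads see when each thread builds its own mapper instance versus when all threads reuse a single instance. The static field B is shared in both cases.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SharingSketch {
    // Hypothetical stand-in for MyObject; only object identity matters here.
    static class MyObject {}

    // Hypothetical stand-in for my Mapper: one instance field, one static field.
    static class SketchMapper {
        MyObject a = new MyObject();              // one per mapper instance
        static final MyObject B = new MyObject(); // one per JVM (class loader)
    }

    // Starts `threads` workers and returns how many distinct A objects they saw.
    static int distinctA(int threads, boolean oneMapperPerThread) {
        Set<MyObject> seen = ConcurrentHashMap.newKeySet();
        SketchMapper shared = new SketchMapper();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // Either build a fresh mapper in this thread or reuse the shared one.
                SketchMapper m = oneMapperPerThread ? new SketchMapper() : shared;
                seen.add(m.a);
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("one mapper per thread: " + distinctA(4, true));
        System.out.println("one shared mapper:     " + distinctA(4, false));
    }
}
```

My initial guess corresponds to the first case (4 distinct A objects); what I seem to observe on the cluster looks like the second case (a single A seen by all threads).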