Thanks Ted. Is it correct to assume that all class members defined inside my Mapper are visible to all of the threads, so I should pay careful attention and take synchronization into account when accessing those objects?
Jim On Tue, Apr 27, 2010 at 11:50 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Looking through MultithreadedMapRunner, map() seems to be the only method > called by executorService: > MultithreadedMapRunner.this.mapper.map(key, value, output, > reporter); > > > On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky <jim.twen...@gmail.com> wrote: > >> Hi, >> >> I've decided to refactor some of my Hadoop jobs and implement them >> using MultithreadedMapper.class but I got puzzled because of some >> unexpected error messages at run time. >> Here are some relevant settings regarding my Hadoop cluster: >> >> mapred.tasktracker.map.tasks.maximum = 1 >> mapred.tasktracker.reduce.tasks.maximum = 1 >> mapred.job.reuse.jvm.num.tasks = -1 >> mapred.map.multithreadedrunner.threads = 4 >> >> I'd like to know how threads are used to run the map task in a single >> JVM (Correct me if this is wrong). Suppose I've got a sample Mapper >> class as such: >> >> class Mapper ... { >> >> MyObject A; >> static MyObject B; >> >> setup() { >> Configuration conf = context.getConfiguration(); >> A.initialize(c); >> B.initialize(c); >> } >> >> map() {...} >> >> cleanup() {...} >> >> Does each thread run all three of setup(), map(), cleanup() methods ? >> >> -OR- >> >> Are setup() and cleanup() run once per task (and thus per JVM >> according to my settings) and so map is the only multithreaded >> function? >> Also, are the objects A and B shared among different threads or does >> each trade have its own copy of them? My initial guess was that each >> thread would have a separate copy of A, and B would be shared among >> the 4 threads running on the same box since it is defined as static, >> but it appears to me that this assumption is not correct and A seems >> to be shared. >> >> Thanks, >> Jim >> >