Thanks Ted. Is it correct to assume that all class members defined
inside my Mapper are visible to all of the threads, so I should pay
careful attention and take synchronization into account when accessing
those objects?
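
To make the question concrete, here is a minimal sketch. It assumes the situation Ted describes for MultithreadedMapRunner: one mapper object whose map() is called from several pool threads. SharedMapperSketch, recordsSeen and runDemo are hypothetical names; nothing here is Hadoop API, and the real runner may be wired differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a mapper; not a Hadoop class.
class SharedMapperSketch {
    private final Object lock = new Object();
    private long recordsSeen = 0; // instance field, visible to every thread that holds this object

    // Called concurrently by the pool threads (assumption: one shared instance).
    void map(String key, String value) {
        synchronized (lock) { // without this guard, increments can be lost
            recordsSeen++;
        }
    }

    long recordsSeen() {
        synchronized (lock) {
            return recordsSeen;
        }
    }

    // Runs `threads` workers that each call map() `callsPerThread` times on ONE mapper.
    static long runDemo(int threads, int callsPerThread) {
        SharedMapperSketch mapper = new SharedMapperSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    mapper.map("key" + i, "value");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return mapper.recordsSeen();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 40000
    }
}
```

If the synchronized block in map() is removed, the final count can come out lower than threads * callsPerThread, which is the kind of symptom that shared, unguarded members would produce.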


On Tue, Apr 27, 2010 at 11:50 PM, Ted Yu <> wrote:
> Looking through MultithreadedMapRunner, map() seems to be the only method
> called by executorService:
>, value, output,
> reporter);
> On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky <> wrote:
>> Hi,
>> I've decided to refactor some of my Hadoop jobs and implement them
>> using MultithreadedMapper.class but I got puzzled because of some
>> unexpected error messages at run time.
>> Here are some relevant settings regarding my Hadoop cluster:
>> = 1
>> mapred.tasktracker.reduce.tasks.maximum = 1
>> mapred.job.reuse.jvm.num.tasks = -1
>> = 4
>> I'd like to know how threads are used to run the map task in a single
>> JVM (Correct me if this is wrong). Suppose I've got a sample Mapper
>> class as such:
>> class Mapper ... {
>>   MyObject A;
>>   static MyObject B;
>>   setup() {
>>     Configuration conf = context.getConfiguration();
>>     A.initialize(conf);
>>     B.initialize(conf);
>>   }
>>   map() {...}
>>   cleanup() {...}
>> }
>> Does each thread run all three of setup(), map(), cleanup() methods ?
>> -OR-
>> Are setup() and cleanup() run once per task (and thus per JVM
>> according to my settings) and so map is the only multithreaded
>> function?
>> Also, are the objects A and B shared among different threads or does
>> each thread have its own copy of them? My initial guess was that each
>> thread would have a separate copy of A, and B would be shared among
>> the 4 threads running on the same box since it is defined as static,
>> but it appears to me that this assumption is not correct and A seems
>> to be shared.
>> Thanks,
>> Jim
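
The Java language rule behind the A versus B question can be sketched on its own, independent of Hadoop. FieldSharingSketch and Holder are hypothetical names, with Holder standing in for MyObject. Whether each thread actually gets its own mapper object depends on the runner, which this thread does not settle; the sketch only shows what follows in either case.

```java
// Hypothetical illustration; not a Hadoop class.
class FieldSharingSketch {
    // Stand-in for MyObject.
    static final class Holder {
        int value;
    }

    final Holder a = new Holder();        // instance field: one Holder per FieldSharingSketch object
    static final Holder b = new Holder(); // static field: one Holder per class in the JVM

    public static void main(String[] args) {
        FieldSharingSketch first = new FieldSharingSketch();
        FieldSharingSketch second = new FieldSharingSketch();

        // Two objects never share an instance field.
        System.out.println(first.a == second.a); // prints false

        // Threads that are handed the SAME object see the same instance field.
        FieldSharingSketch alias = first;
        System.out.println(first.a == alias.a);  // prints true

        // A write through the static field is visible via every instance.
        FieldSharingSketch.b.value = 7;
        System.out.println(FieldSharingSketch.b.value); // prints 7
    }
}
```

So if A appears to be shared, one explanation consistent with the sketch is that the threads are working on a single mapper object rather than one object per thread.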
