Thanks Ted. Is it correct to assume that all class members defined
inside my Mapper are visible to all of the threads, so I should pay
careful attention and take synchronization into account when accessing
those objects?
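
To make the question concrete, here is a minimal sketch of what I mean. It
is plain Java, not Hadoop code, and it assumes a single mapper instance is
shared by all worker threads, as the map() call quoted below suggests.
SharedMapperSketch, SketchMapper, and recordsSeen are made-up names for
illustration only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedMapperSketch {

    // Hypothetical stand-in for a Mapper; this is NOT a Hadoop class.
    static class SketchMapper {
        // Instance field: one copy per mapper object, so every thread
        // that calls map() on the same object sees the same field.
        private long recordsSeen = 0;

        void map(String key, String value) {
            // Without this lock, concurrent increments could be lost.
            synchronized (this) {
                recordsSeen++;
            }
        }

        synchronized long getRecordsSeen() {
            return recordsSeen;
        }
    }

    // Calls map() on ONE shared mapper from several threads and returns
    // the final value of the shared counter.
    static long run(int threads, int callsPerThread) {
        SketchMapper mapper = new SketchMapper();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    mapper.map("key", "value");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("worker threads timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mapper.getRecordsSeen();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10000));
    }
}
```

With the synchronized blocks in place, every increment is counted. If they
are removed, the total can come out lower, which is the kind of problem I
would expect if my Mapper's fields really are shared.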

Jim

On Tue, Apr 27, 2010 at 11:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Looking through MultithreadedMapRunner, map() seems to be the only method
> called by executorService:
>        MultithreadedMapRunner.this.mapper.map(key, value, output, reporter);
>
>
> On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky <jim.twen...@gmail.com> wrote:
>
>> Hi,
>>
>> I've decided to refactor some of my Hadoop jobs and implement them
>> using MultithreadedMapper.class but I got puzzled because of some
>> unexpected error messages at run time.
>> Here are some relevant settings regarding my Hadoop cluster:
>>
>> mapred.tasktracker.map.tasks.maximum = 1
>> mapred.tasktracker.reduce.tasks.maximum = 1
>> mapred.job.reuse.jvm.num.tasks = -1
>> mapred.map.multithreadedrunner.threads = 4
>>
>> I'd like to know how threads are used to run the map task in a single
>> JVM (Correct me if this is wrong). Suppose I've got a sample Mapper
>> class as such:
>>
>> class Mapper ... {
>>
>> MyObject A;
>> static MyObject B;
>>
>> setup() {
>>   Configuration conf = context.getConfiguration();
>>   A.initialize(conf);
>>   B.initialize(conf);
>> }
>>
>> map() {...}
>>
>> cleanup() {...}
>>
>> }
>>
>> Does each thread run all three of the setup(), map(), and cleanup() methods?
>>
>> -OR-
>>
>> Are setup() and cleanup() run once per task (and thus per JVM
>> according to my settings) and so map is the only multithreaded
>> function?
>> Also, are the objects A and B shared among different threads or does
>> each thread have its own copy of them? My initial guess was that each
>> thread would have a separate copy of A, and B would be shared among
>> the 4 threads running on the same box since it is defined as static,
>> but it appears to me that this assumption is not correct and A seems
>> to be shared.
>>
>> Thanks,
>> Jim
>>
>
