Thanks Ted. Is it correct to assume that all class members defined
inside my Mapper are visible to all of the threads, so I should pay
careful attention and take synchronization into account when accessing
those objects?
Jim
On Tue, Apr 27, 2010 at 11:50 PM, Ted Yu yuzhih...@gmail.com wrote:
Looking through MultithreadedMapRunner, map() seems to be the only method
called by executorService:
    MultithreadedMapRunner.this.mapper.map(key, value, output, reporter);
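To illustrate the pattern described above, where a single mapper object has map() invoked from an executor's worker threads, here is a minimal plain-JDK sketch. It does not use Hadoop at all: CountingMapper, SharedMapperSketch and run() are hypothetical names made up for the illustration, and whether the same holds for the new-API MultithreadedMapper should be checked against that class's source. It only shows that an instance field of a shared object is visible to every worker thread, so updates to it need to be atomic or synchronized.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedMapperSketch {

    // Hypothetical stand-in for a user's mapper; not a Hadoop class.
    static class CountingMapper {
        // Instance field: every thread calling map() on this object sees it.
        private final AtomicLong records = new AtomicLong();

        void map(String key, String value) {
            records.incrementAndGet(); // atomic, so concurrent calls lose no updates
        }

        long recordsSeen() {
            return records.get();
        }
    }

    // Calls map() on ONE mapper instance from a fixed pool of worker threads.
    static long run(int threads, int recordsPerThread) throws InterruptedException {
        CountingMapper mapper = new CountingMapper();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    mapper.map("key" + i, "value" + i);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return mapper.recordsSeen();
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 threads, matching mapred.map.multithreadedrunner.threads = 4
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

If the AtomicLong were replaced by a plain long incremented with ++, the total could come out lower than expected, because the increments from different threads would race.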
On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky jim.twen...@gmail.com wrote:
Hi,
I've decided to refactor some of my Hadoop jobs and implement them
using MultithreadedMapper.class but I got puzzled because of some
unexpected error messages at run time.
Here are some relevant settings regarding my Hadoop cluster:
mapred.tasktracker.map.tasks.maximum = 1
mapred.tasktracker.reduce.tasks.maximum = 1
mapred.job.reuse.jvm.num.tasks = -1
mapred.map.multithreadedrunner.threads = 4
I'd like to know how threads are used to run a map task within a single
JVM (correct me if that assumption is wrong). Suppose I've got a sample
Mapper class such as:
class Mapper ... {
    MyObject A;
    static MyObject B;

    setup() {
        Configuration conf = context.getConfiguration();
        A.initialize(conf);
        B.initialize(conf);
    }

    map() {...}

    cleanup() {...}
}
Does each thread run all three of the setup(), map(), and cleanup() methods?
-OR-
Are setup() and cleanup() run once per task (and thus per JVM
according to my settings) and so map is the only multithreaded
function?
Also, are the objects A and B shared among different threads or does
each thread have its own copy of them? My initial guess was that each
thread would have a separate copy of A, and B would be shared among
the 4 threads running on the same box since it is defined as static,
but it appears to me that this assumption is not correct and A seems
to be shared.
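As a plain-Java illustration of the distinction behind this guess (again without Hadoop; FieldSharingSketch, SketchMapper and run() are hypothetical names): a static field has one copy per class in the JVM, while an instance field has one copy per object. Two threads therefore share an instance field only when they use the same object. Which of the two cases applies to a given Hadoop runner depends on whether it creates one mapper or one per thread, which this sketch does not settle.

```java
import java.util.Arrays;

public class FieldSharingSketch {

    // Hypothetical stand-in for the Mapper in the question; not a Hadoop class.
    static class SketchMapper {
        int a = 0;        // instance field, like A: one copy per object
        static int b = 0; // static field, like B: one copy for the whole class

        synchronized void touchA() { a++; }        // locks this object
        static synchronized void touchB() { b++; } // locks the class
    }

    // Runs two threads; they use the same mapper object only if shareInstance is true.
    // Returns { a of the first mapper, static b } once both threads have finished.
    static int[] run(boolean shareInstance) throws InterruptedException {
        SketchMapper.b = 0;
        SketchMapper first = new SketchMapper();
        SketchMapper second = shareInstance ? first : new SketchMapper();

        Thread t1 = new Thread(() -> { first.touchA(); SketchMapper.touchB(); });
        Thread t2 = new Thread(() -> { second.touchA(); SketchMapper.touchB(); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        return new int[] { first.a, SketchMapper.b };
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(run(true)));  // prints [2, 2]
        System.out.println(Arrays.toString(run(false))); // prints [1, 2]
    }
}
```

In both runs the static field ends at 2, while the instance field reaches 2 only when the two threads share one object.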
Thanks,
Jim