This will likely break most programs you try to run. Many mapper
implementations are not thread safe.
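
To make the thread-safety concern concrete, here is a minimal sketch in plain Java. It does not use any Hadoop classes: UnsafeCounterMapper, SafeCounterMapper and ThreadSafetyDemo are hypothetical stand-ins for a mapper that keeps state in an instance field. It assumes the runner calls map() on one shared mapper instance from several threads, which is how I understand the old-API multithreaded runner to behave.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for mappers with instance state; not Hadoop classes.
class UnsafeCounterMapper {
    long seen = 0; // plain field: "seen++" is a read-modify-write, not atomic
    void map(String record) { seen++; }
}

class SafeCounterMapper {
    final AtomicLong seen = new AtomicLong(); // atomic increments survive concurrent callers
    void map(String record) { seen.incrementAndGet(); }
}

class ThreadSafetyDemo {
    // Runs `task` `calls` times on each of `threads` threads and waits for all of them.
    static void hammer(int threads, int calls, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> { for (int i = 0; i < calls; i++) task.run(); });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        UnsafeCounterMapper unsafe = new UnsafeCounterMapper();
        SafeCounterMapper safe = new SafeCounterMapper();
        hammer(8, 100_000, () -> unsafe.map("record"));
        hammer(8, 100_000, () -> safe.map("record"));
        System.out.println("unsafe count: " + unsafe.seen);     // can come out below 800000
        System.out.println("safe count:   " + safe.seen.get()); // always 800000
    }
}
```

A mapper written for the single-threaded runner usually looks like the first class, which is why switching the runner globally is risky.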

That having been said, if you want to force all programs using the old API
(org.apache.hadoop.mapred.*) to run on the multithreaded maprunner, you can
do this by setting mapred.map.runner.class to
org.apache.hadoop.mapred.lib.MultithreadedMapRunner in mapred-site.xml.

Rather than do this in mapred-site.xml, it is far preferable to explicitly
call jobConf.setMapRunnerClass() in the applications that require the
multithreaded map runner.

In the new API, the MapRunnable interface is not used. Instead, the
Mapper.run() method controls the execution of the map() method. For your own
applications, use org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper,
which provides a multithreaded run() method. As far as I know, you do not
put your map() logic in a subclass of it: you set MultithreadedMapper as the
job's mapper class and register your own org.apache.hadoop.mapreduce.Mapper
with MultithreadedMapper.setMapperClass(). I am pretty sure that you cannot
independently switch out the run() layer of an existing application except
by modifying its source to use the MultithreadedMapper.
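
The relationship between run() and map() can be sketched with a toy model in plain Java. None of these classes are Hadoop types: ToyMapper, UpperCaseMapper and ToyMultithreadedMapper are hypothetical stand-ins, and the real methods take a Context rather than queues. The toy multithreaded variant drives separate delegate mapper instances from several threads, which is my understanding of how the real class works; treat that as an assumption to check against your Hadoop version.

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Toy stand-in for a new-API mapper: run() owns the loop that calls map().
class ToyMapper {
    void map(String record, Queue<String> out) { out.add(record); } // identity by default

    void run(Queue<String> input, Queue<String> out) {
        String record;
        while ((record = input.poll()) != null) map(record, out);
    }
}

// A user mapper overrides only map() and inherits the single-threaded run().
class UpperCaseMapper extends ToyMapper {
    @Override
    void map(String record, Queue<String> out) { out.add(record.toUpperCase()); }
}

// Toy multithreaded variant: run() starts several threads, each driving
// its own delegate mapper instance over the shared input queue.
class ToyMultithreadedMapper extends ToyMapper {
    private final Supplier<ToyMapper> delegateFactory;
    private final int threads;

    ToyMultithreadedMapper(Supplier<ToyMapper> delegateFactory, int threads) {
        this.delegateFactory = delegateFactory;
        this.threads = threads;
    }

    @Override
    void run(Queue<String> input, Queue<String> out) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ToyMapper delegate = delegateFactory.get();
            workers[i] = new Thread(() -> delegate.run(input, out));
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}

class ToyRunDemo {
    public static void main(String[] args) {
        Queue<String> input = new ConcurrentLinkedQueue<>(Arrays.asList("a", "b", "c"));
        Queue<String> out = new ConcurrentLinkedQueue<>();
        new ToyMultithreadedMapper(UpperCaseMapper::new, 2).run(input, out);
        System.out.println(out.size()); // 3; the order of the results is not guaranteed
    }
}
```

The point of the sketch is that the threading lives entirely in run(), so the user's map() code does not change, but the job setup has to name the multithreaded class.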

Finally, you should really ask yourself why you're doing this. If you have
multi-core machines, the best way to manage parallelism is to configure
Hadoop to use multiple task slots per machine. Set
mapred.tasktracker.map.tasks.maximum to '8' to run up to eight map tasks per
node at once (this property is renamed to
mapreduce.tasktracker.map.tasks.maximum in 0.21+). This allows
single-threaded mapper code to efficiently process multiple input splits in
parallel. The only time it's better to use multithreaded map runners is when
a specific map() call is high-latency; e.g., you're running a web crawler in
a mapper, and you want to overlap requests to foreign sites. But since this
is not the norm, you should generally leave things single-threaded.
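
The high-latency case can be illustrated with a small plain-Java sketch. It assumes nothing about Hadoop: LatencyOverlapDemo and slowFetch are hypothetical, and the "remote request" is simulated with Thread.sleep. The sketch compares handling records one at a time with overlapping the waits on a thread pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LatencyOverlapDemo {
    // Hypothetical stand-in for a slow remote call, such as fetching a page.
    static String slowFetch(String url, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "fetched:" + url;
    }

    // One request at a time: total time is roughly urls.size() * latencyMs.
    static List<String> fetchSequentially(List<String> urls, long latencyMs) {
        List<String> out = new ArrayList<>();
        for (String u : urls) out.add(slowFetch(u, latencyMs));
        return out;
    }

    // Requests overlap on a pool: the waits run concurrently instead of back to back.
    static List<String> fetchOverlapped(List<String> urls, long latencyMs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String u : urls) {
                Callable<String> task = () -> slowFetch(u, latencyMs);
                pending.add(pool.submit(task));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : pending) out.add(f.get()); // collect in input order
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8");

        long t0 = System.nanoTime();
        fetchSequentially(urls, 100);
        long t1 = System.nanoTime();
        fetchOverlapped(urls, 100, 8);
        long t2 = System.nanoTime();

        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("overlapped ms: " + (t2 - t1) / 1_000_000);
    }
}
```

When map() is CPU-bound rather than waiting on I/O, extra threads inside one task buy little over extra task slots, which is the reason for the advice above.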

Hope this helps
Cheers
- Aaron

On Fri, Jun 11, 2010 at 7:30 AM, Jyothish Soman <jyothish.so...@gmail.com> wrote:

> Hi,
>
> I am a newbie to Hadoop. I want to use the multithreaded map runner by
> default, so I tried to change the MapTask.java code. It failed to compile
> using ant because of a conflict between the mapreduce and mapred
> libraries. Can you please suggest a way to do this?
>
> Regards,
> Jyothish Soman
>
