Hi Owen,

Thanks a lot for the pointers.

To use the MultithreadedMapRunner, if I set it through
setMapRunnerClass() on the JobConf, does the rest of my code
remain the same (apart from making it thread-safe)?
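
To make the thread-safety concern concrete, here is a minimal sketch in plain Java with no Hadoop classes. It assumes a single mapper object whose map() method is called from several threads at once, which is the situation a multithreaded runner would create. The class SharedStateMapper and its methods are hypothetical names used only for illustration; the point is that any state shared between map() calls needs concurrent-safe containers.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a mapper whose map() is called by many threads.
final class SharedStateMapper {
    // Shared by every thread calling map(), so it must tolerate concurrent updates.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Counts the whitespace-separated words of one record.
    void map(String record) {
        for (String word : record.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.computeIfAbsent(word, k -> new LongAdder()).increment();
            }
        }
    }

    long count(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0 : adder.sum();
    }

    // Feeds every record to one shared mapper from a fixed-size thread pool.
    static SharedStateMapper runConcurrently(List<String> records, int threads) {
        SharedStateMapper mapper = new SharedStateMapper();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String record : records) {
            pool.execute(() -> mapper.map(record));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("map tasks timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mapper;
    }

    public static void main(String[] args) {
        SharedStateMapper m = runConcurrently(Collections.nCopies(1000, "a b a"), 4);
        System.out.println(m.count("a") + " " + m.count("b")); // prints "2000 1000"
    }
}
```

With a plain HashMap and long counters in place of ConcurrentHashMap and LongAdder, the same run could lose updates, which is the kind of change "making it thread-safe" would involve.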

Thanks in advance,
Dev


On Sat, Oct 4, 2008 at 12:29 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

>
> On Oct 3, 2008, at 7:49 AM, Devajyoti Sarkar wrote:
>
>> Briefly going through the DistributedCache information, it seems to be a
>> way to distribute files to mappers/reducers.
>>
>
> Sure, but it handles the distribution problem for you.
>
>> One still needs to read the contents into each map/reduce task VM.
>>
>
> If the data is straight binary data, you could just mmap it from the
> various tasks. It would be pretty efficient.
>
> The other direction is to use the MultithreadedMapRunner and run multiple
> maps as threads in the same VM. But unless your maps are CPU heavy or
> contacting external servers, it probably won't help as much as you'd like.
>
> -- Owen
>
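
The mmap suggestion above can be sketched in plain Java with FileChannel.map. This is a minimal illustration, not Hadoop code: it assumes the binary data is a flat array of big-endian ints, and a temp file stands in for whatever local file a task would actually open. The class MappedLookup and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: read-only memory mapping of a flat binary file of ints.
final class MappedLookup {
    // Writes the values as big-endian ints to a temp file (stand-in for the real data file).
    static Path writeTempInts(int[] values) {
        try {
            Path file = Files.createTempFile("lookup", ".bin");
            ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
            for (int v : values) {
                buf.putInt(v);
            }
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only; the mapping stays valid after the channel is closed.
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Absolute read of the index-th int; does not move the buffer's position.
    static int intAt(MappedByteBuffer data, int index) {
        return data.getInt(index * Integer.BYTES);
    }

    public static void main(String[] args) {
        Path file = writeTempInts(new int[] {10, 20, 30});
        MappedByteBuffer data = mapReadOnly(file);
        System.out.println(intAt(data, 2)); // prints 30
    }
}
```

Because the mapping is read-only and backed by the operating system's page cache, several tasks on the same machine mapping the same file would typically share the physical pages instead of each holding a private copy.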
