Hi Owen,

Thanks a lot for the pointers.
To use the MultiThreadedMapRunner, if I set it through the setMapRunnerClass() method on the jobConf, does the rest of my code remain the same (apart from making it thread-safe)?

Thanks in advance,
Dev

On Sat, Oct 4, 2008 at 12:29 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

> On Oct 3, 2008, at 7:49 AM, Devajyoti Sarkar wrote:
>
>> Briefly going through the DistributedCache information, it seems to be a
>> way to distribute files to mappers/reducers.
>
> Sure, but it handles the distribution problem for you.
>
>> One still needs to read the contents into each map/reduce task VM.
>
> If the data is straight binary data, you could just mmap it from the
> various tasks. It would be pretty efficient.
>
> The other direction is to use the MultiThreadedMapRunner and run multiple
> maps as threads in the same VM. But unless your maps are CPU heavy or
> contacting external servers, it probably won't help as much as you'd like.
>
> -- Owen
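
For reference, here is a rough sketch of how the two suggestions above (mmap the binary data, and run several workers as threads in one VM) might combine in plain Java. It does not use any Hadoop classes. The class name SharedMmapLookup, the methods mapReadOnly and parallelSum, and the file layout (a flat array of big-endian ints) are all made up for illustration. The thread-safety point it tries to show is that each worker gets its own duplicate() view of the mapping, so no position or limit state is shared between threads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: one read-only mapping shared by several worker threads.
public class SharedMmapLookup {

    /** Maps the whole file read-only. The mapping remains valid after the channel is closed. */
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Sums the big-endian ints stored in the file, using `threads` workers over one mapping. */
    public static long parallelSum(Path file, int threads) throws Exception {
        MappedByteBuffer shared = mapReadOnly(file);
        final int count = shared.capacity() / Integer.BYTES;
        final int stride = threads;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t;
                // Each worker gets its own view: same bytes, independent position/limit.
                final ByteBuffer view = shared.duplicate();
                Callable<Long> task = () -> {
                    long sum = 0;
                    // Worker `start` handles ints start, start + stride, start + 2*stride, ...
                    for (int i = start; i < count; i += stride) {
                        sum += view.getInt(i * Integer.BYTES); // absolute read
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Write the ints 1..1000 to a temporary file, then sum them with 4 threads.
        Path file = Files.createTempFile("lookup", ".bin");
        file.toFile().deleteOnExit();
        ByteBuffer data = ByteBuffer.allocate(1000 * Integer.BYTES);
        for (int i = 1; i <= 1000; i++) {
            data.putInt(i);
        }
        Files.write(file, data.array());
        System.out.println(parallelSum(file, 4)); // prints 500500
    }
}
```

Whether something like this helps in a real job presumably depends on the same caveat as above: if the per-record work is not CPU heavy or waiting on external servers, the extra threads may not buy much.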