Hi Shane,

I can't explain that, but I can say that with 0.19.0 I am now using setNumTasksToExecutePerJvm(-1) and successfully initialising statically declared data in Mapper.configure(). Choosing the tuning parameters really is educated guesswork, though: I profile the app's memory usage locally, then work out by trial and error how much extra memory the Hadoop framework activities on each node need, so that I can set the -Xmx params and the number of Map tasks per node for the different EC2 sizes. A little dirty perhaps, but I am still learning (http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html).
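For anyone following along, here is a rough sketch of the static-cache pattern I mean. It is plain Java with no Hadoop imports so it runs on its own; SpatialMapper, loadIndex() and the "cell-0" entry are made-up stand-ins, and in the real job the class would implement Mapper and configure() would take the JobConf. The loop in main() only imitates several tasks of one job running in a reused JVM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class StaticCacheSketch {

    /** Stand-in for a Mapper; a real one would implement org.apache.hadoop.mapred.Mapper. */
    public static class SpatialMapper {
        // Static, so it is shared by every task that runs in this JVM.
        private static Map<String, String> index;

        // Counts how often the expensive load really ran (for demonstration only).
        public static final AtomicInteger loads = new AtomicInteger();

        // In Hadoop this would be configure(JobConf job), called once per task.
        public void configure() {
            synchronized (SpatialMapper.class) {
                if (index == null) {
                    index = loadIndex();
                }
            }
        }

        // Hypothetical loader; the real job would read and parse the shapefile here.
        private static Map<String, String> loadIndex() {
            loads.incrementAndGet();
            Map<String, String> m = new HashMap<>();
            m.put("cell-0", "polygon-A");
            return m;
        }

        public String lookup(String key) {
            return index.get(key);
        }
    }

    public static void main(String[] args) {
        // Imitate three tasks of one job running one after another in a reused JVM.
        for (int task = 0; task < 3; task++) {
            SpatialMapper mapper = new SpatialMapper();
            mapper.configure();
            System.out.println("task " + task + " -> " + mapper.lookup("cell-0"));
        }
        System.out.println("index loads: " + SpatialMapper.loads.get());
    }
}
```

The null check is what keeps the load to once per JVM: configure() still runs for every task, but only the first call pays for building the index. The pattern only helps when the job's tasks actually share a JVM, which is what setNumTasksToExecutePerJvm(-1) on the JobConf asks for.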
I'm interested to know when one would use a MultithreadedMapRunner also.

Cheers

Tim

On Sun, Nov 30, 2008 at 11:22 PM, Shane Butler <[EMAIL PROTECTED]> wrote:
> Given the goal of a shared data accessable across the Map instances,
> can someone please explain some of the differences between using:
> - setNumTasksToExecutePerJvm() and then having statically declared
> data initialised in Mapper.configure(); and
> - a MultithreadedMapRunner?
>
> Regards,
> Shane
>
>
> On Wed, Nov 26, 2008 at 6:41 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> tim robertson wrote:
>>>
>>> Thanks Alex - this will allow me to share the shapefile, but I need to
>>> "one time only per job per jvm" read it, parse it and store the
>>> objects in the index.
>>> Is the Mapper.configure() the best place to do this? E.g. will it
>>> only be called once per job?
>>
>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM.
>> So, yes, you could access a static cache from Mapper.configure().
>>
>> Doug