Hey Tim

The .configure() method is what you are looking for, I believe.

It is called once per task, which in the default case is once per JVM.

Note that jobs are broken into parallel tasks, each handling a portion of the input data. So with 100 tasks you may create your map 100 times, but it will only be created once per JVM.
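For example, a rough sketch against the old org.apache.hadoop.mapred API (LookupMapper, buildIndex() and the String-to-String map are just illustrative, not your actual types):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // shared by every task that runs in this JVM; built at most once
  private static Map<String, String> index;

  @Override
  public void configure(JobConf job) {
    // configure() runs once per task, so guard the expensive build
    // to make sure it only happens once per JVM
    synchronized (LookupMapper.class) {
      if (index == null) {
        index = buildIndex(job);
      }
    }
  }

  // stand-in for reading and parsing the side data into memory
  private static Map<String, String> buildIndex(JobConf job) {
    return new HashMap<String, String>();
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // look each record up in 'index' here
  }
}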

I hope this makes sense.

chris

On Nov 25, 2008, at 11:46 AM, tim robertson wrote:

Hi Doug,

Thanks - it is not so much that I want to run in a single JVM - I do
want a bunch of machines doing the work; it is just that I want them
all to have this in-memory lookup index, configured once per job.  Is
there some hook somewhere that lets me trigger a read from the
distributed cache, or is Mapper.configure() the best place for this?
Can it be called multiple times per job, meaning I need to keep some
static synchronised indicator flag?
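In other words, something like this inside the mapper (assuming the
file was shipped with DistributedCache.addCacheFile() at job setup;
MyMapper and parseShapefile() are just stand-ins for my own code):

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// inside MyMapper; 'index' is a static field shared by tasks in the JVM
public void configure(JobConf job) {
  try {
    // local, node-side copies of the files shipped through the cache
    Path[] cached = DistributedCache.getLocalCacheFiles(job);
    synchronized (MyMapper.class) {
      if (index == null) {
        index = parseShapefile(cached[0]);  // my own parsing code
      }
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}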

Thanks again,

Tim


On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
tim robertson wrote:

Thanks Alex - this will allow me to share the shapefile, but I need to
read it, parse it and store the objects in the index "one time only
per job per JVM".
Is Mapper.configure() the best place to do this?  E.g. will it
only be called once per job?

In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM.
So, yes, you could access a static cache from Mapper.configure().

Doug



--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
