Hey Tim

The .configure() method is what you are looking for, I believe.

It is called once per task, which in the default case is once per JVM.

Note that jobs are broken into parallel tasks, each handling a portion of the input data. So with 100 tasks you may create your map 100 times, but it will only be created once per JVM.
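For example, a rough sketch against the old org.apache.hadoop.mapred API (LookupMapper, buildIndex() and the String-to-String map are just illustrative, not your actual types):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // shared by every task that runs in this JVM; built at most once
  private static Map<String, String> index;

  @Override
  public void configure(JobConf job) {
    // configure() runs once per task, so guard the expensive build
    // to make sure it only happens once per JVM
    synchronized (LookupMapper.class) {
      if (index == null) {
        index = buildIndex(job);
      }
    }
  }

  // stand-in for reading and parsing the side data into memory
  private static Map<String, String> buildIndex(JobConf job) {
    return new HashMap<String, String>();
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // look each record up in 'index' here
  }
}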

I hope this makes sense.

chris

On Nov 25, 2008, at 11:46 AM, tim robertson wrote:

Hi Doug,

Thanks - it is not so much that I want to run in a single JVM - I do
want a bunch of machines doing the work; it is just that I want them
all to have this in-memory lookup index, configured once per job.  Is
there some hook somewhere that lets me trigger a read from the
distributed cache, or is Mapper.configure() the best place for this?
Can it be called multiple times per job, meaning I need to keep some
static synchronised indicator flag?
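In other words, something like this inside the mapper (assuming the
file was shipped with DistributedCache.addCacheFile() at job setup;
MyMapper and parseShapefile() are just stand-ins for my own code):

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// inside MyMapper; 'index' is a static field shared by tasks in the JVM
public void configure(JobConf job) {
  try {
    // local, node-side copies of the files shipped through the cache
    Path[] cached = DistributedCache.getLocalCacheFiles(job);
    synchronized (MyMapper.class) {
      if (index == null) {
        index = parseShapefile(cached[0]);  // my own parsing code
      }
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}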

Thanks again,

Tim


On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
tim robertson wrote:

Thanks Alex - this will allow me to share the shapefile, but I need to
read it, parse it and store the objects in the index "one time only
per job per JVM".
Is Mapper.configure() the best place to do this?  E.g. will it
only be called once per job?

In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM.
So, yes, you could access a static cache from Mapper.configure().

Doug



--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
