Cool. If you need a hand with Cascading stuff, feel free to ping me on the mailing list or #cascading IRC. Lots of other friendly folks there already.

ckw

On Nov 25, 2008, at 12:35 PM, tim robertson wrote:

Thanks Chris,

I have a different test running, then I will implement that.  Might give
Cascading a shot for what I am doing.

Cheers

Tim


On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
Hey Tim

The .configure() method is what you are looking for, I believe.

It is called once per task, which, in the default case, is once per JVM.

Note that jobs are broken into parallel tasks, and each task handles a portion of the input data. So while your map may be created 100 times (because there are 100
tasks), it will only be created once per JVM.
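A minimal sketch of the pattern being discussed (all class and field names here are illustrative, not from Hadoop; the real hook is Mapper.configure(JobConf) in the old org.apache.hadoop.mapred API): each task calls configure(), but a static guard ensures the expensive index load happens only once per JVM, even when several tasks reuse the same JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedIndexDemo {
    // Shared in-memory lookup index (placeholder for the parsed shapefile objects).
    static Map<String, String> index;
    static int loadCount = 0;  // counts how many times the index was actually built

    // Stands in for Mapper.configure(JobConf): called once per task.
    static synchronized void configure() {
        if (index == null) {           // guard: build only once per JVM
            index = new HashMap<>();
            index.put("key", "value"); // real code would read/parse the shapefile here
            loadCount++;
        }
    }

    public static void main(String[] args) {
        // Simulate three tasks running in the same (reused) JVM.
        for (int task = 0; task < 3; task++) {
            configure();
        }
        System.out.println("index loaded " + loadCount + " time(s)");
    }
}
```

With the synchronized guard, no separate "indicator flag" is needed: the null check on the static field serves that role.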

I hope this makes sense.

chris

On Nov 25, 2008, at 11:46 AM, tim robertson wrote:

Hi Doug,

Thanks - it is not so much that I want to run in a single JVM - I do
want a bunch of machines doing the work; it is just that I want them
all to have this in-memory lookup index, configured once per job.  Is
there some hook somewhere that I can trigger a read from the
distributed cache, or is Mapper.configure() the best place for this?
Can it be called multiple times per job, meaning I need to keep some
static synchronised indicator flag?

Thanks again,

Tim


On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

tim robertson wrote:

Thanks Alex - this will allow me to share the shapefile, but I need to
read it, parse it and store the objects in the index "one time only
per job per JVM".
Is Mapper.configure() the best place to do this?  E.g. will it
only be called once per job?

In 0.19, with HADOOP-249, all tasks from a job can be run in a single
JVM.
So, yes, you could access a static cache from Mapper.configure().

Doug



--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/


