Cool. If you need a hand with Cascading stuff, feel free to ping me on the mailing list or #cascading IRC. Lots of other friendly folks there already.

ckw

On Nov 25, 2008, at 12:35 PM, tim robertson wrote:

Thanks Chris,

I have a different test running, then I will implement that.  Might give
Cascading a shot for what I am doing.

Cheers

Tim


On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
Hey Tim

The .configure() method is what you are looking for, I believe.

It is called once per task, which, in the default case, is once per JVM.

Note that jobs are broken into parallel tasks, and each task handles a portion of the input data. So while your map may be created 100 times (because there are 100
tasks), it will only be created once per JVM.
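A minimal sketch of the pattern being discussed (all class and field names here are illustrative, not from Hadoop; the real hook is Mapper.configure(JobConf) in the old org.apache.hadoop.mapred API): each task calls configure(), but a static guard ensures the expensive index load happens only once per JVM, even when several tasks reuse the same JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedIndexDemo {
    // Shared in-memory lookup index (placeholder for the parsed shapefile objects).
    static Map<String, String> index;
    static int loadCount = 0;  // counts how many times the index was actually built

    // Stands in for Mapper.configure(JobConf): called once per task.
    static synchronized void configure() {
        if (index == null) {           // guard: build only once per JVM
            index = new HashMap<>();
            index.put("key", "value"); // real code would read/parse the shapefile here
            loadCount++;
        }
    }

    public static void main(String[] args) {
        // Simulate three tasks running in the same (reused) JVM.
        for (int task = 0; task < 3; task++) {
            configure();
        }
        System.out.println("index loaded " + loadCount + " time(s)");
    }
}
```

With the synchronized guard, no separate "indicator flag" is needed: the null check on the static field serves that role.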

I hope this makes sense.

chris

On Nov 25, 2008, at 11:46 AM, tim robertson wrote:

Hi Doug,

Thanks - it is not so much that I want to run in a single JVM - I do
want a bunch of machines doing the work; it is just that I want them
all to have this in-memory lookup index, configured once per job.  Is
there some hook somewhere that I can trigger a read from the
distributed cache, or is Mapper.configure() the best place for this?
Can it be called multiple times per job, meaning I need to keep some
static synchronised indicator flag?

Thanks again,

Tim


On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

tim robertson wrote:

Thanks Alex - this will allow me to share the shapefile, but I need to
read it, parse it and store the objects in the index "one time only
per job per JVM".
Is Mapper.configure() the best place to do this?  E.g. will it
only be called once per job?

In 0.19, with HADOOP-249, all tasks from a job can be run in a single
JVM.
So, yes, you could access a static cache from Mapper.configure().

Doug



--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/


