The more I use it, the more I realize Hadoop is not built around shared memory. For this type of thing, use TSpaces (IBM); that way you can have a flag to load it once and allow for sharing.

Regards
Saptarshi

On Tue, Nov 25, 2008 at 3:42 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
> cool. If you need a hand with Cascading stuff, feel free to ping me on the
> mail list or #cascading irc. lots of other friendly folk there already.
>
> ckw
>
> On Nov 25, 2008, at 12:35 PM, tim robertson wrote:
>
>> Thanks Chris,
>>
>> I have a different test running, then will implement that. Might give
>> cascading a shot for what I am doing.
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
>>>
>>> Hey Tim
>>>
>>> The .configure() method is what you are looking for i believe.
>>>
>>> It is called once per task, which in the default case, is once per jvm.
>>>
>>> Note Jobs are broken into parallel tasks, each task handles a portion of
>>> the input data. So you may create your map 100 times, because there are 100
>>> tasks, it will only be created once per jvm.
>>>
>>> I hope this makes sense.
>>>
>>> chris
>>>
>>> On Nov 25, 2008, at 11:46 AM, tim robertson wrote:
>>>
>>>> Hi Doug,
>>>>
>>>> Thanks - it is not so much I want to run in a single JVM - I do want a
>>>> bunch of machines doing the work, it is just I want them all to have
>>>> this in-memory lookup index, that is configured once per job. Is
>>>> there some hook somewhere that I can trigger a read from the
>>>> distributed cache, or is a Mapper.configure() the best place for this?
>>>> Can it be called multiple times per Job meaning I need to keep some
>>>> static synchronised indicator flag?
>>>>
>>>> Thanks again,
>>>>
>>>> Tim
>>>>
>>>>
>>>> On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> tim robertson wrote:
>>>>>>
>>>>>> Thanks Alex - this will allow me to share the shapefile, but I need to
>>>>>> "one time only per job per jvm" read it, parse it and store the
>>>>>> objects in the index.
>>>>>> Is the Mapper.configure() the best place to do this? E.g. will it
>>>>>> only be called once per job?
>>>>>
>>>>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single
>>>>> JVM.
>>>>> So, yes, you could access a static cache from Mapper.configure().
>>>>>
>>>>> Doug
>>>>>
>>>>>
>>>
>>> --
>>> Chris K Wensel
>>> [EMAIL PROTECTED]
>>> http://chris.wensel.net/
>>> http://www.cascading.org/
>>>
>>>
>
>

--
Saptarshi Guha - [EMAIL PROTECTED]
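
For anyone finding this thread later, here is a rough sketch of the "static synchronised indicator flag" idea Tim asks about. It is only an illustration, not code from anyone on the thread. LookupMapper, ensureLoaded(), loadIndex() and loadCount are made-up names, and the class deliberately does not use the Hadoop API so it can run on its own. In a real job the guard would sit inside your Mapper's configure(JobConf) and loadIndex() would read and parse the file from the distributed cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Hadoop mapper. A real one would implement
// Mapper and receive a JobConf in configure().
class LookupMapper {

    // Static, so it is shared by every LookupMapper instance in this JVM.
    private static Map<String, String> index;

    // Counts how often the expensive load really ran (demonstration only).
    static final AtomicInteger loadCount = new AtomicInteger();

    // Called once per task; the static guard keeps the load to once per JVM.
    public void configure() {
        ensureLoaded();
    }

    // Synchronized on the class, so concurrent callers cannot load twice.
    private static synchronized void ensureLoaded() {
        if (index == null) {
            index = loadIndex();
        }
    }

    // Placeholder for "read the shapefile, parse it, build the index".
    // The entries here are invented sample data.
    private static Map<String, String> loadIndex() {
        loadCount.incrementAndGet();
        Map<String, String> m = new HashMap<>();
        m.put("DK", "Denmark");
        m.put("NO", "Norway");
        return m;
    }

    // Lookup used from map(); assumes configure() was called first.
    public String lookup(String key) {
        return index.get(key);
    }
}
```

If tasks share a JVM (as Doug describes for 0.19 with HADOOP-249), the second and later tasks find the index already populated. If every task gets its own JVM, each one loads the index once. Either way this only shares within one JVM, not across machines.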