The more I use it, the more I realize Hadoop is not built around shared
memory. For this type of thing, use TSpaces (IBM); that way you can
have a flag to load it once and allow for sharing.
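
If a per-JVM cache is enough, the static cache that Doug and Chris
describe below could look roughly like this. It is only a sketch in
plain Java with no Hadoop classes: LookupIndexCache, loadIndex() and
loadCount are made-up names, and loadIndex() just stands in for reading
and parsing the file from the distributed cache. In a real job, get()
would presumably be called from Mapper.configure().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class LookupIndexCache {
    // Shared by every task instance that runs inside this JVM.
    private static Map<String, String> index;

    // Counts real loads, only so the behaviour can be observed (hypothetical).
    static final AtomicInteger loadCount = new AtomicInteger();

    // Hypothetical loader: stands in for reading and parsing the cached file.
    private static Map<String, String> loadIndex() {
        loadCount.incrementAndGet();
        Map<String, String> m = new HashMap<>();
        m.put("example-key", "example-value");
        return m;
    }

    // synchronized so that concurrent callers in one JVM cannot both
    // see a null index and load it twice.
    static synchronized Map<String, String> get() {
        if (index == null) {
            index = loadIndex();
        }
        return index;
    }

    public static void main(String[] args) {
        // Simulate 100 tasks being configured in the same JVM.
        for (int task = 0; task < 100; task++) {
            LookupIndexCache.get();
        }
        System.out.println("loads: " + loadCount.get()); // prints "loads: 1"
    }
}
```

This only shares the index within one JVM. Each JVM on each machine
still loads its own copy, which is why something like TSpaces comes in
if you want real sharing.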
Regards
Saptarshi


On Tue, Nov 25, 2008 at 3:42 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
> cool. If you need a hand with Cascading stuff, feel free to ping me on the
> mail list or #cascading irc. lots of other friendly folk there already.
>
> ckw
>
> On Nov 25, 2008, at 12:35 PM, tim robertson wrote:
>
>> Thanks Chris,
>>
>> I have a different test running, then will implement that.  Might give
>> cascading a shot for what I am doing.
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
>>>
>>> Hey Tim
>>>
>>> The .configure() method is what you are looking for, I believe.
>>>
>>> It is called once per task, which, in the default case, is once per JVM.
>>>
>>> Note that jobs are broken into parallel tasks, and each task handles a
>>> portion of the input data. So you may create your map 100 times because
>>> there are 100 tasks, but it will only be created once per JVM.
>>>
>>> I hope this makes sense.
>>>
>>> chris
>>>
>>> On Nov 25, 2008, at 11:46 AM, tim robertson wrote:
>>>
>>>> Hi Doug,
>>>>
>>>> Thanks - it is not so much I want to run in a single JVM - I do want a
>>>> bunch of machines doing the work, it is just I want them all to have
>>>> this in-memory lookup index, that is configured once per job.  Is
>>>> there some hook somewhere that I can trigger a read from the
>>>> distributed cache, or is a Mapper.configure() the best place for this?
>>>> Can it be called multiple times per Job meaning I need to keep some
>>>> static synchronised indicator flag?
>>>>
>>>> Thanks again,
>>>>
>>>> Tim
>>>>
>>>>
>>>> On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> tim robertson wrote:
>>>>>>
>>>>>> Thanks Alex - this will allow me to share the shapefile, but I need to
>>>>>> "one time only per job per jvm" read it, parse it and store the
>>>>>> objects in the index.
>>>>>> Is the Mapper.configure() the best place to do this?  E.g. will it
>>>>>> only be called once per job?
>>>>>
>>>>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single
>>>>> JVM.
>>>>> So, yes, you could access a static cache from Mapper.configure().
>>>>>
>>>>> Doug
>>>>>
>>>>>
>>>
>>> --
>>> Chris K Wensel
>>> [EMAIL PROTECTED]
>>> http://chris.wensel.net/
>>> http://www.cascading.org/
>>>
>>>
>
>



-- 
Saptarshi Guha - [EMAIL PROTECTED]
