It really depends on what type of data you are sharing, how you look up 
the data, whether the data is read-write, and whether you care about 
consistency. If you don't care about consistency, I suggest you load the 
data into a BDB store (for key-value lookup) or a Lucene store, and copy the 
data to all the nodes. That way all data access will be in-process, with no GC 
problems, and you will get very fast results. Both BDB and Lucene have easy 
replication strategies.
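
As a rough sketch of that in-process, read-only lookup pattern (plain Java, no Hadoop or BDB dependencies — the store loading is simulated with a HashMap; with a replicated BDB or Lucene store, the load() body would open the node's local copy instead):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a read-only lookup table loaded once per JVM and shared by
// every task running in that JVM. Because the data is local and
// in-process, lookups run at memory speed with no network round trip.
public class SharedLookup {
    private static volatile Map<String, String> store;

    // Lazily initialize the store exactly once per JVM
    // (double-checked locking on a volatile field).
    public static Map<String, String> get() {
        if (store == null) {
            synchronized (SharedLookup.class) {
                if (store == null) {
                    store = load();
                }
            }
        }
        return store;
    }

    private static Map<String, String> load() {
        // Placeholder: in practice, open the locally replicated
        // BDB environment or Lucene index here.
        Map<String, String> m = new HashMap<>();
        m.put("key", "value");
        return m;
    }

    public static void main(String[] args) {
        System.out.println(SharedLookup.get().get("key"));
    }
}
```

Each mapper would call SharedLookup.get() from its map() method; the store is loaded on first access and then shared by all tasks in that JVM.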

If the data is read-write and you need consistency, you should probably forget 
about MapReduce and just run everything on big iron.

Regards,
Alan Ho




----- Original Message ----
From: Devajyoti Sarkar <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, October 2, 2008 8:41:04 PM
Subject: Sharing an object across mappers

I think each mapper/reducer runs in its own JVM, which makes it impossible to
share objects. I need to share a large object so that I can access it at
memory speeds across all the mappers. Is it possible to have all the mappers
run in the same VM? Or is there a way to do this across VMs at high speed? I
guess RMI and other such methods will be just too slow.

Thanks,
Dev



