Hi Alan,

Thanks for your message.

The object can be read-only once it is initialized - I do not need to modify it. Essentially it is an object that lets me analyze/modify the data I am mapping/reducing, and it comes to about 3-4 GB of RAM. The problem is that if I run multiple mappers, this object gets replicated in the different JVMs and I run out of memory on my node. I pretty much need the full object in memory to do my processing. It is possible (though quite difficult) to keep it partially on disk and query it (like a Lucene store implementation), but there is a significant performance hit.

As an example, say I use the extra-large CPU instance at Amazon (8 CPUs, 8 GB RAM). In that scenario I can really only run 1 mapper per node even though there are 8 CPUs. But if the overhead of sharing the object (e.g. RMI) or persisting it (e.g. Lucene) makes each access more than 8 times slower than in-memory access, then it is actually cheaper to run 1 mapper per node. I tried sharing with Terracotta and saw roughly a 600x slowdown versus in-memory access - at that rate, 8 mappers sharing the object would still deliver well under 2% of the throughput of a single in-memory mapper.

So ideally, if I could have all the mappers run in the same JVM, I could create a singleton and still have multiple mappers access it at memory speed (a rough sketch of what I have in mind is below, after the quoted thread).

Please let me know if I am looking at this correctly and whether the above is possible. Thanks a lot for all your help.

Cheers,
Dev

On Fri, Oct 3, 2008 at 12:49 PM, Alan Ho <[EMAIL PROTECTED]> wrote:
> It really depends on what type of data you are sharing, how you are looking
> up the data, whether the data is read-write, and whether you care about
> consistency. If you don't care about consistency, I suggest that you shove
> the data into a BDB store (for key-value lookup) or a Lucene store, and copy
> the data to all the nodes. That way all data access will be in-process, no
> GC problems, and you will get very fast results. BDB and Lucene both have
> easy replication strategies.
>
> If the data is RW, and you need consistency, you should probably forget
> about MapReduce and just run everything on big iron.
>
> Regards,
> Alan Ho
>
> ----- Original Message ----
> From: Devajyoti Sarkar <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, October 2, 2008 8:41:04 PM
> Subject: Sharing an object across mappers
>
> I think each mapper/reducer runs in its own JVM, which makes it impossible
> to share objects. I need to share a large object so that I can access it at
> memory speed across all the mappers. Is it possible to have all the mappers
> run in the same JVM? Or is there a way to do this across JVMs at high speed?
> I guess RMI and other such methods will be just too slow.
>
> Thanks,
> Dev
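P.S. Here is a rough sketch of the singleton idea, assuming the old org.apache.hadoop.mapred API. The class name, the Map<String, String> stand-in and the buildModel() placeholder are just illustrations of where my actual 3-4 GB object would go:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SharedModelMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // One copy per JVM: every map task (or map thread) running in this
      // JVM sees the same instance instead of building its own 3-4 GB copy.
      private static volatile Map<String, String> model;

      public void configure(JobConf conf) {
        if (model == null) {
          synchronized (SharedModelMapper.class) {
            if (model == null) {
              model = buildModel(conf);   // built once, reused by later tasks
            }
          }
        }
      }

      // Placeholder: load the real read-only structure here.
      private static Map<String, String> buildModel(JobConf conf) {
        return new HashMap<String, String>();
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Read-only lookups run at memory speed against the shared structure.
        String hit = model.get(value.toString());
        if (hit != null) {
          output.collect(value, new Text(hit));
        }
      }
    }

To actually get several mappers into one JVM, my understanding is that the old API's MultithreadedMapRunner runs multiple map threads inside a single task JVM, and that newer Hadoop versions also allow JVM reuse across tasks - something along these lines (the config keys are what I believe they are, please correct me if not):

    JobConf conf = new JobConf(SharedModelMapper.class);
    conf.setMapperClass(SharedModelMapper.class);
    conf.setMapRunnerClass(org.apache.hadoop.mapred.lib.MultithreadedMapRunner.class);
    conf.setInt("mapred.map.multithreadedrunner.threads", 8);  // one thread per CPU
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);         // reuse the JVM, if supported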
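P.P.S. For the BDB route you suggested, this is roughly what I imagine a per-node, read-only lookup would look like with Berkeley DB Java Edition - the "model" database name and the byte[] key/value encoding are just assumptions on my part:

    import java.io.File;

    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.DatabaseException;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;
    import com.sleepycat.je.LockMode;
    import com.sleepycat.je.OperationStatus;

    public class LocalBdbLookup {

      private final Environment env;
      private final Database db;

      // Opens a read-only JE environment copied onto each node's local disk,
      // so lookups stay in-process with no network hop.
      public LocalBdbLookup(File localDir) throws DatabaseException {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setReadOnly(true);
        env = new Environment(localDir, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setReadOnly(true);
        db = env.openDatabase(null, "model", dbConfig);
      }

      // Key-value lookup; returns null if the key is not present.
      public byte[] get(byte[] key) throws DatabaseException {
        DatabaseEntry k = new DatabaseEntry(key);
        DatabaseEntry v = new DatabaseEntry();
        if (db.get(null, k, v, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
          return v.getData();
        }
        return null;
      }

      public void close() throws DatabaseException {
        db.close();
        env.close();
      }
    }

My worry with this is the same disk-bound performance hit I mentioned above: with only 8 GB per node I am not sure enough of the structure would stay cached to keep lookups close to memory speed.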