You can use the distributed cache for memory-mapped files (they're local to the node the tasks run on). A rough sketch of the idea follows below the quoted message.
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata

On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies <[email protected]> wrote:
> Here's the OP again.
>
> I want to make it clear that my question here has to do with the
> problem of distributing 'the program' around the cluster, not 'the
> data'. In the case at hand, the issue is a system that has a large data
> resource that it needs to do its work. Every instance of the code
> needs the entire model, not just some blocks or pieces.
>
> Memory mapping is a very attractive tactic for this kind of data
> resource. The data is read-only. Memory-mapping it allows the
> operating system to ensure that only one copy of the thing ends up in
> physical memory.
>
> If we force the model into a conventional file (storable in HDFS) and
> read it into the JVM in a conventional way, then we get as many copies
> in memory as we have JVMs. On a big machine with a lot of cores, this
> begins to add up.
>
> For people who are running a cluster of relatively conventional
> systems, just putting copies on all the nodes in a conventional place
> is adequate.
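
For illustration, here is a minimal sketch of the combination being suggested: ship the model file through the DistributedCache so it lands on local disk on every node, then memory-map that local copy in each task so the OS page cache keeps a single physical copy shared by all the JVMs on the machine. This uses the pre-MRv2 DistributedCache API; the class name ModelMapper, the path /models/model.bin, and the field name are made up for the example, and the code is untested.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.net.URI;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ModelMapper extends Mapper<LongWritable, Text, Text, Text> {

        private MappedByteBuffer model;

        @Override
        protected void setup(Context context) throws IOException {
            // The cached file has already been copied to local disk on this node.
            Path[] localFiles =
                DistributedCache.getLocalCacheFiles(context.getConfiguration());
            RandomAccessFile raf = new RandomAccessFile(localFiles[0].toString(), "r");
            FileChannel channel = raf.getChannel();
            // Map the model read-only. The mapped pages live in the OS page cache,
            // so every process that maps the same file shares one physical copy.
            // Note: a single MappedByteBuffer is limited to 2 GB; a larger model
            // would need to be mapped in slices.
            model = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context) {
            // ... look things up in 'model' without copying it onto the JVM heap ...
        }
    }

In the driver, before submitting the job, the file is registered with the cache (hypothetical HDFS path):

    DistributedCache.addCacheFile(new URI("/models/model.bin"), job.getConfiguration());
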
