Here's the OP again. I want to make it clear that my question here has to do with the problem of distributing 'the program' around the cluster, not 'the data'. In the case at hand, the issue is a system that has a large data resource that it needs to do its work. Every instance of the code needs the entire model, not just some blocks or pieces.
Memory mapping is a very attractive tactic for this kind of data resource. The data is read-only. Memory-mapping it allows the operating system to ensure that only one copy of the thing ends up in physical memory. If we force the model into a conventional file (storable in HDFS) and read it into the JVM in a conventional way, then we get as many copies in memory as we have JVMs. On a big machine with a lot of cores, this begins to add up. For people who are running a cluster of relatively conventional systems, just putting copies on all the nodes in a conventional place is adequate.
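To be concrete, this is roughly the shape of the memory-mapping I have in mind on the JVM side. It's a minimal sketch, not my actual code; the local path and the chunking scheme are placeholders. The one real constraint it reflects is that a single map() call is capped at Integer.MAX_VALUE bytes, so a big model has to be mapped in pieces.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ModelMapper {
    // A single FileChannel.map() call is limited to Integer.MAX_VALUE bytes,
    // so a large model file gets mapped as a series of chunks.
    private static final long CHUNK = Integer.MAX_VALUE;

    public static MappedByteBuffer[] map(Path modelFile) throws IOException {
        try (FileChannel ch = FileChannel.open(modelFile, StandardOpenOption.READ)) {
            long size = ch.size();
            int chunkCount = (int) ((size + CHUNK - 1) / CHUNK);
            MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount];
            for (int i = 0; i < chunkCount; i++) {
                long offset = i * CHUNK;
                long length = Math.min(CHUNK, size - offset);
                // READ_ONLY mapping: every JVM on the box maps the same file,
                // and the OS page cache keeps a single physical copy of the pages.
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            }
            // The mappings stay valid after the channel is closed.
            return chunks;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local path: the point is that the file sits on the
        // node's local filesystem, not that it's read out of HDFS per JVM.
        MappedByteBuffer[] model = map(Paths.get("/data/model.bin"));
        System.out.println("Mapped " + model.length + " chunk(s)");
    }
}
```

The question, then, is really about how to get that one file onto the local filesystem of every node so each JVM can map it, rather than having each JVM pull its own in-heap copy out of HDFS.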