Here's the OP again. I want to make it clear that my question here has to do with the problem of distributing 'the program' around the cluster, not 'the data'. In the case at hand, the issue is a system that has a large data resource that it needs to do its work. Every instance of the code needs the entire model, not just some blocks or pieces.
Memory mapping is a very attractive tactic for this kind of data resource. The data is read-only. Memory-mapping it allows the operating system to ensure that only one copy of the thing ends up in physical memory. If we force the model into a conventional file (storable in HDFS) and read it into the JVM in a conventional way, then we get as many copies in memory as we have JVMs. On a big machine with a lot of cores, this begins to add up. For people who are running a cluster of relatively conventional systems, just putting copies on all the nodes in a conventional place is adequate.
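To be concrete, this is roughly the shape of the memory-mapping I have in mind on the JVM side. It's a minimal sketch, not my actual code; the local path and the chunking scheme are placeholders. The one real constraint it reflects is that a single map() call is capped at Integer.MAX_VALUE bytes, so a big model has to be mapped in pieces.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ModelMapper {
    // A single FileChannel.map() call is limited to Integer.MAX_VALUE bytes,
    // so a large model file gets mapped as a series of chunks.
    private static final long CHUNK = Integer.MAX_VALUE;

    public static MappedByteBuffer[] map(Path modelFile) throws IOException {
        try (FileChannel ch = FileChannel.open(modelFile, StandardOpenOption.READ)) {
            long size = ch.size();
            int chunkCount = (int) ((size + CHUNK - 1) / CHUNK);
            MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount];
            for (int i = 0; i < chunkCount; i++) {
                long offset = i * CHUNK;
                long length = Math.min(CHUNK, size - offset);
                // READ_ONLY mapping: every JVM on the box maps the same file,
                // and the OS page cache keeps a single physical copy of the pages.
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            }
            // The mappings stay valid after the channel is closed.
            return chunks;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local path: the point is that the file sits on the
        // node's local filesystem, not that it's read out of HDFS per JVM.
        MappedByteBuffer[] model = map(Paths.get("/data/model.bin"));
        System.out.println("Mapped " + model.length + " chunk(s)");
    }
}
```

The question, then, is really about how to get that one file onto the local filesystem of every node so each JVM can map it, rather than having each JVM pull its own in-heap copy out of HDFS.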