Actually, it doesn't become trivial. It just becomes total fail or total win instead of almost always being a partial win. It doesn't meet Benson's need.
On Tue, Apr 12, 2011 at 11:09 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> To get around the chunks or blocks problem, I've been implementing a
> system that simply sets a max block size that is too large for a file
> to reach. In this way there will only be one block per HDFS file, and
> so MMap'ing or other single-file ops become trivial.
>
> On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies
> <bimargul...@gmail.com> wrote:
> > Here's the OP again.
> >
> > I want to make it clear that my question here has to do with the
> > problem of distributing 'the program' around the cluster, not 'the
> > data'. In the case at hand, the issue is a system that has a large
> > data resource that it needs to do its work. Every instance of the
> > code needs the entire model, not just some blocks or pieces.
> >
> > Memory mapping is a very attractive tactic for this kind of data
> > resource. The data is read-only. Memory-mapping it allows the
> > operating system to ensure that only one copy of the thing ends up
> > in physical memory.
> >
> > If we force the model into a conventional file (storable in HDFS)
> > and read it into the JVM in a conventional way, then we get as many
> > copies in memory as we have JVMs. On a big machine with a lot of
> > cores, this begins to add up.
> >
> > For people who are running a cluster of relatively conventional
> > systems, just putting copies on all the nodes in a conventional
> > place is adequate.
> >
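For reference, here is roughly what the mmap side of this looks like on the JVM: a minimal sketch that assumes the model has already been staged out of HDFS onto the local filesystem (e.g. via the distributed cache). The class name, the path, and the per-buffer chunking are illustrative assumptions, not anyone's actual code; the point is just that a READ_ONLY mapping lets the OS page cache share one physical copy across every JVM on the box.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    /**
     * Sketch: map a large, read-only model file so that every JVM on the
     * machine shares the same physical pages via the OS page cache, instead
     * of each JVM holding its own on-heap copy. Assumes the model has been
     * copied to local disk already; the path in main() is illustrative.
     */
    public class MappedModel {

        // A single MappedByteBuffer is limited to Integer.MAX_VALUE bytes,
        // so a model larger than ~2 GB has to be mapped in chunks.
        private static final long CHUNK = Integer.MAX_VALUE;

        public static MappedByteBuffer[] map(String localPath) throws IOException {
            RandomAccessFile raf = new RandomAccessFile(localPath, "r");
            try {
                FileChannel channel = raf.getChannel();
                long size = channel.size();
                int chunks = (int) ((size + CHUNK - 1) / CHUNK);
                MappedByteBuffer[] buffers = new MappedByteBuffer[chunks];
                for (int i = 0; i < chunks; i++) {
                    long offset = i * CHUNK;
                    long length = Math.min(CHUNK, size - offset);
                    // READ_ONLY mapping: pages are backed by the file and
                    // shared across processes; nothing is copied to the heap.
                    buffers[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                }
                return buffers;
            } finally {
                raf.close();  // mappings remain valid after the channel closes
            }
        }

        public static void main(String[] args) throws IOException {
            MappedByteBuffer[] model = map("/local/cache/model.bin");  // illustrative path
            System.out.println("Mapped " + model.length + " chunk(s)");
        }
    }

None of this touches HDFS directly, of course; the whole question upthread is how to get the file out of HDFS and onto local disk in the first place so the OS can do the sharing.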