On Thu, May 2, 2013 at 10:57 AM, Joshua Marsh <[email protected]> wrote:
> It also benefits from being able to handle much larger files. I haven't used
> mmap() in years, but I'm guessing if you mmap() a 10GB file, you are gonna
> have a bad time. I suppose the C version could be modified to mmap()
> chunks. That's a lot of complexity though and we're talking about
> differences of milliseconds in runtime.

Actually, mmap() is often the best way to deal with gigantic files, so
long as you have sufficient address space to map them (virtual address
space, which comes from pointer width, not physical memory).  It has to
do with the way virtual memory is implemented in modern operating
systems.  When you tell the OS to mmap a file, you make that file into
what is essentially a fixed, static swap file that's pre-populated with
your data.  When you ask for an address in the range mmap() returns,
the kernel finds the page-sized chunk of the file containing that
address and loads it into RAM on demand.  If you touch more than you
have free RAM for, it just writes some memory pages back to disk, or
discards them if they're not dirty.  The kernel has highly optimized
algorithms for doing this and takes advantage of the CPU's specialized
paging hardware as well, so it's likely to do a better job than you
unless you have some very specific needs that don't map well to the way
virtual memory paging works.
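To make that concrete, here's a minimal sketch (not from the original
post) of mapping an entire large file read-only and walking it.  The
file name, the byte-sum loop, and the madvise() hint are just
illustrative choices; the point is that the kernel faults pages in
lazily and evicts clean ones under pressure, so this works even when
the file is far larger than RAM, provided you're on a 64-bit build.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file; MAP_PRIVATE is fine since we never write. */
    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  /* the mapping stays valid after the fd is closed */

    /* Optional hint: we'll read sequentially, so the kernel can read
     * ahead of us and drop pages behind us. */
    madvise(data, st.st_size, MADV_SEQUENTIAL);

    /* Touch every byte; pages are faulted in only as we reach them. */
    unsigned long long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += data[i];

    printf("byte sum: %llu\n", sum);

    munmap(data, st.st_size);
    return 0;
}

No explicit chunking logic is needed; the "chunks" are the kernel's own
pages, and it manages them for you.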

