On Thu, May 2, 2013 at 10:57 AM, Joshua Marsh <[email protected]> wrote:
> It also benefits from being able to handle much larger files. I haven't
> used mmap() in years, but I'm guessing if you mmap() a 10GB file, you are
> gonna have a bad time. I suppose the C version could be modified to mmap()
> chunks. That's a lot of complexity though and we're talking about
> differences of milliseconds in runtime.
Actually, mmap() is often the best way to deal with gigantic files, so long as you have sufficient address space to map them (that's a function of whether your process is 64-bit, not of how much physical RAM is installed).

It works because of the way virtual memory is implemented in modern operating systems. When you mmap() a file, you essentially turn that file into a fixed, pre-populated swap area backing a region of your address space. When you touch an address in the range mmap() returns, the kernel finds the page-sized chunk of the file containing that address and faults it into RAM on demand. If you touch more than you have free RAM for, it writes some pages back to disk, or simply discards them if they aren't dirty. The kernel has highly optimized algorithms for this and takes advantage of the CPU's paging hardware as well, so it's likely to do a better job than hand-rolled chunking unless you have very specific access patterns that don't map well to demand paging.
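To make that concrete, here's a minimal sketch (not code from this thread) that mmap()s an arbitrarily large file read-only and counts newlines in it, letting the kernel fault pages in and drop them as memory pressure dictates. It assumes a 64-bit POSIX build, and error handling is kept to the bare minimum:

/* Map a whole file and scan it; the kernel pages it in on demand. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the entire file; nothing is actually read from disk yet. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touching a byte faults in its page; clean pages are simply
     * dropped when the kernel needs the RAM for something else. */
    long newlines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            newlines++;

    printf("%ld lines\n", newlines);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}

For a strictly sequential scan like this, a posix_madvise(data, st.st_size, POSIX_MADV_SEQUENTIAL) call after the mmap() can hint the kernel to read ahead aggressively and drop pages behind you.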
