On Friday, 2 August 2013 at 23:51:27 UTC, H. S. Teoh wrote:
On Fri, Aug 02, 2013 at 06:38:20PM -0500, captaindet wrote:
[...]
FWIW
i have to deal with big data files that can be a few GB. for some data analysis software i wrote in C a while back i did some testing with caching and such. turns out that for Win7-64 the automatic caching done by the OS is really good and any attempt to speed things up actually slowed it down. no kidding, i have seen more than 2GB of data being automatically cached. of course the system RAM must be larger than the file size (if i remember my tests correctly by a factor of ~2, but this is maybe not a linear relationship, i did not actually change the RAM just the size of the data file) and it will hold it in the cache only as long as there are no concurrent applications requiring RAM or caching. i guess my point is, if your target is Win7 and your files are >5x smaller than the installed RAM i would not bother at all trying to optimize file access. i suppose -nix machine will do a similar good job these days.
[...]

IIRC, Linux has been caching files (or disk blocks, rather) in memory since the days of Win95. Of course, memory in those days was much scarcer, but file sizes were smaller too. :) There's still a cost to copy the kernel buffers into userspace, though, which should not be disregarded. But if you use mmap, then you're essentially accessing that memory cache directly, which is as good as it gets.
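
To make the read-vs-mmap distinction concrete, here is a minimal sketch in Python (not code from this thread). The function names and the newline-counting task are only illustrative assumptions. The first function scans a read-only memory map of the file; the second does the same work through ordinary read() calls, which copy each chunk into a userspace buffer first.

```python
import mmap
import os


def count_newlines_mmap(path):
    """Count newline bytes by scanning a read-only memory map of the file."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # an empty file cannot be memory-mapped
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count


def count_newlines_read(path, chunk_size=1 << 20):
    """Same count via read(): each chunk is copied into a userspace buffer."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```

Both functions should return the same number for the same file. Which one is faster depends on the OS, the file size, and the access pattern, so it is worth timing both on the actual target system.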

I don't know how well mmap works on Windows, though. IIRC it doesn't have the same semantics as POSIX, so you could accidentally run into performance issues by using it the wrong way on Windows.


T

I did some benching a while back with user bioinfornatics. He had to do some pretty large file reads, preferably in very little time. Observations showed my algo was *much* faster under Windows than under Linux.

What we observed is that under Windows, as soon as you open a file for reading, Windows starts buffering the file in a parallel thread.

What we did was create two threads. The first did nothing but read the file, store it into chunks of memory, and then pass it to a worker thread. The worker thread did the parsing proper.

Doing this *halved* the Linux runtime, tying it with the "monothreaded" Windows runtime. Windows saw no change.
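
For readers who want to try the reader/worker split described above, here is a rough sketch in Python. It is not the code from the benchmark: the names (reader, worker, count_lines_two_threads), the 1 MiB chunk size, the bounded queue, and the newline count standing in for "the parsing proper" are all assumptions made for illustration.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def reader(path, chunks):
    """Do nothing but read the file and hand each chunk to the worker."""
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                chunks.put(chunk)
    finally:
        chunks.put(None)  # sentinel: tells the worker there is no more data


def worker(chunks, result):
    """Stand-in for the real parser: count newline bytes in each chunk."""
    total = 0
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        total += chunk.count(b"\n")
    result.append(total)


def count_lines_two_threads(path, max_pending=8):
    """Run the reader and the worker concurrently and return the line count."""
    # The queue is bounded so the reader cannot run arbitrarily far ahead.
    chunks = queue.Queue(maxsize=max_pending)
    result = []
    t_read = threading.Thread(target=reader, args=(path, chunks))
    t_work = threading.Thread(target=worker, args=(chunks, result))
    t_read.start()
    t_work.start()
    t_read.join()
    t_work.join()
    return result[0]
```

In CPython the file reads release the global interpreter lock, so the reader can overlap with the worker, but a parser written in pure Python would still be limited by that lock. Treat this as an illustration of the structure only, not as a way to reproduce the timings reported here.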

FYI, the full thread is here:
forum.dlang.org/thread/gmfqwzgtjfnqiajgh...@forum.dlang.org
