On Friday, 2 August 2013 at 23:51:27 UTC, H. S. Teoh wrote:
On Fri, Aug 02, 2013 at 06:38:20PM -0500, captaindet wrote:
[...]
FWIW
i have to deal with big data files that can be a few GB. for some data
analysis software i wrote in C a while back i did some testing with
caching and such. turns out that for Win7-64 the automatic caching
done by the OS is really good and any attempt to speed things up
actually slowed it down. no kidding, i have seen more than 2GB of data
being automatically cached. of course the system RAM must be larger
than the file size (if i remember my tests correctly by a factor of
~2, but this is maybe not a linear relationship, i did not actually
change the RAM just the size of the data file) and it will hold it in
the cache only as long as there are no concurrent applications
requiring RAM or caching. i guess my point is, if your target is Win7
and your files are >5x smaller than the installed RAM i would not
bother at all trying to optimize file access. i suppose -nix machine
will do a similar good job these days.
[...]
IIRC, Linux has been caching files (or disk blocks, rather) in memory
since the days of Win95. Of course, memory in those days was much
scarcer, but file sizes were smaller too. :) There's still a cost to
copy the kernel buffers into userspace, though, which should not be
disregarded. But if you use mmap, then you're essentially accessing that
memory cache directly, which is as good as it gets.
I don't know how well mmap works on windows, though, IIRC it doesn't
have the same semantics as Posix, so you could accidentally run into
performance issues by using it the wrong way on windows.
T
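To make the mmap suggestion above concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself contains no code). It maps a file read-only and scans it in place, so the data is not first copied into a separate user-space buffer by read(). The function name count_newlines_mmap is hypothetical, and counting newlines is just a stand-in for real parsing work.

```python
import mmap
import os


def count_newlines_mmap(path):
    """Count newline bytes by scanning a read-only memory map of the file."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ works on both
        # POSIX systems and Windows.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count
```

Whether this actually beats plain buffered reads depends on the OS and the access pattern, so it is worth measuring on the target platform rather than assuming.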
I did some benching a while back with user bioinfornatics. He had
to do some pretty large file reads, preferably in very little
time. Observations showed my algo was *much* faster under Windows
than Linux.
What we observed is that under Windows, as soon as you open a
file for reading, Windows appears to start buffering the file in a
parallel thread.
What we did was create two threads. The first did nothing but
read the file, store it into chunks of memory, and then pass each
chunk to a worker thread. The worker thread did the parsing proper.
Doing this *halved* the Linux run time, bringing it level with the
"monothreaded" Windows run time. Windows saw no change.
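The two-thread arrangement described above can be sketched roughly as follows. This is an illustration in Python, not the code used in the benchmark: the names reader and count_lines_pipelined, the 1 MiB chunk size, and the queue depth are all assumptions, and counting newlines stands in for the real parsing. The calling thread plays the role of the worker.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary size for illustration


def reader(path, chunks):
    """Read the file sequentially and hand each chunk to the worker."""
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                chunks.put(chunk)  # blocks while the queue is full
    finally:
        chunks.put(None)  # sentinel: no more data will arrive


def count_lines_pipelined(path, max_pending=8):
    """Overlap file reads with processing using a reader thread."""
    # A bounded queue keeps the reader from running arbitrarily far ahead.
    chunks = queue.Queue(maxsize=max_pending)
    t = threading.Thread(target=reader, args=(path, chunks))
    t.start()
    lines = 0
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        lines += chunk.count(b"\n")  # stand-in for the parsing step
    t.join()
    return lines
```

The bounded queue is the main design choice: it caps memory use while still letting the next read proceed while the current chunk is being processed.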
FYI, the full thread is here:
forum.dlang.org/thread/gmfqwzgtjfnqiajgh...@forum.dlang.org