Re: Reading a structured binary file?

monarch_dodra Sat, 03 Aug 2013 14:30:36 -0700

On Friday, 2 August 2013 at 23:51:27 UTC, H. S. Teoh wrote:

On Fri, Aug 02, 2013 at 06:38:20PM -0500, captaindet wrote:
[...]
FWIW
i have to deal with big data files that can be a few GB. for some data analysis software i wrote in C a while back i did some testing with caching and such. turns out that for Win7-64 the automatic caching done by the OS is really good and any attempt to speed things up actually slowed it down. no kidding, i have seen more than 2GB of data being automatically cached. of course the system RAM must be larger than the file size (if i remember my tests correctly by a factor of ~2, but this is maybe not a linear relationship, i did not actually change the RAM just the size of the data file) and it will hold it in the cache only as long as there are no concurrent applications requiring RAM or caching. i guess my point is, if your target is Win7 and your files are >5x smaller than the installed RAM i would not bother at all trying to optimize file access. i suppose -nix machine will do a similar good job these days.
[...]
IIRC, Linux has been caching files (or disk blocks, rather) in memory since the days of Win95. Of course, memory in those days was much scarcer, but file sizes were smaller too. :) There's still a cost to copy the kernel buffers into userspace, though, which should not be disregarded. But if you use mmap, then you're essentially accessing that memory cache directly, which is as good as it gets.
I don't know how well mmap works on windows, though, IIRC it doesn't have the same semantics as Posix, so you could accidentally run into performance issues by using it the wrong way on windows.


T

I did some benching a while back with user bioinfornatics. He had to do some pretty large file reads, preferably in very little time. Observations showed my algo was *much* faster under Windows than Linux.

What we observed is that under Windows, as soon as you open a file for reading, Windows starts buffering the file in a parallel thread.

What we did was create two threads. The first did nothing but read the file, store it in chunks of memory, and then pass each chunk to a worker thread. The worker thread did the parsing proper.

Doing this *halved* the Linux runtime, tying it with the "monothreaded" Windows run time. Windows saw no change.
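For anyone who wants to try the same shape of thing, here is a minimal sketch of the reader/worker split. It is in Python rather than our original code, and it is not the code we benchmarked. The names (reader, count_newlines), the 1 MiB chunk size and the queue depth of 8 are all assumptions chosen for illustration. The "parsing" is just a newline count, and the calling thread plays the worker role.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary value, assumed for illustration


def reader(path, chunks):
    """Reader thread: does nothing but read the file and hand chunks over."""
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:  # empty bytes object means end of file
                    break
                chunks.put(chunk)  # blocks while the queue is full
    finally:
        chunks.put(None)  # sentinel: tells the worker there is no more data


def count_newlines(path):
    """Worker side: stand-in for the real parsing, counts newline bytes."""
    # Bounded queue so the reader cannot run arbitrarily far ahead of the worker.
    chunks = queue.Queue(maxsize=8)
    t = threading.Thread(target=reader, args=(path, chunks))
    t.start()

    total = 0
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        total += chunk.count(b"\n")

    t.join()
    return total
```

Calling count_newlines("some_file.dat") (a hypothetical path) returns the newline count while the reads overlap with the counting. Whether this buys anything depends on the OS and on how expensive the parsing is, so measure before keeping it.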


FYI, the full thread is here:
forum.dlang.org/thread/[email protected]
