On 07/26/2016 11:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:
On 7/26/16 1:57 PM, Charles Hixson via Digitalmars-d-learn wrote:

Thanks.  Since there isn't any excess overhead I guess I'll use stdio.
Buffering, however, isn't going to help at all since I'm doing
random I/O.  I know that most of the data the system reads from disk is
going to end up getting thrown away, since my records will generally be
smaller than 8K, but there's no help for that.


Even for random I/O, buffering is helpful. It depends on the size of your items.

Essentially, reading 10 bytes from a file probably costs about the same as reading a whole block, so you may as well buffer the extra data in case you need it.

Now, C I/O's buffering may not suit your exact needs, so I don't know how it will perform. You may want to consider mmap, which tells the kernel to map pages of memory directly onto the file on disk. Then the kernel does all the buffering for you. Phobos has support for it, but it's pretty minimal from what I can see: http://dlang.org/phobos/std_mmfile.html
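A minimal read-only use of that module looks like this (the file name `data.bin` is made up for illustration; any existing file will do):

```d
import std.mmfile : MmFile;
import std.stdio : writeln;

void main()
{
    scope mm = new MmFile("data.bin");   // read-only mapping of the whole file
    auto bytes = cast(ubyte[]) mm[];     // a view into the mapping, not a copy

    // Only the pages actually touched are read from disk; slicing the
    // array does not by itself pull the whole file into RAM.
    writeln(bytes.length, " bytes mapped");
}
```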

-Steve
I've considered MmFile often, but when I read the documentation I end up realizing that I don't understand it. So I look up memory-mapped files elsewhere, and I still don't understand them. It looks as if the entire file is stored in memory, which is not at all what I want, but I also can't really believe that's what's going on.

I know there was an early form of this in a version of BASIC (the version RISS was written in, though I don't remember which version that was), and in *that* version array elements were read in as needed. (It wasn't spectacularly efficient.) But memory-mapped files don't seem to work that way, because people keep talking about how efficient they are. Do you know a good introductory tutorial? I'm guessing that "window size" might refer to the number of bytes available, but what if you need to append to the file? Etc.
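For what it's worth, the "entire file in memory" impression is misleading: mapping a file only reserves address space, and the kernel pages data in when you touch it and evicts it under memory pressure. The `window` argument of MmFile's full constructor bounds how much of the file is mapped at any one time. A sketch under that reading (the file name and sizes here are made up):

```d
import std.mmfile : MmFile;

void main()
{
    // Create a 1 MiB file mapped through a 64 KiB window, so at most
    // roughly one window's worth of it is mapped at any one time.
    scope mm = new MmFile("records.bin", MmFile.Mode.readWriteNew,
                          1024 * 1024, null, 64 * 1024);

    // Slicing faults in only the pages backing the requested range.
    auto rec = cast(ubyte[]) mm[0 .. 1024];
    rec[0] = 42;   // dirty pages are written back by the kernel, lazily
}
```

Appending is the weak spot: growing a mapped file means closing and remapping it with a larger size, which is one case where plain seek-and-write is simpler.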

A part of the problem is that I don't want this to be a process with arbitrarily high memory use. Buffering would be fine if I could use it, but for my purposes sequential access is likely to be rare, and the working layout of the data in RAM doesn't (can't reasonably) match the layout on disk. IIUC (my information is a few decades old) the system buffer size is about 8K. I expect never to need to read that large a chunk, but I'm going to keep the chunks in multiples of 1024 bytes, and, where reasonable, at exactly 1024 bytes. So I should never need two reads or writes for a chunk. I guess to be sure of this I'd better make the file header 1024 bytes as well. (I'm guessing that seeking to a position causes the appropriate block to be read into the system buffer, so if my header were 512 bytes I might occasionally need double reads or writes.)
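That fixed-record layout is straightforward with std.stdio: each access is one seek plus one read or write of exactly 1024 bytes. A sketch along those lines (the file name, record number, and the convention that record 0 is the header are all made up for illustration):

```d
import std.stdio : File;

enum recordSize = 1024;   // header and every record padded to this size

void main()
{
    auto f = File("records.bin", "r+b");
    ubyte[recordSize] buf;

    // Record n lives at offset n * recordSize (record 0 being the
    // header), so each record is one aligned seek + transfer.
    ulong n = 3;
    f.seek(n * recordSize);
    f.rawRead(buf[]);

    // ... modify buf ...

    f.seek(n * recordSize);
    f.rawWrite(buf[]);
}
```

Because 1024 divides the usual block size, no record ever straddles two blocks, which is exactly the "never two reads or writes per chunk" property described above.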

I'm guessing that memory-mapped files trade memory use against speed of access, and for my purposes that's probably a bad trade, even though databases are doing it more and more. I'm likely to need all the memory I can lay my hands on, and even then thrashing wouldn't surprise me. So a fixed buffer size seems a huge advantage.
