On 07/26/2016 11:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:
On 7/26/16 1:57 PM, Charles Hixson via Digitalmars-d-learn wrote:

Thanks.  Since there isn't any excess overhead I guess I'll use stdio.
Buffering, however, isn't going to help at all since I'm doing
random I/O.  I know that most of the data the system reads from disk is
going to end up getting thrown away, since my records will generally be
smaller than 8K, but there's no help for that.


Even for random I/O, buffering is helpful. It depends on the size of your items.

Essentially, reading 10 bytes from a file probably costs about the same as reading a whole block, so you may as well buffer the extra data in case you need it.

Now, C I/O's buffering may not suit your exact needs, so I don't know how it will perform. You may want to consider mmap, which tells the kernel to map pages of memory directly onto the file on disk. Then the kernel does all the buffering for you. Phobos has support for it, but it's pretty minimal from what I can see: http://dlang.org/phobos/std_mmfile.html
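A minimal read-only use of that module looks like this (the file name `data.bin` is made up for illustration; any existing file will do):

```d
import std.mmfile : MmFile;
import std.stdio : writeln;

void main()
{
    scope mm = new MmFile("data.bin");   // read-only mapping of the whole file
    auto bytes = cast(ubyte[]) mm[];     // a view into the mapping, not a copy

    // Only the pages actually touched are read from disk; slicing the
    // array does not by itself pull the whole file into RAM.
    writeln(bytes.length, " bytes mapped");
}
```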

-Steve
I've considered MmFile often, but when I read the documentation I end up realizing that I don't understand it. So I look up memory-mapped files elsewhere, and I still don't understand them. It looks as if the entire file is stored in memory, which is not at all what I want, but I also can't really believe that's what's going on.

I know there was an early form of this in a version of BASIC (the version RISS was written in, though I don't remember which version that was), and in *that* version array elements were read in as needed. (It wasn't spectacularly efficient.) But memory-mapped files don't seem to work that way, because people keep talking about how efficient they are. Do you know a good introductory tutorial? I'm guessing that "window size" might refer to the number of bytes available, but what if you need to append to the file? Etc.
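For what it's worth, the "entire file in memory" impression is misleading: mapping a file only reserves address space, and the kernel pages data in when you touch it and evicts it under memory pressure. The `window` argument of MmFile's full constructor bounds how much of the file is mapped at any one time. A sketch under that reading (the file name and sizes here are made up):

```d
import std.mmfile : MmFile;

void main()
{
    // Create a 1 MiB file mapped through a 64 KiB window, so at most
    // roughly one window's worth of it is mapped at any one time.
    scope mm = new MmFile("records.bin", MmFile.Mode.readWriteNew,
                          1024 * 1024, null, 64 * 1024);

    // Slicing faults in only the pages backing the requested range.
    auto rec = cast(ubyte[]) mm[0 .. 1024];
    rec[0] = 42;   // dirty pages are written back by the kernel, lazily
}
```

Appending is the weak spot: growing a mapped file means closing and remapping it with a larger size, which is one case where plain seek-and-write is simpler.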

A part of the problem is that I don't want this to be a process with arbitrarily high memory use. Buffering would be fine if I could use it, but for my purposes sequential access is likely to be rare, and the working layout of the data in RAM doesn't (can't reasonably) match the layout on disk. IIUC (my information is a few decades old) the system buffer size is about 8K. I expect never to need to read that large a chunk, but I'm going to keep the chunks in multiples of 1024 bytes, and, where reasonable, at exactly 1024 bytes. So I should never need two reads or writes for a chunk. I guess to be sure of this I'd better make the file header 1024 bytes as well. (I'm guessing that seeking to a position causes the appropriate block to be read into the system buffer, so if my header were 512 bytes I might occasionally need double reads or writes.)
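That fixed-record layout is straightforward with std.stdio: each access is one seek plus one read or write of exactly 1024 bytes. A sketch along those lines (the file name, record number, and the convention that record 0 is the header are all made up for illustration):

```d
import std.stdio : File;

enum recordSize = 1024;   // header and every record padded to this size

void main()
{
    auto f = File("records.bin", "r+b");
    ubyte[recordSize] buf;

    // Record n lives at offset n * recordSize (record 0 being the
    // header), so each record is one aligned seek + transfer.
    ulong n = 3;
    f.seek(n * recordSize);
    f.rawRead(buf[]);

    // ... modify buf ...

    f.seek(n * recordSize);
    f.rawWrite(buf[]);
}
```

Because 1024 divides the usual block size, no record ever straddles two blocks, which is exactly the "never two reads or writes per chunk" property described above.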

I'm guessing that memory-mapped files trade memory use against speed of access, and for my purposes that's probably a bad trade, even though databases are doing it more and more. I'm likely to need all the memory I can lay my hands on, and even then thrashing wouldn't surprise me. So a fixed buffer size seems a huge advantage.
