Robert Bossy wrote:
> Bryan Olson wrote:
>> Robert Bossy wrote:
>>>> Robert Bossy wrote:
>>>>> Indeed! Maybe the best choice for chunksize would be the file's buffer
>>>>> size...
>>
>> That bit strikes me as silly.
>>
> The size of the chunk must be as little as possible in order to minimize
> memory consumption. However below the buffer-size, you'll end up filling
> the buffer anyway before actually writing on disk.

First, which buffer? The file library's buffer is of trivial size, a few KB, and if we wanted to save even that we'd use os.open and have no such buffer at all.

The OS may set up a file-specific buffer, but again those are small, and we could fill our file much faster with larger writes. Kernel buffers/pages are dynamically assigned on modern operating systems, so there is no particular buffer size for the file if you mean the amount of kernel memory holding the written data. Some OSes do not buffer writes to disk files at all; the write doesn't return until the data goes to disk (though they may cache it for future reads).

To fill the file fast, there's a large range of reasonable sizes for writing, but the user-space buffer size (typically around 4K) is too small. 1 GB is often disastrously large, forcing paging to and from disk to access the memory. In this thread, Matt Nordhoff used 10MB; that's a fine size today, and probably will be for several years to come.

If the OP is writing to a remote disk file to test network throughput, there's another size limit to consider. Network file-system protocols do not stream very large writes; the client has to break a large write into several smaller writes. NFS version 2 had a limit of 8 KB; version 3 removed the limit by allowing the server to tell the client the largest size it supports. (Version 4 is now out, in hundreds of pages of RFC that I hope to avoid reading.)

-- 
--Bryan
--
http://mail.python.org/mailman/listinfo/python-list
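
For anyone who wants to try the approach discussed above, here is a minimal Python sketch of filling a local file with fixed-size writes, using the 10MB chunk size mentioned in the thread. The function name `fill_file` and its parameters are illustrative assumptions, not code from the thread. It uses `os.open`/`os.write`, so there is no Python-level file buffer involved, and it loops on each chunk because `os.write` may write fewer bytes than requested.

```python
import os

# Chunk size from the thread (10MB). Assumption: anything from a few
# hundred KB to tens of MB is a reasonable range; 4K is too small.
CHUNK_SIZE = 10 * 1024 * 1024


def fill_file(path, total_size, chunk_size=CHUNK_SIZE):
    """Write total_size zero bytes to path, at most chunk_size per write.

    Hypothetical helper for illustration. os.open/os.write bypass
    Python's user-space file buffer, so each call hands data straight
    to the OS.
    """
    chunk = bytes(chunk_size)  # one reusable zero-filled chunk
    # O_BINARY exists only on Windows; elsewhere this adds nothing.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        remaining = total_size
        while remaining > 0:
            # The last chunk may be shorter than chunk_size.
            view = memoryview(chunk)[:min(chunk_size, remaining)]
            # os.write can return a short count; keep writing the rest.
            while view:
                written = os.write(fd, view)
                view = view[written:]
                remaining -= written
    finally:
        os.close(fd)
    return total_size
```

For example, `fill_file("test.bin", 1024**3)` (the file name is a placeholder) would write a 1 GB file in about a hundred 10MB writes. Only one chunk is held in memory, so memory use stays near `chunk_size` no matter how large the file gets.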