On 03/22/2011 07:26 AM, Hrvoje Niksic wrote:
> Sebastian Pipping <[email protected]> writes:
>
>> I noticed that wget writes data to disk as it comes in.
>
> This is not strictly true, it is up to the OS to write data to disk.
> What Wget does is that it doesn't hold the data in stdio buffers after
> receiving it from the network. Since the data comes from the network in
> buffers, this is exactly what you want. It would be a bad idea to
> interrupt Wget only to find that some data is missing because Wget was
> unnecessarily buffering it.
Well, but this problem is the same one suffered by any program anywhere
that uses stdio. And if wget was interrupted, I don't think the user
ought to expect any particular state of data for files that were being
written to at the time of interruption.

But your mention of stdio raises an important point: if folks want to
buffer the data into "page size" chunks before writing, they are much
better off just letting stdio do the buffering, because the system
library has the best chance of finding an optimal way to write out the
data efficiently for that system (a minimal sketch follows below).

Of course, your point that data will typically arrive already in
buffers is also salient; though those buffers might not turn out to be
the right size for solid performance locally, particularly if
server-side scripts do a lot of flushing (for dynamically generated
pages). I definitely think we should see some comparisons between the
proposed changes and the current code, on real-world, typical cases,
before we decide that it's a good idea.

>> I was wondering if you would be interested to incorporate a patch
>> buffering writes to full page caches sizes (e.g. 4096 in my machine)
>> by default and adding a parameter to override this behavior.
>
> Can you describe a specific problem that this additional parameter would
> address?

Do you mean: if he implemented buffering, would disabling it be useful?
I would think so. I could imagine scenarios where one would wish to
tail -f data as it came in. Or there's your explanation above of
someone not wanting to lose extra data because it had been buffered.
But I'm guessing what you really wanted to know was whether the feature
as a whole had a specific problem to address.

--
Micah J. Cowan
http://micah.cowan.name/
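For illustration only, here is a minimal C sketch of what "letting stdio
do the buffering" could look like: a fully buffered stream set up with
setvbuf() using the OS page size. This is not Wget's code; the output
file name and the explicitly allocated buffer are assumptions made for
the example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Ask the OS for its page size; fall back to 4096 if unknown. */
        long page = sysconf(_SC_PAGESIZE);
        size_t bufsize = (page > 0) ? (size_t) page : 4096;

        FILE *out = fopen("output.bin", "wb");  /* hypothetical output file */
        if (!out)
            return EXIT_FAILURE;

        /* Hand buffering to stdio: setvbuf() must be called before any
           other operation on the stream.  Writes are then coalesced into
           page-sized chunks by the library, not by the application.  */
        char *buf = malloc(bufsize);
        if (buf)
            setvbuf(out, buf, _IOFBF, bufsize);

        /* ... fwrite() network chunks to 'out' as they arrive ... */

        fclose(out);   /* flushes any remaining buffered data */
        free(buf);     /* buffer must stay valid until the stream is closed */
        return EXIT_SUCCESS;
    }

As noted above, whether a page-sized stdio buffer actually beats the
current write-as-it-arrives behaviour is something that would need to be
measured against real-world, typical transfers.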
