Most file systems try hard to keep files contiguous, but that can be very difficult when they don't know in advance how big a file will be. The result is a lot of fragmentation, especially when many files are being written at the same time or when the file system is nearly full.
A few years ago, Linux added a new system call, fallocate(), for optional use by any application that knows in advance how big a file will be. fallocate() is implemented at the file-system level, and not every Linux file system supports it: ext3 does not, but several important ones do, including ext4 and XFS. It's especially useful on XFS and other extent-based file systems. Calling fallocate() on a file system that doesn't support it simply returns -1 (with errno set to EOPNOTSUPP) and does no harm. And nothing keeps you from writing past the region specified by fallocate(), although the extra space may not be contiguous with the previously reserved space.

It seems to me that file de-archivers like unzip, tar, cpio and pax, as well as remote file-copy programs like rsync, really ought to invoke fallocate() when available. A successful fallocate() call also guarantees that writes within the allocated region won't fail for lack of space. That seems like another very useful property: a file-extraction utility could skip large files for which there isn't room instead of ungracefully running the file system out of space and then deleting the partial copy. I suppose there's room for debate as to whether the program ought to simply quit when fallocate() fails or skip the file and continue extracting the smaller files that do fit. And if you really want to go all out, when you know how much space all of the files you're extracting will need, you could offer the option of allocating it all in advance and aborting without reading anything if there isn't room for everything.

If I come up with some patches to put fallocate() into cpio, would the upstream maintainers be willing to adopt them?

Thanks,
Phil Karn