My company is building an embedded system using Linux. For the purpose of this discussion, the application consists of three main components.

The first receives data on two file streams at a rate of 5-10 Mbit/sec each. Conceptually, the data can be viewed as large tar files that have to be 'untarred' into individual disk files on the fly. It does this 24 hours a day, 7 days a week, in theory. The streams aren't actually tar files, and one of their features is that when we receive the header indicating the start of an incoming file, we know at that point how big the file is going to grow.

The second accesses the data, in effect as the data storage for a caching web server. It is on-demand and in use when the user wants it.

The third deletes old files. It is, in effect, a storage garbage collector. Each incoming file is tagged with an 'expiration date', and files are deleted when they have passed their expiration date and the space is needed.

The box has to be inexpensive, which means a single 'consumer' disk drive on an ATA controller. While we are not consuming significant bandwidth, the overhead of seeking on the disk is large enough that we are effectively seek-limited.

I am, at this point, contemplating how to take advantage of knowing the file size in advance, since performance will degrade over time as the storage becomes fragmented by repeated garbage collection.

I recognize that this application has a fairly unusual set of requirements, and that optimizations I make to the file system may very well be inappropriate for other uses. On the other hand, I feel strongly that the sort of changes I have in mind should be available to others who have similar problems, and there are maintenance benefits to having the optimizations incorporated in the main line tree.

That said, the discussion I'd like to have amounts to "how do I maximize my chances of making the changes in a way that has wide value, so that at least some of them end up in the tree?"

My inclination is to add another system call, say for the purpose of discussion preallocate(), which takes a madvise()-like role: it gives a file system information about the final size of a file, which the file system is perfectly free to ignore. I'd like to see this call become part of the VFS layer, with the default semantics for existing file systems being to return -1 with errno set to ENOSYS. I believe that this part would be a reasonable addition, and I would, of course, make the changes in the VFS to support it and contribute a patch for review and comment.

Having the syscall, I would then modify EXT2 to take advantage of the knowledge. EXT2 does a good job of avoiding fragmentation now, but I believe there are approaches that would allow it to do a better job in this special case without making it any more complex and without degrading performance when the feature is not used. (This is based on experience with other file systems some time ago.)

There are other advantages for our application in having the storage allocated at initialization time, as it will simplify maintaining the high-water/low-water mark processing necessary for the garbage collector.

Anyway, my first step is to ask for comments on the advisability of adding a call to the VFS layer as outlined above. I would probably flesh it out as

    int fpreallocate(int fd, off_t length)

    Inform the file system that the file open on fd, which has not yet
    been written to, will be length bytes in size. This is an advisory
    call to the file system and may be ignored.

    A file system that ignores the call will return -1 and set errno to
    ENOSYS. A file system that implements the call may validate the fd
    and length parameters and return -1 with an appropriate errno value
    if either is invalid. A file system that implements the call will
    return 0 and use the length information in an
    implementation-specific way.

    Possible error returns include

    EBADF  - there is no file open with the descriptor fd
    ENOSPC - there is not enough space on the device for the file
    EACCES - the caller does not have permission
    EINVAL - length is less than 1

Comments? Suggestions? Flames?

Thanks,
Marty
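
P.S. To make the intended usage concrete, here is a minimal user-space sketch of how our receiver might call it. Everything in it is an assumption for illustration: fpreallocate() does not exist, so it is stubbed out to behave like a file system that ignores the hint, and open_incoming() is a hypothetical helper name. The point is only that a caller treats ENOSYS as "hint ignored, carry on" and any other error as a real failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the proposed call; no such syscall exists.
 * This stub validates its arguments and then behaves like a file
 * system that ignores the hint: it returns -1 with errno = ENOSYS.
 */
int fpreallocate(int fd, off_t length)
{
    if (fcntl(fd, F_GETFD) == -1)
        return -1;              /* fcntl() has set errno to EBADF */
    if (length < 1) {
        errno = EINVAL;
        return -1;
    }
    errno = ENOSYS;
    return -1;
}

/*
 * Create the disk file for an incoming stream entry whose header
 * announced a final size of `size` bytes. Returns an open descriptor,
 * or -1 with errno set on a real failure.
 */
int open_incoming(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* The hint is advisory: ENOSYS only means it was ignored. */
    if (fpreallocate(fd, size) == -1 && errno != ENOSYS) {
        int saved = errno;      /* close()/unlink() may change errno */
        close(fd);
        unlink(path);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With a file system that does implement the call, the same caller would see a return of 0 and need no changes, which is why I'd like the default to be a harmless ENOSYS.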
