My company is building an embedded system using Linux.  For the purpose of
this discussion, I'll highlight that the application consists of three main
components.
The first receives data on two file streams at a rate of 5-10 Mbit/sec each.
The data can be viewed conceptually as large tar files that have to be
'untarred' into individual disk files on the fly.  It does this 24 hours a
day, 7 days a week, in theory.  The stream isn't actually structured as tar
files, and one of its features is that when we receive the header indicating
the start of an incoming file, we know at that point how big the file is
going to grow.
The second accesses the data, in effect as the data storage for a caching
web server.  It is on-demand and in use when the user wants it.
The third deletes old files. It is, in effect, a storage garbage collector.
Each incoming file is tagged with an 'expiration date' and files are deleted
when they have passed their expiration date and the space is needed.
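As a rough sketch of that deletion rule (not our actual collector), the
logic amounts to something like the following.  The struct, its field names
and gc_reclaim() are made up for illustration, and treating "the space is
needed" as a simple byte target is an assumption.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-file record; the field names are illustrative only. */
struct stored_file {
    time_t expires;    /* expiration date carried with the incoming file */
    long long size;    /* bytes occupied on disk */
    int deleted;       /* set once the collector has reclaimed the file */
};

/*
 * Reclaim expired files until at least `needed` bytes are free.
 * Returns the number of bytes free afterwards.  Files that have not
 * yet expired are never touched, even if space is still short.
 */
long long gc_reclaim(struct stored_file *files, size_t n, time_t now,
                     long long free_bytes, long long needed)
{
    for (size_t i = 0; i < n && free_bytes < needed; i++) {
        if (files[i].deleted || files[i].expires >= now)
            continue;              /* already gone, or not yet expired */
        files[i].deleted = 1;      /* a real collector would unlink() here */
        free_bytes += files[i].size;
    }
    return free_bytes;
}
```

Note that an expired file survives as long as nobody needs the space, which
is what lets the store double as a cache.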
The box has to be inexpensive, which means a single 'consumer' disk drive on
an ATA controller.  While we are not consuming significant bandwidth, the
overhead of seeking on the disk is sufficient that we are effectively
limited by seeking.
I am, at this point, contemplating how to take advantage of the knowledge of
the file size, since performance will degrade over time as the storage
becomes fragmented by repeated garbage collection.  I recognize that this
application has a fairly unique set of requirements and optimizations I make
to the file system may very well be inappropriate for other uses.  On the
other hand, I feel strongly that the sort of changes I have in mind should
be available for others who have similar problems, and there are maintenance
benefits to having the optimizations incorporated in the main line tree.
This said, the discussion I'd like to have amounts to "how do I maximize my
chances of making the changes in a way that has wide value, so that at
least some of them end up in the tree?"
My inclination is to add another system-call interface, call it
preallocate() for the purposes of discussion, which takes a madvise()-like
role: it gives a file system information about the final size of a file,
which the file system is perfectly free to ignore.  I'd like to see this call
become part of the VFS layer, with the default semantic for existing file
systems to be to return -1 with errno set to ENOSYS.  I believe that this
part would be a reasonable addition, and would, of course, make the changes
in the VFS to support it and contribute a patch for review and comment.
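To make the default semantic concrete, here is a user-space model of the
dispatch I have in mind.  It is only a sketch: struct fs_ops,
vfs_preallocate() and the two example tables are hypothetical names, not
existing kernel interfaces, and inside the kernel the hook would return a
negative error code rather than set errno.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a per-filesystem operations table. */
struct fs_ops {
    /* Optional hook: NULL means the file system ignores the hint. */
    int (*preallocate)(int fd, off_t length);
};

/* Generic layer: call the hook if present, otherwise fail with ENOSYS. */
int vfs_preallocate(const struct fs_ops *ops, int fd, off_t length)
{
    if (ops == NULL || ops->preallocate == NULL) {
        errno = ENOSYS;
        return -1;
    }
    return ops->preallocate(fd, length);
}

/* Example hook for a file system that accepts the hint and does nothing. */
int demo_preallocate(int fd, off_t length)
{
    (void)fd;
    (void)length;
    return 0;
}

/* An existing file system needs no changes: its hook is simply absent. */
const struct fs_ops legacy_fs = { NULL };
const struct fs_ops demo_fs = { demo_preallocate };
```

The point of the NULL default is that no existing file system has to be
touched for the call to be added.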
Having the syscall, I would then make modifications to EXT2 to take
advantage of the knowledge.  EXT2 does a good job of avoiding fragmentation
now, but I believe there are approaches that would allow it to do a better
job in this special case without making it any more complex and without
degrading performance when the feature is not used. (This is based on
experience with other file systems some time ago.)  There are other
advantages to our application of having the storage allocated at
initialization time, as it will simplify maintaining the
high-water/low-water mark processing necessary for the garbage collector.
Anyway, my first step is to ask for comments on the advisability of adding a
call to the VFS layer as outlined above.  I would probably flesh it out as:
int fpreallocate(int fd, off_t length)
  Inform the file system that the file open on fd, but not yet written to,
will be length bytes in size. This is an advisory call to the file system
and may be ignored. A file system that ignores the call will return -1 and
set errno to ENOSYS.  A file system that implements the call may validate
the fd and length parameters and return -1 and an appropriate errno value if
either is invalid. A file system that implements the call will return 0 and
use the length information in an implementation-specific way. Possible error
returns include
EBADF - there is no file open with the descriptor fd
ENOSPC - there is not enough space on the device for the file
EACCES - the caller does not have permission
EINVAL - length is less than 1
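From the application side, I would expect callers to treat ENOSYS as
harmless, roughly as sketched below.  Since the call does not exist, the
fpreallocate() here is only a stand-in that performs the argument checks
described above and then behaves like a file system that ignores the hint;
hint_final_size() is a made-up helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Stand-in for the proposed call, which does not exist.  It checks its
 * arguments and then acts like a file system that ignores the hint.
 */
int fpreallocate(int fd, off_t length)
{
    if (fcntl(fd, F_GETFD) == -1) {    /* fd is not an open descriptor */
        errno = EBADF;
        return -1;
    }
    if (length < 1) {
        errno = EINVAL;
        return -1;
    }
    errno = ENOSYS;                    /* nobody implements the hint yet */
    return -1;
}

/*
 * Caller-side policy: the hint is advisory, so ENOSYS is not a failure.
 * Returns 1 if the hint was accepted, 0 if it was ignored, and -1 on a
 * real error, with errno left as set by fpreallocate().
 */
int hint_final_size(int fd, off_t length)
{
    if (fpreallocate(fd, length) == 0)
        return 1;
    return (errno == ENOSYS) ? 0 : -1;
}
```

The receiver would call hint_final_size() right after parsing the stream
header and carry on writing regardless of a 0 result.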
Comments? Suggestions? Flames?

Thanks,

Marty