On Tue, Jul 16, 2002 at 08:12:22PM -0400, Melvin Smith wrote:
> At 09:42 AM 7/16/2002 -0700, Damien Neil wrote:
> >On Mon, Jul 15, 2002 at 08:59:40PM -0400, Melvin Smith wrote:
> >> True async IO implementations allow other things besides just notifying
> >> the process when data is available. Things like predictive seeks, or
> >> bundling up multiple read/writes, etc. aren't doable with select/poll
> >> loops.
> >> And the aioread/aiowrite/listio, etc. are a POSIX standard now, so they
> >> should be reasonably available on most UNIXen.
> >
> >I'm not familiar with "predictive seeks", and a quick google didn't
> >turn up anything relevant; can you give a quick explanation?
> 
> Doing some research, it doesn't look like there is any such support
> for this type of thing with POSIX api. I was trying to stress that
> real async IO could do seeks/writes/reads in parallel to processing,
> and I thought you were confusing a callback/dispatch loop with this.
> 
> Now it appears you weren't confusing anything. :)
> 
> >Bundling reads and writes sounds like a job for a buffered I/O layer.
> 
> You are probably right for most cases. I do know there is software
> that doesn't use a standard buffered layer but still does large
> writes, either sequential or random -- the former being multimedia
> apps, the latter relational DB engines like Oracle. There is still an
> advantage to providing a non-buffered, scatter/gather interface for
> less system call overhead, right?
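
If I'm reading this right, scatter/gather here means readv/writev: you
hand the kernel an array of iovecs and it fills or drains them all in
one system call, so you save a syscall per buffer and any copy into a
staging buffer. A minimal sketch (the buffer contents are made up):

    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void) {
        char hdr[]  = "HDR:";
        char body[] = "payload\n";

        /* Describe two separate buffers... */
        struct iovec iov[2];
        iov[0].iov_base = hdr;
        iov[0].iov_len  = strlen(hdr);
        iov[1].iov_base = body;
        iov[1].iov_len  = strlen(body);

        /* ...and gather-write both with a single system call. */
        ssize_t n = writev(STDOUT_FILENO, iov, 2);
        return n < 0 ? 1 : 0;
    }

So yes, for an app doing lots of large unbuffered writes, that's a
real saving.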

Under the very large assumption that I understand what is meant here
by the aio stuff, I would say that it's useful for more than has been
described. As I understand it, it's really only useful for very
performance-critical applications, ones that need to schedule even
disk accesses for efficiency. aio allows you to inform the OS that you
don't care about the ordering of a batch of operations, thereby
letting the OS or even the disk controller reorder and schedule them
as it likes.

Think of a fairly common application: looking up a set of hash values
in a disk-based hashtable. If you use the usual synchronous read/write
calls, you force the ordering of the accesses, possibly paying for
many more seeks than you need. Also, after the data has been read off
the disk, control has to pass to user land (probably incurring a
buffer copy or two), a small amount of logic has to run, the next
request jumps back to kernel land, the kernel does some buffer
management tapdancing, and only then does the disk get to work on your
request. With aio, the kernel can go through as much of the work list
as it likes before prodding you to wake up and do something with the
stuff it fetched.

These differences are magnified when you are reading from multiple
physical disks -- with synchronous calls, you're very likely to end up
with only one active disk at a time, the rest unnecessarily killing
time waiting for the next request that the application already knows
is coming but has no way of telling the disks about.
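
For the hashtable lookup, the batch submission looks something like
this with the POSIX calls (an untested sketch -- the file name,
offsets, and bucket size are invented, and on Linux you'd link with
-lrt):

    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    #define NREQ        4
    #define BUCKET_SIZE 4096

    int main(void) {
        /* Invented example: bucket offsets computed from hash values. */
        off_t offsets[NREQ] = { 0, 81920, 16384, 409600 };

        int fd = open("hashtable.db", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct aiocb  cbs[NREQ];
        struct aiocb *list[NREQ];
        static char   bufs[NREQ][BUCKET_SIZE];

        for (int i = 0; i < NREQ; i++) {
            memset(&cbs[i], 0, sizeof cbs[i]);
            cbs[i].aio_fildes     = fd;
            cbs[i].aio_buf        = bufs[i];
            cbs[i].aio_nbytes     = BUCKET_SIZE;
            cbs[i].aio_offset     = offsets[i];
            cbs[i].aio_lio_opcode = LIO_READ;
            list[i] = &cbs[i];
        }

        /* Submit the whole batch and block until all of it completes.
           The kernel (or controller) is free to reorder the reads to
           minimize seeks, which a sequence of synchronous pread()
           calls would forbid. */
        if (lio_listio(LIO_WAIT, list, NREQ, NULL) < 0) {
            perror("lio_listio");
            return 1;
        }

        for (int i = 0; i < NREQ; i++)
            printf("bucket at %lld: %zd bytes\n",
                   (long long)offsets[i], aio_return(&cbs[i]));
        return 0;
    }

With LIO_NOWAIT and a sigevent you'd get the "prod me when you're
done" behavior instead of blocking, but the scheduling argument is the
same either way.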

These advantages require an actual change in how user space and the
kernel communicate, so they cannot be provided by a buffered I/O
layer. They *can* be emulated with threads, but the overhead of the
threads may cut pretty heavily into your efficiency gains.
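
The threads version, for comparison, is just one synchronous pread()
per worker -- the reads overlap, but you pay a stack and a couple of
context switches per outstanding request (same invented file and
offsets as above; compile with -lpthread):

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* One outstanding request per thread: each worker blocks in its
       own synchronous pread(), so the reads are in flight at once
       even without aio. */
    struct req { int fd; off_t off; char buf[4096]; ssize_t got; };

    static void *worker(void *arg) {
        struct req *r = arg;
        r->got = pread(r->fd, r->buf, sizeof r->buf, r->off);
        return NULL;
    }

    int main(void) {
        int fd = open("hashtable.db", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct req reqs[4] = {
            { fd, 0 }, { fd, 81920 }, { fd, 16384 }, { fd, 409600 }
        };
        pthread_t tid[4];

        for (int i = 0; i < 4; i++)
            pthread_create(&tid[i], NULL, worker, &reqs[i]);
        for (int i = 0; i < 4; i++) {
            pthread_join(tid[i], NULL);
            printf("offset %lld: %zd bytes\n",
                   (long long)reqs[i].off, reqs[i].got);
        }
        return 0;
    }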

On the other hand, I may have completely missed the point. :-)
