On Thu, May 08, 2008 at 10:33:21AM -0400, Nick Mathewson wrote:
> On Wed, May 07, 2008 at 11:47:09PM -0700, Niels Provos wrote:
> > Hi Manual,
> > 
> > this is a good suggestion.   Nick and I are currently working on how
> > buffers and http work in libevent 2.0.  You might want to check out
> > trunk to see some of the progress there.   In any case, it seems that
> > your sendfile changes would be a good fit there.  BTW, sendfile is
> > available on a large number of platforms now.
> 
> In fact, this fits pretty well into the newer evbuffer implementation.
> Whereas the old implementation used a big chunk of memory for the
> entire buffer, the new code uses a linked list of small chunks. (This
> removes the need for big memmove operations, and generally makes
> writing big files more efficient.)
> 
> All we'd need to do to support sendfile is add another chunk type
> whose contents were a file rather than a chunk of ram.  We'd probably
> want at least two ways to "add a file at this point in the stream":
> one taking a filename and one taking an fd.  Of course, we'd need to
> make sure to fall back on mmap() for systems lacking sendfile(), and
> maybe on a series of read() operations on systems lacking mmap().
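
To make that fallback order concrete, here's a rough sketch of the two entry points (one taking an fd, one taking a filename) trying sendfile(), then mmap(), then plain reads. The names send_file_range() and send_file_path() are made up for illustration and are not libevent API; it assumes Linux's sendfile() signature and a blocking out_fd, and it skips all the evbuffer chunk bookkeeping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>   /* Linux-specific header */
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_all(int out_fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(out_fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send len bytes of in_fd, starting at offset, to a
 * blocking out_fd. The caller must make sure the range lies inside the file. */
static int send_file_range(int in_fd, int out_fd, off_t offset, size_t len)
{
    off_t pos = offset;
    size_t left = len;
    assert(in_fd >= 0 && out_fd >= 0);

    /* 1. sendfile(): the kernel advances pos for us. */
    while (left > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &pos, left);
        if (n > 0) {
            left -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EINVAL || errno == ENOSYS))
            break;              /* not supported for these fds: fall back */
        return -1;              /* real error, or unexpected EOF (n == 0) */
    }
    if (left == 0)
        return 0;

    /* 2. mmap(): the file offset passed to mmap must be page aligned. */
    long page = sysconf(_SC_PAGESIZE);
    if (page > 0) {
        off_t aligned = pos - (pos % page);
        size_t skew = (size_t)(pos - aligned);
        void *map = mmap(NULL, left + skew, PROT_READ, MAP_PRIVATE,
                         in_fd, aligned);
        if (map != MAP_FAILED) {
            int rc = write_all(out_fd, (const char *)map + skew, left);
            munmap(map, left + skew);
            return rc;
        }
    }

    /* 3. Last resort: a series of pread() calls through a small buffer. */
    char buf[4096];
    while (left > 0) {
        size_t want = left < sizeof buf ? left : sizeof buf;
        ssize_t n = pread(in_fd, buf, want, pos);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        pos += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Filename variant: open the file, send the range, close it again. */
static int send_file_path(const char *path, int out_fd, off_t offset, size_t len)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    int rc = send_file_range(in_fd, out_fd, offset, len);
    close(in_fd);
    return rc;
}
```

A real version inside the event loop would have to cope with EAGAIN on a non-blocking socket and resume from the saved position, which this sketch does not attempt.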

On Linux, I'd bet that AIO+splice can beat sendfile. The problem with
sendfile is that it can still block on file I/O. This is one major reason
why sendfile is no good for multimedia servers, _unless_ you're willing to
juggle and manage multiple threads (w/ the headache of profiling access
patterns and adjusting number of threads). On Linux, though, you could have
one thread doing kernel AIO from a file into a pipe using vmsplice, and a
second thread running the event loop (or N+M threads, where N is your number
of cores--an easier calculation.)
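
For the curious, the pipe stage looks roughly like the sketch below. It only covers the vmsplice()/splice() half: it assumes the AIO read has already completed into buf, and the helper name splice_buffer_to_fd() is mine, not anything in libevent. Linux only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* splice() and vmsplice() with _GNU_SOURCE */
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: map the user pages at buf into the pipe with
 * vmsplice(), then move them from the pipe to out_fd with splice().
 * pipe_fds is a pipe the caller created; both ends are blocking here. */
static int splice_buffer_to_fd(int pipe_fds[2], const void *buf, size_t len,
                               int out_fd)
{
    const char *p = buf;
    assert(buf != NULL);

    while (len > 0) {
        struct iovec iov = { .iov_base = (void *)p, .iov_len = len };

        /* May queue less than len if the pipe fills up. Without
         * SPLICE_F_GIFT we still own the pages, so they must not be
         * modified until the data has really left the pipe. */
        ssize_t queued = vmsplice(pipe_fds[1], &iov, 1, 0);
        if (queued < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += queued;
        len -= (size_t)queued;

        /* Drain exactly what we queued. SPLICE_F_MOVE is only a hint;
         * the kernel may still copy. */
        while (queued > 0) {
            ssize_t moved = splice(pipe_fds[0], NULL, out_fd, NULL,
                                   (size_t)queued, SPLICE_F_MOVE);
            if (moved < 0 && errno == EINTR)
                continue;
            if (moved <= 0)
                return -1;
            queued -= moved;
        }
    }
    return 0;
}
```

In the two-thread arrangement the vmsplice() call would sit in the AIO thread and the splice() call in the event loop thread, with the pipe as the hand-off between them; they are folded into one function here just to keep the example short.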

This leads into the second issue, which is how best to manage vmsplice
buffers, which need to be page aligned and carry certain ownership
restrictions.
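
The alignment half is the easy part. Something like the following would do as a starting point; alloc_page_buffer() and is_page_aligned() are names I made up, and the ownership half (knowing when the kernel is done with a page) is not addressed at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate npages whole pages, aligned on a page
 * boundary. Stores the byte length in *len_out. Release with free(). */
static void *alloc_page_buffer(size_t npages, size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    assert(len_out != NULL);
    if (page <= 0 || npages == 0)
        return NULL;

    size_t len = npages * (size_t)page;
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;

    *len_out = len;
    return buf;
}

/* True if both the address and the length are multiples of the page size. */
static int is_page_aligned(const void *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;
    return ((uintptr_t)p % (uintptr_t)page) == 0 &&
           (len % (size_t)page) == 0;
}
```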

The upside is that it's the only feasible way in any unix (that I'm aware
of) to actually get a page of file data into a buffer without worrying about
(a) blocking, (b) dealing with POSIX AIO readiness signalling, or (c)
copying the data. _Also_, this allows a potential single code pathway for
either encrypted or plaintext I/O; for encrypted you'd just insert a filter,
and write the new data to a new page buffer. (Given all the recent political
"issues", it's disconcerting that more people aren't giving thought to
moving the net toward ubiquitous encryption, including reducing the overhead
so there are few excuses.)

One caveat w/ the non-blocking splice setup is how to handle seeking. But
that can be dealt with as a matter of course.

Anyhow, next month I'll have some code and hopefully some numbers.

_______________________________________________
Libevent-users mailing list
Libevent-users@monkey.org
http://monkeymail.org/mailman/listinfo/libevent-users
