Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Cliff Woolley

On Mon, 24 Jun 2002, Cliff Woolley wrote:

 You can.  The buckets code is smart enough to (a) take no action if the
 apr_file_t is already in an ancestor pool of the one you're asking to
 setaside into and (b) just use apr_file_dup() to get it into the requested
 pool otherwise to handle the pool cleanup/lifetime issues.

One small amendment here:  If it's allowed to, in the (b) case above, it
will try to MMAP the file before it will try to apr_file_dup() it.  I
forget why we made that decision, but I imagine it still has merit.
Anyway, it's still the case that one those things will happen only if the
pool you've requested to setaside into is not an ancestor of the pool the
file is already opened into.  In other words, neither of these things will
happen to apr_file_t's that come out of mod_file_cache.

--Cliff




Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Brian Pane

On Sun, 2002-06-23 at 23:12, Cliff Woolley wrote:
 On Mon, 24 Jun 2002, Bill Stoddard wrote:
 
  Yack... just noticed this too. This renders the fd cache (in
  mod_mem_cache) virtually useless.  Not sure why we cannot setaside a fd.
 
 You can.  The buckets code is smart enough to (a) take no action if the
 apr_file_t is already in an ancestor pool of the one you're asking to
 setaside into and (b) just use apr_file_dup() to get it into the requested
 pool otherwise to handle the pool cleanup/lifetime issues.
 
 It's the core_output_filter that's doing an apr_bucket_read() /
 apr_brigade_write() here.  Presumably to minimize the number of buffers
 that will have to be passed to writev() later.
 
 That could be changed pretty easily, and the mmap/memcpy/munmap/writev
 would go away.  Note, however, that since you can only pass one fd to
 sendfile at a time anyway,

I suppose we can take advantage of sendfilev, which accepts multiple
file descriptors, on platforms where it's available.  However, I'd
rather not setaside file descriptors here anyway, because doing so
would leave us vulnerable to running out of file descriptors in the
multithreaded MPMs.

 delaying the sending of a FILE bucket is pretty
 pointless if you're going to send it out with sendfile later anyway.  What
 would be better is to mmap the file and hang onto the mmap to pass a bunch
 of mmap'ed regions to writev() all at once.  For cache purposes, that just
 means that all you have to do is consider the size of the files you're
 dealing with, and if they're small, use MMapFile instead of CacheFile.  If
 we then got rid of the apr_bucket_read/apr_brigade_write in the
 core_output_filter and just saved up a brigade instead, you'd be set.

I just have one consideration to add here: if we add code to do an mmap,
we need to make sure that it does an open+read instead of mmap if
EnableMMAP off has been set for the directory containing the file.

The more I think about it, though, the more I like the idea of just
writing the brigade out to the client immediately when we see EOS in
core_ouput_filter(), even if c-keepalive is true.  If we do this,
the only bad thing that will happen is that if a client opens a
keepalive connection and sends a stream of requests for 1-byte files,
each file will be sent back in a separate small packet.  But that's
still an improvement over the non-keepalive case (and equivalent to
the packetization that we get from 1.3).

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Bill Stoddard

 The more I think about it, though, the more I like the idea of just
 writing the brigade out to the client immediately when we see EOS in
 core_ouput_filter(), even if c-keepalive is true.  If we do this,
 the only bad thing that will happen is that if a client opens a
 keepalive connection and sends a stream of requests for 1-byte files,
 each file will be sent back in a separate small packet.  But that's
 still an improvement over the non-keepalive case (and equivalent to
 the packetization that we get from 1.3).
 
 --Brian

+1






Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Greg Ames

Brian Pane wrote:

 The more I think about it, though, the more I like the idea of just
 writing the brigade out to the client immediately when we see EOS in
 core_ouput_filter(), even if c-keepalive is true.  If we do this,
 the only bad thing that will happen is that if a client opens a
 keepalive connection and sends a stream of requests for 1-byte files,
 each file will be sent back in a separate small packet.  But that's
 still an improvement over the non-keepalive case (and equivalent to
 the packetization that we get from 1.3).

I wonder if we could also tie the uncork on Linux to the non-keepalive case
without making it too messy?  In other words, if we are doing a whole bunch of
keepalive requests, only uncork after the last one to minimize the number of
packets.  Maybe the uncork could be deferred until just before the socket
shutdown() or close().  

Greg



Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Brian Pane

Ryan Bloom wrote:

From: Brian Pane [mailto:[EMAIL PROTECTED]]

Ryan Bloom wrote:



I think we should leave it alone.  This is the difference between
benchmarks and the real world.  How often do people have 8 requests
  

in a
  

row that total less than 8K?

As a compromise, there are two other options.  You could have the
core_output_filter refuse to buffer more than 2 requests, or you
  

could
  

have the core_output_filter not buffer if the full request is in the
buffer.

  

In your second option, do you mean full response rather than full
request?  If so, it's equivalent to what I was proposing yesterday:
send the response for each request as soon as we see EOS for that


request.
  

I like that approach a lot because it keeps us from doing an extra
mmap/memcpy/memunmap write before the write in the real-world case
where a client sends a non-pipelined request for a small (8KB) file
over a keepalive connection.



I do mean full response, and that is NOT equivalent to sending the
response as soon as you see the EOS for that request.  EVERY request
gets its own EOS.  If you send the response when you see the EOS, then
you will have removed all of the buffering for pipelined requests.


-1 on buffering across requests, because the performance problems
caused by the extra mmap+munmap will offset the gain you're trying
to achieve with pipelining.

You are trying to solve a problem that doesn't exist in the real world
IIUIC.  Think it through.  The problem is that if the page is 1k and
you request that page 8 times on the same connection, then Apache will
buffer all of the responses.  How often are those conditions going to
happen in the real world?


That's not the problem that I care about.  The problem that matters
is the one that happens in the real world, as a side-effect of the
core_output_filter() code trying to be too clever:
  - Client opens a keepalive connection to httpd
  - Cclient requests a file smaller than 8KB
  - core_output_filter(), upon seeing the file bucket followed by EOS,
decides to buffer the output because it has less than 8KB total.
  - There isn't another request ready to be read (read returns EAGAIN)
because the client isn't pipelining its connections.  So we then
do the writev of the file that we've just finished buffering.

Aside from slowing things down for the user, this hurts the scalability
of the httpd (mmap and munmap don't scale well on multiprocessor boxes).

What we should be doing in this case is just doing the write immediately
upon seeing the EOS, rather than penalizing both the client and the
server.

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Brian Pane

Ryan Bloom wrote:

-1 on buffering across requests, because the performance problems
caused by the extra mmap+munmap will offset the gain you're trying
to achieve with pipelining.



Wait a second.  Now you want to stop buffering to fix a completely
differeny bug.  The idea that we can't keep a file_bucket in the brigade
across requests is only partially true.  Take a closer look at what we
are doing and why when we convert the file to an mmap.  That logic was
added so that we do not leak file descriptors across requests.

However, there are multiple options to fix that. The first, and easiest
one is to just have the core_output_filter call apr_file_close() after
it has sent the file.  The second is to migrate the apr_file_t to the
conn_rec's pool if and only if the file needs to survive the request's
pool being killed.  Because we are only migrating file descriptors in
the edge case, this shouldn't cause a big enough leak to cause a
problem.


Migrating the apr_file_t to the conn_rec's pool is an appealing
solution, but it's quite dangerous.  With that logic added to the
current 8KB threshold, it would be too easy to make an httpd run
out of file descriptors: Send a pipeline of a few hundred requests
for some tiny file (e.g., a small image).  They all get setaside
into the conn_rec's pool.  Then send a request for something
that takes a long time to process, like a CGI.  Run multiple
copies of this against the target httpd at once, and you'll be
able to exhaust the file descriptors for a threaded httpd all
too easily.


  

You are trying to solve a problem that doesn't exist in the real
  

world
  

IIUIC.  Think it through.  The problem is that if the page is 1k and
you request that page 8 times on the same connection, then Apache
  

will
  

buffer all of the responses.  How often are those conditions going to
happen in the real world?

  

That's not the problem that I care about.  The problem that matters
is the one that happens in the real world, as a side-effect of the
core_output_filter() code trying to be too clever:
  - Client opens a keepalive connection to httpd
  - Cclient requests a file smaller than 8KB
  - core_output_filter(), upon seeing the file bucket followed by EOS,
decides to buffer the output because it has less than 8KB total.
  - There isn't another request ready to be read (read returns EAGAIN)
because the client isn't pipelining its connections.  So we then
do the writev of the file that we've just finished buffering.



But, this case only happens if the headers + the file are less than 8k.
If the file is 10k, then this problem doesn't actually exist at all.
As I said above, there are better ways to fix this than removing all
ability to pipeline responses.


We're not removing the ability to pipeline responses.

Aside from slowing things down for the user, this hurts the


scalability
  

of the httpd (mmap and munmap don't scale well on multiprocessor


boxes).
  

What we should be doing in this case is just doing the write


immediately
  

upon seeing the EOS, rather than penalizing both the client and the
server.



By doing that, you are removing ANY benefit to using pipelined requests
when serving files.  Multiple research projects have all found that
pipelined requests show a performance benefit.  In other words, you are
fixing a performance problem by removing another performance enhancer.


Sorry, but that's completely incorrect.  Even if you do the write
for each request's response immediately, you'll still have most
of the benefits of pipelining:
  * Lower total packet count, because the TCP connection
handshaking doesn't need to happen on a per-connection
basis
  * No network round-trip latency for each request (no extra
TCP connection setup or shutdown per request, and the
data packets, even though they're small, can be sent
immediately because we turn off Nagle's algorithm)

--Brian






Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-24 Thread Brian Pane

Bill Stoddard wrote:
...

Solve the problem to enable setting aside the open fd just long enough to check for a
pipelined request will nearly completely solve the worst part (the mmap/munmap) of 
this
problem.  On systems with expensive syscalls, we can do browser detection and 
dynamically
determine whether we should attempt the pipelined optimization or not. Not many 
browsers
today support pipelining requests, FWIW.
  


A really simple variant of this would be to handle the keepalive
case by just saving the entire brigade (minus the EOS), the same
way we save the brigade currently if it consists of a bunch of
small buckets from mod_include that total less than 8KB.  In the
non-pipelined scenario, the core_output_filter() would be invoked
again almost immediately (when read returned EAGAIN), and from
there its existing logic would handle the sendfile on the saved
file bucket.

(The only catch, of course, is that the file bucket turns into
an mmap bucket during the ap_save_brigade().  We'd have to change
the file bucket setaside implementation to hand off the fd from
the old pool to the new one, the same way we did with mmap buckets
last year.)

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-23 Thread Brian Pane

Bill Stoddard wrote:
.

So changing the AP_MIN_BYTES_TO_WRITE just moves the relative postion of the write() 
and
the check pipeline read.


It has one other side-effect, though, and that's what's bothering me:
In the case where core_output_filter() decides to buffer a response because
it's smaller than 8KB, the end result is to turn:
sendfile
into:
mmap
memcpy
munmap
... buffer some more requests' output until we have 8KB ...
writev

...

Some potential low hanging fruit here... would it make sense to make the keepalive 
read
(the one right before the select) an APR_INCOMPLETE_READ?


That makes sense.  The select ought to be cheaper than the read
that returns EAGAIN.

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0Numbers

2002-06-23 Thread Cliff Woolley

On Mon, 24 Jun 2002, Bill Stoddard wrote:

 Yack... just noticed this too. This renders the fd cache (in
 mod_mem_cache) virtually useless.  Not sure why we cannot setaside a fd.

You can.  The buckets code is smart enough to (a) take no action if the
apr_file_t is already in an ancestor pool of the one you're asking to
setaside into and (b) just use apr_file_dup() to get it into the requested
pool otherwise to handle the pool cleanup/lifetime issues.

It's the core_output_filter that's doing an apr_bucket_read() /
apr_brigade_write() here.  Presumably to minimize the number of buffers
that will have to be passed to writev() later.

That could be changed pretty easily, and the mmap/memcpy/munmap/writev
would go away.  Note, however, that since you can only pass one fd to
sendfile at a time anyway, delaying the sending of a FILE bucket is pretty
pointless if you're going to send it out with sendfile later anyway.  What
would be better is to mmap the file and hang onto the mmap to pass a bunch
of mmap'ed regions to writev() all at once.  For cache purposes, that just
means that all you have to do is consider the size of the files you're
dealing with, and if they're small, use MMapFile instead of CacheFile.  If
we then got rid of the apr_bucket_read/apr_brigade_write in the
core_output_filter and just saved up a brigade instead, you'd be set.

--Cliff