Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-24 Thread Brian Pane

Bill Stoddard wrote:
...

>Solve the problem to enable setting aside the open fd just long enough to check for a
>pipelined request will nearly completely solve the worst part (the mmap/munmap) of this
>problem.  On systems with expensive syscalls, we can do browser detection and dynamically
>determine whether we should attempt the pipelined optimization or not. Not many browsers
>today support pipelining requests, FWIW.
>

A really simple variant of this would be to handle the keepalive
case by just saving the entire brigade (minus the EOS), the same
way we save the brigade currently if it consists of a bunch of
small buckets from mod_include that total less than 8KB.  In the
non-pipelined scenario, the core_output_filter() would be invoked
again almost immediately (when read returned EAGAIN), and from
there its existing logic would handle the sendfile on the saved
file bucket.

(The only catch, of course, is that the file bucket turns into
an mmap bucket during the ap_save_brigade().  We'd have to change
the file bucket setaside implementation to hand off the fd from
the old pool to the new one, the same way we did with mmap buckets
last year.)

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-24 Thread Brian Pane

Ryan Bloom wrote:

>>-1 on buffering across requests, because the performance problems
>>caused by the extra mmap+munmap will offset the gain you're trying
>>to achieve with pipelining.
>>
>>
>
>Wait a second.  Now you want to stop buffering to fix a completely
>different bug.  The idea that we can't keep a file_bucket in the brigade
>across requests is only partially true.  Take a closer look at what we
>are doing and why when we convert the file to an mmap.  That logic was
>added so that we do not leak file descriptors across requests.
>
>However, there are multiple options to fix that. The first, and easiest
>one is to just have the core_output_filter call apr_file_close() after
>it has sent the file.  The second is to migrate the apr_file_t to the
>conn_rec's pool if and only if the file needs to survive the request's
>pool being killed.  Because we are only migrating file descriptors in
>the edge case, this shouldn't cause a big enough leak to cause a
>problem.
>

Migrating the apr_file_t to the conn_rec's pool is an appealing
solution, but it's quite dangerous.  With that logic added to the
current 8KB threshold, it would be too easy to make an httpd run
out of file descriptors: Send a pipeline of a few hundred requests
for some tiny file (e.g., a small image).  They all get setaside
into the conn_rec's pool.  Then send a request for something
that takes a long time to process, like a CGI.  Run multiple
copies of this against the target httpd at once, and you'll be
able to exhaust the file descriptors for a threaded httpd all
too easily.

>
>>>You are trying to solve a problem that doesn't exist in the real world
>>>IIUIC.  Think it through.  The problem is that if the page is <1k and
>>>you request that page 8 times on the same connection, then Apache will
>>>buffer all of the responses.  How often are those conditions going to
>>>happen in the real world?
>>>
>>That's not the problem that I care about.  The problem that matters
>>is the one that happens in the real world, as a side-effect of the
>>core_output_filter() code trying to be too clever:
>>  - Client opens a keepalive connection to httpd
>>  - Client requests a file smaller than 8KB
>>  - core_output_filter(), upon seeing the file bucket followed by EOS,
>>decides to buffer the output because it has less than 8KB total.
>>  - There isn't another request ready to be read (read returns EAGAIN)
>>because the client isn't pipelining its connections.  So we then
>>do the writev of the file that we've just finished buffering.
>>
>>
>
>But, this case only happens if the headers + the file are less than 8k.
>If the file is >10k, then this problem doesn't actually exist at all.
>As I said above, there are better ways to fix this than removing all
>ability to pipeline responses.
>

We're not removing the ability to pipeline responses.

>>Aside from slowing things down for the user, this hurts the scalability
>>of the httpd (mmap and munmap don't scale well on multiprocessor boxes).
>>
>>What we should be doing in this case is just doing the write immediately
>>upon seeing the EOS, rather than penalizing both the client and the
>>server.
>>
>
>By doing that, you are removing ANY benefit to using pipelined requests
>when serving files.  Multiple research projects have all found that
>pipelined requests show a performance benefit.  In other words, you are
>fixing a performance problem by removing another performance enhancer.
>

Sorry, but that's completely incorrect.  Even if you do the write
for each request's response immediately, you'll still have most
of the benefits of pipelining:
  * Lower total packet count, because the TCP connection
    handshaking doesn't need to happen on a per-connection
    basis
  * No network round-trip latency for each request (no extra
    TCP connection setup or shutdown per request, and the
    data packets, even though they're small, can be sent
    immediately because we turn off Nagle's algorithm)

--Brian






Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-24 Thread Brian Pane

Ryan Bloom wrote:

>>From: Brian Pane [mailto:[EMAIL PROTECTED]]
>>
>>Ryan Bloom wrote:
>>
>>>I think we should leave it alone.  This is the difference between
>>>benchmarks and the real world.  How often do people have 8 requests in a
>>>row that total less than 8K?
>>>
>>>As a compromise, there are two other options.  You could have the
>>>core_output_filter refuse to buffer more than 2 requests, or you could
>>>have the core_output_filter not buffer if the full request is in the
>>>buffer.
>>>
>>In your second option, do you mean "full response" rather than "full
>>request"?  If so, it's equivalent to what I was proposing yesterday:
>>send the response for each request as soon as we see EOS for that request.
>>I like that approach a lot because it keeps us from doing an extra
>>mmap/memcpy/memunmap write before the write in the real-world case
>>where a client sends a non-pipelined request for a small (<8KB) file
>>over a keepalive connection.
>>
>
>I do mean full response, and that is NOT equivalent to sending the
>response as soon as you see the EOS for that request.  EVERY request
>gets its own EOS.  If you send the response when you see the EOS, then
>you will have removed all of the buffering for pipelined requests.
>

-1 on buffering across requests, because the performance problems
caused by the extra mmap+munmap will offset the gain you're trying
to achieve with pipelining.

>You are trying to solve a problem that doesn't exist in the real world
>IIUIC.  Think it through.  The problem is that if the page is <1k and
>you request that page 8 times on the same connection, then Apache will
>buffer all of the responses.  How often are those conditions going to
>happen in the real world?
>

That's not the problem that I care about.  The problem that matters
is the one that happens in the real world, as a side-effect of the
core_output_filter() code trying to be too clever:
  - Client opens a keepalive connection to httpd
  - Client requests a file smaller than 8KB
  - core_output_filter(), upon seeing the file bucket followed by EOS,
    decides to buffer the output because it has less than 8KB total.
  - There isn't another request ready to be read (read returns EAGAIN)
    because the client isn't pipelining its connections.  So we then
    do the writev of the file that we've just finished buffering.

Aside from slowing things down for the user, this hurts the scalability
of the httpd (mmap and munmap don't scale well on multiprocessor boxes).

What we should be doing in this case is just doing the write immediately
upon seeing the EOS, rather than penalizing both the client and the
server.

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-24 Thread Greg Ames

Brian Pane wrote:

> The more I think about it, though, the more I like the idea of just
> writing the brigade out to the client immediately when we see EOS in
> core_output_filter(), even if c->keepalive is true.  If we do this,
> the only bad thing that will happen is that if a client opens a
> keepalive connection and sends a stream of requests for 1-byte files,
> each file will be sent back in a separate small packet.  But that's
> still an improvement over the non-keepalive case (and equivalent to
> the packetization that we get from 1.3).

I wonder if we could also tie the uncork on Linux to the non-keepalive case
without making it too messy?  In other words, if we are doing a whole bunch of
keepalive requests, only uncork after the last one to minimize the number of
packets.  Maybe the uncork could be deferred until just before the socket
shutdown() or close().  

Greg



Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-24 Thread Brian Pane

Ryan Bloom wrote:

>I think we should leave it alone.  This is the difference between
>benchmarks and the real world.  How often do people have 8 requests in a
>row that total less than 8K?
>
>As a compromise, there are two other options.  You could have the
>core_output_filter refuse to buffer more than 2 requests, or you could
>have the core_output_filter not buffer if the full request is in the
>buffer.
>

In your second option, do you mean "full response" rather than "full
request"?  If so, it's equivalent to what I was proposing yesterday:
send the response for each request as soon as we see EOS for that request.
I like that approach a lot because it keeps us from doing an extra
mmap/memcpy/memunmap write before the write in the real-world case
where a client sends a non-pipelined request for a small (<8KB) file
over a keepalive connection.

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-24 Thread Bill Stoddard

> The more I think about it, though, the more I like the idea of just
> writing the brigade out to the client immediately when we see EOS in
> core_output_filter(), even if c->keepalive is true.  If we do this,
> the only bad thing that will happen is that if a client opens a
> keepalive connection and sends a stream of requests for 1-byte files,
> each file will be sent back in a separate small packet.  But that's
> still an improvement over the non-keepalive case (and equivalent to
> the packetization that we get from 1.3).
> 
> --Brian

+1






Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-23 Thread Brian Pane

On Sun, 2002-06-23 at 23:12, Cliff Woolley wrote:
> On Mon, 24 Jun 2002, Bill Stoddard wrote:
> 
> > Yack... just noticed this too. This renders the fd cache (in
> > mod_mem_cache) virtually useless.  Not sure why we cannot setaside a fd.
> 
> You can.  The buckets code is smart enough to (a) take no action if the
> apr_file_t is already in an ancestor pool of the one you're asking to
> setaside into and (b) just use apr_file_dup() to get it into the requested
> pool otherwise to handle the pool cleanup/lifetime issues.
> 
> It's the core_output_filter that's doing an apr_bucket_read() /
> apr_brigade_write() here.  Presumably to minimize the number of buffers
> that will have to be passed to writev() later.
> 
> That could be changed pretty easily, and the mmap/memcpy/munmap/writev
> would go away.  Note, however, that since you can only pass one fd to
> sendfile at a time anyway,

I suppose we can take advantage of sendfilev, which accepts multiple
file descriptors, on platforms where it's available.  However, I'd
rather not setaside file descriptors here anyway, because doing so
would leave us vulnerable to running out of file descriptors in the
multithreaded MPMs.

> delaying the sending of a FILE bucket is pretty
> pointless if you're going to send it out with sendfile later anyway.  What
> would be better is to mmap the file and hang onto the mmap to pass a bunch
> of mmap'ed regions to writev() all at once.  For cache purposes, that just
> means that all you have to do is consider the size of the files you're
> dealing with, and if they're small, use MMapFile instead of CacheFile.  If
> we then got rid of the apr_bucket_read/apr_brigade_write in the
> core_output_filter and just saved up a brigade instead, you'd be set.

I just have one consideration to add here: if we add code to do an mmap,
we need to make sure that it does an open+read instead of mmap if
"EnableMMAP off" has been set for the directory containing the file.

The more I think about it, though, the more I like the idea of just
writing the brigade out to the client immediately when we see EOS in
core_output_filter(), even if c->keepalive is true.  If we do this,
the only bad thing that will happen is that if a client opens a
keepalive connection and sends a stream of requests for 1-byte files,
each file will be sent back in a separate small packet.  But that's
still an improvement over the non-keepalive case (and equivalent to
the packetization that we get from 1.3).

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-23 Thread Cliff Woolley

On Mon, 24 Jun 2002, Cliff Woolley wrote:

> You can.  The buckets code is smart enough to (a) take no action if the
> apr_file_t is already in an ancestor pool of the one you're asking to
> setaside into and (b) just use apr_file_dup() to get it into the requested
> pool otherwise to handle the pool cleanup/lifetime issues.

One small amendment here:  If it's allowed to, in the (b) case above, it
will try to MMAP the file before it will try to apr_file_dup() it.  I
forget why we made that decision, but I imagine it still has merit.
Anyway, it's still the case that one of those things will happen only if the
pool you've requested to setaside into is not an ancestor of the pool the
file is already opened into.  In other words, neither of these things will
happen to apr_file_t's that come out of mod_file_cache.

--Cliff




Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-23 Thread Cliff Woolley

On Mon, 24 Jun 2002, Bill Stoddard wrote:

> Yack... just noticed this too. This renders the fd cache (in
> mod_mem_cache) virtually useless.  Not sure why we cannot setaside a fd.

You can.  The buckets code is smart enough to (a) take no action if the
apr_file_t is already in an ancestor pool of the one you're asking to
setaside into and (b) just use apr_file_dup() to get it into the requested
pool otherwise to handle the pool cleanup/lifetime issues.

It's the core_output_filter that's doing an apr_bucket_read() /
apr_brigade_write() here.  Presumably to minimize the number of buffers
that will have to be passed to writev() later.

That could be changed pretty easily, and the mmap/memcpy/munmap/writev
would go away.  Note, however, that since you can only pass one fd to
sendfile at a time anyway, delaying the sending of a FILE bucket is pretty
pointless if you're going to send it out with sendfile later anyway.  What
would be better is to mmap the file and hang onto the mmap to pass a bunch
of mmap'ed regions to writev() all at once.  For cache purposes, that just
means that all you have to do is consider the size of the files you're
dealing with, and if they're small, use MMapFile instead of CacheFile.  If
we then got rid of the apr_bucket_read/apr_brigade_write in the
core_output_filter and just saved up a brigade instead, you'd be set.

--Cliff




Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-23 Thread Brian Pane

Bill Stoddard wrote:
...

>So changing the AP_MIN_BYTES_TO_WRITE just moves the relative position of the write()
>and the check pipeline read.
>

It has one other side-effect, though, and that's what's bothering me:
In the case where core_output_filter() decides to buffer a response because
it's smaller than 8KB, the end result is to turn:
    sendfile
into:
    mmap
    memcpy
    munmap
    ... buffer some more requests' output until we have 8KB ...
    writev

...

>Some potential low hanging fruit here... would it make sense to make the keepalive read
>(the one right before the select) an APR_INCOMPLETE_READ?
>

That makes sense.  The select ought to be cheaper than the read
that returns EAGAIN.

--Brian





Re: core_output_filter buffering for keepalives? Re: Apache 2.0 Numbers

2002-06-23 Thread Cliff Woolley

On Sun, 23 Jun 2002, Brian Pane wrote:

> size is only 1KB, so core_output_filter reads in and buffers the
> contents of 8 requests before it finally reaches the 8KB threshold
> and starts calling writev.  1.3, in contrast, writes out each
> request's response immediately, with no buffering.

I think it's likely that ab's design itself makes Apache 2 look
slower in this case, given the (accurate) analysis you've
provided.  Even though ab's concurrency model is a *little* better now,
it is still the case as far as I know that ab will sit around and wait for
the response to come back before sending the next request out on that
connection.  At least this used to be the case, I'd have to look again at
the code to see if it still is.  Aaron could probably enlighten us here.

Anyway, what I'm saying is: don't make design decisions of this type based
only on the results of an ab run.

--Cliff