Re: the most common seg fault on daedalus

2002-04-08 Thread Cliff Woolley

On Mon, 8 Apr 2002, Greg Ames wrote:

 ...looks like a problem with cleaning up an mmap bucket.  This is from
 /usr/local/apache2.0.35/corefiles/httpd.core.3 ; .4 and .5 are the same
 problem.

 #0  apr_pool_cleanup_kill (p=0x8152f08, data=0x8152eb8,
 cleanup_fn=0x280cc700 <mmap_cleanup>) at apr_pools.c:1669
 #1  0x280cc90a in apr_mmap_delete (mm=0x8152eb8) at mmap.c:210
 #2  0x280ad926 in mmap_destroy (data=0x8131298) at apr_buckets_mmap.c:82
 #3  0x280adf08 in apr_brigade_cleanup (data=0x8134ca8) at apr_brigade.c:86
 #4  0x280adebe in brigade_cleanup (data=0x8134ca8) at apr_brigade.c:72
 #5  0x280cdd3b in run_cleanups (c=0x813cb98) at apr_pools.c:1713
 #6  0x280cd51c in apr_pool_destroy (pool=0x814b010) at apr_pools.c:638
 #7  0x280cd417 in apr_pool_clear (pool=0x812c010) at apr_pools.c:600
 #8  0x8064752 in child_main (child_num_arg=291) at prefork.c:586

 the mm looks whacked/previously deleted:

Hm... aha, I bet I know what's going on.  If the mmap bucket is in a
brigade that's registered in pool p, and the mmap that bucket points to is
in p or a subpool of p, and the brigade is not cleaned out *before* the
pool is cleaned up, then we'll end up deleting the mmap twice.  It's
a bit of a wacky ordering of events that has to happen to trigger
this condition, but in hindsight it makes perfect sense.  We need some way
for mmap_destroy() to detect that its mmap has already been deleted and
skip the delete.  Is there a flag in the apr_mmap_t that says "I'm
deleted"?
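
For concreteness, here is a minimal sketch of that sequence, written against
current APR/APR-util signatures (the 2.0.35-era apr_brigade_create() took only
a pool, and current APR no longer trips on this pattern); the file path is
just a stand-in:

#include <apr_general.h>
#include <apr_file_io.h>
#include <apr_mmap.h>
#include <apr_buckets.h>

int main(void)
{
    apr_pool_t *p, *sub;
    apr_file_t *f;
    apr_finfo_t finfo;
    apr_mmap_t *mm;
    apr_bucket_alloc_t *ba;
    apr_bucket_brigade *bb;

    apr_initialize();
    apr_pool_create(&p, NULL);
    apr_pool_create(&sub, p);
    ba = apr_bucket_alloc_create(p);

    apr_file_open(&f, "/etc/hosts", APR_READ, APR_OS_DEFAULT, sub);
    apr_file_info_get(&finfo, APR_FINFO_SIZE, f);

    /* the mmap's cleanup is registered in the subpool... */
    apr_mmap_create(&mm, f, 0, (apr_size_t)finfo.size, APR_MMAP_READ, sub);

    /* ...while the brigade (and its cleanup) lives in the parent pool,
     * holding a bucket that points at that same mmap */
    bb = apr_brigade_create(p, ba);
    APR_BRIGADE_INSERT_TAIL(bb,
        apr_bucket_mmap_create(mm, 0, (apr_size_t)finfo.size, ba));

    /* destroying p tears down the subpool first, which unmaps the region;
     * then p's own cleanups run, and brigade_cleanup() -> mmap_destroy()
     * -> apr_mmap_delete() touches the already-deleted mmap: that's the
     * second delete */
    apr_pool_destroy(p);

    apr_terminate();
    return 0;
}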

I'll look into this today.

 ...but the bucket structures look fine:

They would.

--Cliff

--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA





Re: the most common seg fault on daedalus

2002-04-08 Thread Greg Ames

Greg Ames wrote:
 
 ...looks like a problem with cleaning up an mmap bucket.  This is from
 /usr/local/apache2.0.35/corefiles/httpd.core.3 ; .4 and .5 are the same problem.
 
 #0  apr_pool_cleanup_kill (p=0x8152f08, data=0x8152eb8,
 cleanup_fn=0x280cc700 <mmap_cleanup>) at apr_pools.c:1669
 #1  0x280cc90a in apr_mmap_delete (mm=0x8152eb8) at mmap.c:210
 #2  0x280ad926 in mmap_destroy (data=0x8131298) at apr_buckets_mmap.c:82

We now have dump 6, which looks the same.

Since there's no connection when the seg fault hits, I resorted to vi'ing the
dump to find the input buffers.  There definitely is a common denominator:

dumps 3,4, and 6:
GET /dist/httpd/ HTTP/1.0^M
Connection: close^M
Accept: */*^M
Host: www.apache.org^M
Referer: http://www.apache.org/dist/httpd/^M
User-Agent: Mozilla/4.7 [en] (Win98; I)^M
Range: bytes=0-^M
^M
 
dump 5:
GET /dist/httpd/?C=SO=D HTTP/1.0^M
User-Agent: Irvine/0.3.12^M
Connection: close^M
Weferer: SWZIDREXCAXZOWCONEUQZAAFXISHJEXXIMQZUIVOT^M
Host: www.apache.org^M
Accept: */*^M
Range: bytes=0-^M
^M

Yes, I double-checked dump 5 to verify that it contains the Elmer Fudd version
of Referer: :-)

The Range: header may be key.  Jeff tried several similar requests against
daedalus and got a 416 HTTP response code (Requested Range Not Satisfiable) each
time, but no dumps.  He got a 200 against 1.3.  The Range: header looks kosher
according to RFC 2616.  I have no idea how common such a Range: header is.

Keep in mind that this is the same URL that showed that the 2.0.34 output filter
chain was busted.
We have Multiviews processing for HEADER and README happening on top of the
mod_autoindex stuff.

Greg



Re: the most common seg fault on daedalus

2002-04-08 Thread Cliff Woolley

On Mon, 8 Apr 2002, Cliff Woolley wrote:

  ...looks like a problem with cleaning up an mmap bucket.  This is from
  /usr/local/apache2.0.35/corefiles/httpd.core.3 ; .4 and .5 are the same
  problem.

In this function:

APR_DECLARE(apr_status_t) apr_mmap_dup(apr_mmap_t **new_mmap,
                                       apr_mmap_t *old_mmap,
                                       apr_pool_t *p,
                                       int transfer_ownership)
{
    *new_mmap = (apr_mmap_t *)apr_pmemdup(p, old_mmap, sizeof(apr_mmap_t));
    (*new_mmap)->cntxt = p;

    /* The old_mmap can transfer ownership only if the old_mmap itself
     * is an owner of the mmap'ed segment.
     */
    if (old_mmap->is_owner) {
        if (transfer_ownership) {
            (*new_mmap)->is_owner = 1;
            old_mmap->is_owner = 0;
            apr_pool_cleanup_kill(old_mmap->cntxt, old_mmap, mmap_cleanup);
        }
        else {
            (*new_mmap)->is_owner = 0;
        }
        apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
                                  apr_pool_cleanup_null);
    }
    return APR_SUCCESS;
}


Why is apr_pool_cleanup_register() called regardless of whether we're
transferring ownership or not?  Shouldn't it only be called if
(transfer_ownership), same as the apr_pool_cleanup_kill() call?
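
For illustration, the reordering being suggested would look something like
this (a sketch of the idea only, not the patch that actually went in):

    if (old_mmap->is_owner) {
        if (transfer_ownership) {
            (*new_mmap)->is_owner = 1;
            old_mmap->is_owner = 0;
            apr_pool_cleanup_kill(old_mmap->cntxt, old_mmap, mmap_cleanup);

            /* register the new cleanup only when the duplicate actually
             * takes ownership of the segment */
            apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
                                      apr_pool_cleanup_null);
        }
        else {
            (*new_mmap)->is_owner = 0;
        }
    }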

--Cliff

--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA





Re: the most common seg fault on daedalus

2002-04-08 Thread Cliff Woolley

On Mon, 8 Apr 2002, Greg Ames wrote:

 It sounds reasonable, but I'm easily convinced...you're the bucketmeister.

Ha. ;-)  Actually I think I see a number of potential problems in
apr/mmap/unix/mmap.c.  I'm working on a patch and I'll post it here.

--Cliff

--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA





Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-08 Thread Greg Ames

Cliff Woolley wrote:

 As a side note, the buckets code is okay because (and it even has a
 comment to this effect) it assumes that apr_mmap_delete() will do the
 Right Thing: if the mmap is not owned or already deleted, it will just
 be a no-op.
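
(The no-op behavior being relied on is roughly this shape; an illustrative
sketch, not the actual mmap.c code:)

/* illustrative sketch only, not apr/mmap/unix/mmap.c */
apr_status_t mmap_delete_sketch(apr_mmap_t *mm)
{
    if (!mm->is_owner || mm->mm == (void *)-1) {
        /* not the owner, or already unmapped: do nothing */
        return APR_SUCCESS;
    }
    /* ...otherwise munmap() the region, mark mm as deleted, and
     * kill the registered pool cleanup... */
    return APR_SUCCESS;
}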

Sorry, I didn't notice until you pointed it out that the code involved left
bucket-land.

Hmmm, I wonder what changed since 2.0.32 to trigger the failure, then?  Maybe
it's some of the internal_fast_redirect stuff, which affects this URL.

 Greg, can you try this patch on daedalus for me? 

Sure, no problem.  But it will probably be a while before it goes live.  Last
time I checked we had 415 active requests, and we've hit 600 (ServerLimit)
today.  Mondays typically are heavy load days, plus I think we're getting /.ed.

Greg



Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-08 Thread Greg Ames

Cliff Woolley wrote:

 Greg, can you try this patch on daedalus for me? (sorry if there were line
 wraps)

OK, it's running on port 8092, and it passes everything I know how (or
remember) to throw at it in test mode.  I'll put it into production after band
practice tonight, when I have time to watch it for a while.  Hopefully it will
be another Maytag Man experience.

Greg

p.s. I did have to hand-apply the second half of the patch.  I couldn't see
what the mismatch was by eyeball.  It wasn't line wrap; it may have been
differences in whitespace; diff'ing the patches did highlight what looked like
blank lines.  No big deal.