Re: the most common seg fault on daedalus
On Mon, 8 Apr 2002, Greg Ames wrote:

> ...looks like a problem with cleaning up an mmap bucket. This is from
> /usr/local/apache2.0.35/corefiles/httpd.core.3 ; .4 and .5 are the same
> problem.
>
> #0  apr_pool_cleanup_kill (p=0x8152f08, data=0x8152eb8, cleanup_fn=0x280cc700 <mmap_cleanup>) at apr_pools.c:1669
> #1  0x280cc90a in apr_mmap_delete (mm=0x8152eb8) at mmap.c:210
> #2  0x280ad926 in mmap_destroy (data=0x8131298) at apr_buckets_mmap.c:82
> #3  0x280adf08 in apr_brigade_cleanup (data=0x8134ca8) at apr_brigade.c:86
> #4  0x280adebe in brigade_cleanup (data=0x8134ca8) at apr_brigade.c:72
> #5  0x280cdd3b in run_cleanups (c=0x813cb98) at apr_pools.c:1713
> #6  0x280cd51c in apr_pool_destroy (pool=0x814b010) at apr_pools.c:638
> #7  0x280cd417 in apr_pool_clear (pool=0x812c010) at apr_pools.c:600
> #8  0x8064752 in child_main (child_num_arg=291) at prefork.c:586
>
> the mm looks whacked/previously deleted:

Hmm... aha, I bet I know what's going on. If the mmap bucket is in a brigade that's registered in pool p, and the mmap that bucket points to is in p or a subpool of p, and the brigade is not cleaned out *before* the pool is cleaned up, then we'll end up deleting the mmap twice. It's a bit of a wacky ordering of events that has to happen to trigger this condition, but in hindsight it makes perfect sense.

We need some way for mmap_destroy() to detect that its mmap has already been deleted and skip the delete. Is there a flag in the apr_mmap_t that says "I'm deleted"? I'll look into this today.

> ...but the bucket structures look fine:

They would.

--Cliff

--
Cliff Woolley
[EMAIL PROTECTED]
Charlottesville, VA
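To make the ordering concrete: apr_pool_destroy() tears down a pool's child pools before running that pool's own cleanups, so if the brigade's cleanup is registered on the parent and the apr_mmap_t lives in a child, the mapping is already gone by the time the brigade cleanup fires. Below is a toy model of that sequence (plain C, not APR code; every name in it is invented for illustration), including the "I'm deleted" flag Cliff is asking about:

    #include <stdio.h>

    /* Toy stand-in for apr_mmap_t: just the mapped address plus the
     * "I'm deleted" flag Cliff is asking about. */
    struct toy_mmap {
        void *addr;
        int   deleted;
    };

    /* Cleanup registered by the mmap itself (runs when its pool dies). */
    static void toy_mmap_cleanup(struct toy_mmap *mm)
    {
        printf("child pool cleanup: munmap %p\n", mm->addr);
        mm->deleted = 1;
    }

    /* Cleanup registered by the brigade holding the mmap bucket. */
    static void toy_brigade_cleanup(struct toy_mmap *mm)
    {
        if (mm->deleted) {
            printf("brigade cleanup: mmap already deleted, skipping\n");
            return;  /* without the flag, this would munmap a second time */
        }
        printf("brigade cleanup: munmap %p\n", mm->addr);
        mm->deleted = 1;
    }

    int main(void)
    {
        struct toy_mmap mm = { &mm, 0 };  /* any address will do here */

        /* apr_pool_destroy(p) tears down p's child pools before running
         * p's own cleanups, which forces exactly this order: */
        toy_mmap_cleanup(&mm);     /* subpool holding the apr_mmap_t dies */
        toy_brigade_cleanup(&mm);  /* then the brigade cleanup in p fires */
        return 0;
    }

Without the deleted flag, the second cleanup would operate on a mapping (and, in the real code, an apr_mmap_t allocation) that no longer exists, which matches the "whacked/previously deleted" mm in the dumps.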
Re: the most common seg fault on daedalus
Greg Ames wrote:

> ...looks like a problem with cleaning up an mmap bucket. This is from
> /usr/local/apache2.0.35/corefiles/httpd.core.3 ; .4 and .5 are the same
> problem.
>
> #0  apr_pool_cleanup_kill (p=0x8152f08, data=0x8152eb8, cleanup_fn=0x280cc700 <mmap_cleanup>) at apr_pools.c:1669
> #1  0x280cc90a in apr_mmap_delete (mm=0x8152eb8) at mmap.c:210
> #2  0x280ad926 in mmap_destroy (data=0x8131298) at apr_buckets_mmap.c:82

We now have dump 6, which looks like the same problem. Since there's no connection when the seg fault hits, I resorted to vi'ing the dump to find the input buffers. There definitely is a common denominator.

dumps 3, 4, and 6:

GET /dist/httpd/ HTTP/1.0^M
Connection: close^M
Accept: */*^M
Host: www.apache.org^M
Referer: http://www.apache.org/dist/httpd/^M
User-Agent: Mozilla/4.7 [en] (Win98; I)^M
Range: bytes=0-^M
^M

dump 5:

GET /dist/httpd/?C=SO=D HTTP/1.0^M
User-Agent: Irvine/0.3.12^M
Connection: close^M
Weferer: SWZIDREXCAXZOWCONEUQZAAFXISHJEXXIMQZUIVOT^M
Host: www.apache.org^M
Accept: */*^M
Range: bytes=0-^M
^M

Yes, I double-checked dump 5 to verify that it really does contain the Elmer Fudd version of Referer: :-)

The Range: header may be key. Jeff tried several similar requests against daedalus and got a 416 HTTP response code (Requested Range Not Satisfiable) each time, but no dumps. He got a 200 against 1.3. The Range: header looks kosher according to RFC 2616; I have no idea how common such a Range: header is.

Keep in mind that this is the same URL that showed that the 2.0.34 output filter chain was busted. We have Multiviews processing for HEADER and README happening on top of the mod_autoindex stuff.

Greg
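For reference, the rule that makes this header kosher is RFC 2616 section 14.35.1: a byte-range-spec with a first-byte-pos and no last-byte-pos means "from that offset through the end of the entity", and the range is unsatisfiable only when first-byte-pos is at or beyond the entity length. A minimal sketch of that check (range_satisfiable is a hypothetical helper written for illustration, not httpd code):

    #include <stdio.h>

    /* RFC 2616 14.35.1: "bytes=0-" asks for the whole entity, so it is
     * unsatisfiable (416) only against a zero-length entity. */
    static int range_satisfiable(long first_byte_pos, long entity_length)
    {
        return first_byte_pos < entity_length;
    }

    int main(void)
    {
        printf("%d\n", range_satisfiable(0, 12345)); /* 1 -> expect 206/200 */
        printf("%d\n", range_satisfiable(0, 0));     /* 0 -> expect 416     */
        return 0;
    }

By that rule, bytes=0- against a non-empty autoindex page should never draw a 416, so Jeff's consistent 416s from 2.0.35 look suspicious in their own right.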
Re: the most common seg fault on daedalus
On Mon, 8 Apr 2002, Cliff Woolley wrote:

> ...looks like a problem with cleaning up an mmap bucket. This is from
> /usr/local/apache2.0.35/corefiles/httpd.core.3 ; .4 and .5 are the same
> problem.

In this function:

APR_DECLARE(apr_status_t) apr_mmap_dup(apr_mmap_t **new_mmap,
                                       apr_mmap_t *old_mmap,
                                       apr_pool_t *p,
                                       int transfer_ownership)
{
    *new_mmap = (apr_mmap_t *)apr_pmemdup(p, old_mmap, sizeof(apr_mmap_t));
    (*new_mmap)->cntxt = p;

    /* The old_mmap can transfer ownership only if the old_mmap itself
     * is an owner of the mmap'ed segment.
     */
    if (old_mmap->is_owner) {
        if (transfer_ownership) {
            (*new_mmap)->is_owner = 1;
            old_mmap->is_owner = 0;
            apr_pool_cleanup_kill(old_mmap->cntxt, old_mmap, mmap_cleanup);
        }
        else {
            (*new_mmap)->is_owner = 0;
        }
        apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
                                  apr_pool_cleanup_null);
    }

    return APR_SUCCESS;
}

Why is apr_pool_cleanup_register() called regardless of whether we're transferring ownership or not? Shouldn't it only be called if (transfer_ownership), the same as the apr_pool_cleanup_kill() call?

--Cliff

--
Cliff Woolley
[EMAIL PROTECTED]
Charlottesville, VA
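One possible shape of the change Cliff is suggesting (a sketch against the excerpt above, not the patch that was eventually committed) moves the registration inside the transfer branch, so a non-owning duplicate never acquires a cleanup of its own:

    APR_DECLARE(apr_status_t) apr_mmap_dup(apr_mmap_t **new_mmap,
                                           apr_mmap_t *old_mmap,
                                           apr_pool_t *p,
                                           int transfer_ownership)
    {
        *new_mmap = (apr_mmap_t *)apr_pmemdup(p, old_mmap, sizeof(apr_mmap_t));
        (*new_mmap)->cntxt = p;

        if (old_mmap->is_owner) {
            if (transfer_ownership) {
                (*new_mmap)->is_owner = 1;
                old_mmap->is_owner = 0;
                /* the old owner's cleanup goes away...             */
                apr_pool_cleanup_kill(old_mmap->cntxt, old_mmap, mmap_cleanup);
                /* ...and only the new owner gets one in its place. */
                apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
                                          apr_pool_cleanup_null);
            }
            else {
                (*new_mmap)->is_owner = 0;
            }
        }

        return APR_SUCCESS;
    }

With that change, at most one apr_mmap_t owns the segment at any time, so only one pool cleanup can ever munmap it.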
Re: the most common seg fault on daedalus
On Mon, 8 Apr 2002, Greg Ames wrote:

> It sounds reasonable, but I'm easily convinced... you're the
> bucketmeister.

Ha. ;-)  Actually, I think I see a number of potential problems in apr/mmap/unix/mmap.c. I'm working on a patch and I'll post it here.

--Cliff

--
Cliff Woolley
[EMAIL PROTECTED]
Charlottesville, VA
Re: [PATCH] Re: the most common seg fault on daedalus
Cliff Woolley wrote:

> As a side note, the buckets code is okay because it assumes (and it
> even has a comment to this effect) that apr_mmap_delete() will do the
> Right Thing: if the mmap is not owned or is already deleted, the call
> will just be a no-op. Sorry, I didn't notice before you pointed it out
> that the code involved left bucket-land.

Hmmm, I wonder what changed since 2.0.32 to trigger the failure then? Maybe it's some of the internal_fast_redirect stuff, which affects this URL.

> Greg, can you try this patch on daedalus for me?

Sure, no problem. But it will probably be a while before it goes live. Last time I checked we had 415 active requests, and we've hit 600 (ServerLimit) today. Mondays are typically heavy-load days, plus I think we're getting /.ed.

Greg
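The "Right Thing" contract the buckets code relies on would look roughly like this in apr_mmap_delete() (a sketch only: is_owner comes from the apr_mmap_dup() excerpt earlier in the thread, and treating mm->mm == (void *)-1 as the "already deleted" marker is an assumption for illustration, not necessarily what mmap.c does):

    APR_DECLARE(apr_status_t) apr_mmap_delete(apr_mmap_t *mm)
    {
        apr_status_t rv;

        if (!mm->is_owner || mm->mm == (void *)-1) {
            return APR_SUCCESS;        /* not ours, or already gone: no-op */
        }

        if ((rv = mmap_cleanup(mm)) != APR_SUCCESS) {
            return rv;                 /* munmap failed; leave state alone */
        }
        apr_pool_cleanup_kill(mm->cntxt, mm, mmap_cleanup);
        mm->mm = (void *)-1;           /* mark deleted for later callers   */
        return APR_SUCCESS;
    }

If apr_mmap_delete() keeps that promise, mmap_destroy() in apr_buckets_mmap.c can safely call it unconditionally, which is what the comment Cliff mentions says it does.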
Re: [PATCH] Re: the most common seg fault on daedalus
Cliff Woolley wrote:

> Greg, can you try this patch on daedalus for me? (sorry if there were
> line wraps)

OK, it's running on port 8092, and it passes everything I know how (or remember) to throw at it in test mode. I'll put it into production after band practice tonight, when I have time to watch it for a while. Hopefully it will be another Maytag Man experience.

Greg

p.s. I did have to hand-apply the second half of the patch. I couldn't spot the mismatch by eyeball. It wasn't line wrap; it may have been differences in whitespace, since diff'ing the patches did highlight what looked like blank lines. No big deal.