Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-10 Thread Cliff Woolley

On Wed, 10 Apr 2002, Greg Ames wrote:

> We have another similar looking dump, with this patch on.  It's
> /usr/local/apache2.0.35c/corefiles/httpd.core.2
>
> #0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
> #1  0x280cc8f0 in apr_mmap_delete (mm=0x8153ac0) at mmap.c:202

Ahh, this is different.

> Didn't I see a refcount inside the object that's being deleted?  If so,
> how can we trust it after the thing is deleted?

Nope, no refcount.  The bucket is refcounted, but not the mmap.  And the
is_owner flag should guarantee that only _one_ bucket will ever try to
delete the thing and succeed.  The check for mm == (void *)-1 should be
saving us the rest of the time... I don't know what's going on yet.
Will investigate.
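
For reference, here's roughly the shape of the two structures involved
(field names as in the patch and the headers of this vintage; an
illustrative sketch, not verbatim):

    /* The shared wrapper that mmap buckets point at is refcounted... */
    typedef struct apr_bucket_mmap {
        apr_bucket_refcount refcount;  /* number of buckets sharing this */
        apr_mmap_t *mmap;              /* ...but the apr_mmap_t is not */
    } apr_bucket_mmap;

    /* The apr_mmap_t itself has only an ownership flag: exactly one
     * copy should have is_owner set, and only that copy may munmap()
     * the underlying region. */
    struct apr_mmap_t {
        apr_pool_t *cntxt;   /* pool whose cleanup list we're on */
        void *mm;            /* the region; (void *)-1 once unmapped */
        apr_size_t size;
        int is_owner;
    };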

--Cliff

--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA





Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-10 Thread Greg Ames

Cliff Woolley wrote:

> Greg, can you try this patch on daedalus for me? 

:-( 

We have another similar looking dump, with this patch on.  It's
/usr/local/apache2.0.35c/corefiles/httpd.core.2

#0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
#1  0x280cc8f0 in apr_mmap_delete (mm=0x8153ac0) at mmap.c:202
#2  0x280ad926 in mmap_destroy (data=0x8131088) at apr_buckets_mmap.c:82
#3  0x280adf08 in apr_brigade_cleanup (data=0x814c4f8) at apr_brigade.c:86
#4  0x280adebe in brigade_cleanup (data=0x814c4f8) at apr_brigade.c:72
#5  0x280cdd3b in run_cleanups (c=0x81393f8) at apr_pools.c:1713
#6  0x280cd51c in apr_pool_destroy (pool=0x813f010) at apr_pools.c:638
#7  0x805ea1c in ap_process_http_connection (c=0x812c120) at http_core.c:300
#8  0x806ecf3 in ap_run_process_connection (c=0x812c120) at connection.c:85
#9  0x806f008 in ap_process_connection (c=0x812c120, csd=0x812c048)
at connection.c:207
#10 0x80648f0 in child_main (child_num_arg=547) at prefork.c:671

Didn't I see a refcount inside the object that's being deleted?  If so, how can
we trust it after the thing is deleted?

Greg



Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-09 Thread Cliff Woolley

On Tue, 9 Apr 2002, Greg Ames wrote:

> But this morning we have a core dump
> (/usr/local/apache2.0.35c/corefiles/httpd.core.1).  The good news is
> that it doesn't look anything like the mmap cleanup dumps, so I think
> this patch is good.

Cool.  Yeah, I was pretty confident I had that one pinned down.  I'll go
ahead and do the corresponding Win32 patch.

--Cliff

--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA





Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-09 Thread Greg Ames

Greg Ames wrote:
> 
> Cliff Woolley wrote:
> 
> > Greg, can you try this patch on daedalus for me? (sorry if there were line
> > wraps)
> 
> OK, it's running on port 8092, and passes everything I know how/remember to
> throw at it in test mode.  I'll put it into production after band practice
> tonight when I have time to watch it for a while.  Hopefully it will be another
> Maytag Man experience.

We did go live with 2.0.35 + this patch last night.  I posted a message to that
effect from an unusual mail server but don't see it this AM; it may have been
eaten by a spam filter or be waiting for the moderator.  The biggest excitement
while I was watching was seeing that the 2.0.35 download rate is about 33%
higher than 1.3.24's.

But this morning we have a core dump
(/usr/local/apache2.0.35c/corefiles/httpd.core.1).  The good news is that it
doesn't look anything like the mmap cleanup dumps, so I think this patch is
good.  I'll post the info from the new dump in a separate thread.

Greg



Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-08 Thread Greg Ames

Cliff Woolley wrote:

> Greg, can you try this patch on daedalus for me? (sorry if there were line
> wraps)

OK, it's running on port 8092, and passes everything I know how/remember to
throw at it in test mode.  I'll put it into production after band practice
tonight when I have time to watch it for a while.  Hopefully it will be another
Maytag Man experience.

Greg

p.s. I did have to hand-apply the second half of the patch.  I couldn't spot
the mismatch by eye: it wasn't line wrap, but it may have been differences in
whitespace; diff'ing the two patches did highlight what looked like blank
lines.  No big deal.



Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-08 Thread Cliff Woolley

On Mon, 8 Apr 2002, Greg Ames wrote:

> hmmm, I wonder what changed since 2.0.32 to trigger the failure then?
> Maybe it's some of the internal_fast_redirect stuff, which affects this
> URL.

Could be.  It's most likely some combination of setaside being called and
then a brigade being cleaned up without all of the buckets having been
emptied from it beforehand.  If the internal_fast_redirect stuff could cause
that (I suppose it possibly could), then that would explain it.
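
To make that concrete, the suspected shape of the failure is something like
this (a hypothetical sketch, not actual httpd code; it assumes apr_mmap_dup()'s
transfer-ownership flag, and "bb" stands in for whatever brigade the filter
is holding):

    #include "apr_mmap.h"
    #include "apr_buckets.h"

    static void suspected_sequence(apr_mmap_t *mm, apr_pool_t *longer_pool,
                                   apr_bucket_brigade *bb)
    {
        apr_mmap_t *set_aside;

        /* setaside dups the mapping into a longer-lived pool and
         * transfers ownership away from the original apr_mmap_t */
        apr_mmap_dup(&set_aside, mm, longer_pool, 1);

        /* later, the brigade is cleaned up with the mmap bucket still
         * in it; mmap_destroy() calls apr_mmap_delete() on the
         * now-non-owner copy and trusts that to be a no-op -- which
         * the unpatched code didn't guarantee */
        apr_brigade_cleanup(bb);
    }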

> > Greg, can you try this patch on daedalus for me?
>
> Sure, no problem.  But it will probably be a while before it goes live.
> Last time I checked we had 415 active requests, and we've hit 600
> (ServerLimit) today.  Mondays typically are heavy load days, plus I
> think we're getting /.ed.

'kay.  In the meanwhile I'll try to pin down an exact request that will
cause it to segfault.

--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA





Re: [PATCH] Re: the most common seg fault on daedalus

2002-04-08 Thread Greg Ames

Cliff Woolley wrote:

> As a side note, the buckets code is okay because (and it even has a
> comment to this effect), it assumes that apr_mmap_delete() will do the
> Right Thing, and if the mmap is not owned or already deleted, it will just
> be a no-op.

Sorry, I didn't notice until you pointed it out that the code involved leaves
bucket-land.

hmmm, I wonder what changed since 2.0.32 to trigger the failure then?  Maybe
it's some of the internal_fast_redirect stuff, which affects this URL. 

> Greg, can you try this patch on daedalus for me? 

Sure, no problem.  But it will probably be a while before it goes live.  Last
time I checked we had 415 active requests, and we've hit 600 (ServerLimit)
today.  Mondays typically are heavy load days, plus I think we're getting /.ed.

Greg



[PATCH] Re: the most common seg fault on daedalus

2002-04-08 Thread Cliff Woolley

On Mon, 8 Apr 2002, Cliff Woolley wrote:

> Ha. ;-)  Actually I think I see a number of potential problems in
> apr/mmap/unix/mmap.c.  I'm working on a patch and I'll post it here.

Okay, there are three issues (or potential issues) addressed here.

(1) apr_mmap_dup() would register a cleanup on the *new_mmap even if the
new one was not the owner.  That turned out to be okay though because
mmap_cleanup would detect the !mm->is_owner case and just pretend
everything was cool.  It's just extra work that doesn't need to be done.

(2) In mmap_cleanup(), if munmap() fails for some reason, we would not set
mm->mm to -1, meaning if we ever tried to run the cleanup a second time
for some reason, we would not know we'd already tried to munmap() the
thing and we'd do it again.  This is the same kind of bug that killed us a
while back on directory cleanups if you'll recall.
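
In other words, the cleanup has to be idempotent.  Condensed from the patch
below (non-BEOS path), the fixed shape is:

    static apr_status_t mmap_cleanup(void *themmap)
    {
        apr_mmap_t *mm = themmap;
        int rv;

        if (!mm->is_owner || mm->mm == (void *)-1) {
            return APR_ENOENT;   /* nothing to do, or already done */
        }

        rv = munmap(mm->mm, mm->size);
        mm->mm = (void *)-1;     /* poison unconditionally, so a second
                                  * run can never munmap() the region
                                  * twice */
        return (rv == 0) ? APR_SUCCESS : errno;
    }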

(3) Finally (and this is, I believe, the source of our current segfaults),
here's a weird sequence of events:

   - create an mmap
   - dup it (with or without transferring ownership)
   - cleanup or delete one copy
   - delete the other one

Because apr_mmap_delete() would assume that any APR_SUCCESS from
mmap_cleanup() meant "OK, go ahead and kill the cleanup", and
!mm->is_owner would return APR_SUCCESS, we would kill the cleanup even if
there wasn't one to kill.  Boom.  There are two or three related sequences
of events that cause this, but they all boil down to essentially the
above.
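
A minimal repro of that sequence (a sketch assuming the transfer-ownership
flavor of apr_mmap_dup(); "file" is any successfully opened apr_file_t):

    #include "apr_mmap.h"

    static void repro(apr_file_t *file, apr_size_t len, apr_pool_t *p)
    {
        apr_mmap_t *orig, *copy;

        apr_mmap_create(&orig, file, 0, len, APR_MMAP_READ, p);
        apr_mmap_dup(&copy, orig, p, 1);  /* ownership moves to copy */

        apr_mmap_delete(copy);  /* munmap()s, kills copy's cleanup: fine */
        apr_mmap_delete(orig);  /* mmap_cleanup() returns APR_SUCCESS for
                                 * the non-owner, so we go on to kill a
                                 * cleanup orig no longer has.  Boom. */
    }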

There are parallel problems in the win32 mmap code that will need patching
as well.

As a side note, the buckets code is okay because (and it even has a
comment to this effect), it assumes that apr_mmap_delete() will do the
Right Thing, and if the mmap is not owned or already deleted, it will just
be a no-op.
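
(From memory, the destroy function in apr_buckets_mmap.c looks roughly like
this -- modulo details:)

    static void mmap_destroy(void *data)
    {
        apr_bucket_mmap *m = data;

        if (apr_bucket_shared_destroy(m)) {
            /* no need to check is_owner here; apr_mmap_delete() is
             * assumed to no-op if we're not the owner */
            apr_mmap_delete(m->mmap);
            free(m);
        }
    }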

Greg, can you try this patch on daedalus for me? (sorry if there were line
wraps)

Thanks,
Cliff



Index: mmap.c
===================================================================
RCS file: /home/cvs/apr/mmap/unix/mmap.c,v
retrieving revision 1.40
diff -u -d -r1.40 mmap.c
--- mmap.c  13 Mar 2002 20:39:24 -0000  1.40
+++ mmap.c  8 Apr 2002 18:26:36 -0000
@@ -85,25 +85,21 @@
     apr_mmap_t *mm = themmap;
     int rv;
 
-    if (!mm->is_owner) {
-        return APR_SUCCESS;
+    if ((!mm->is_owner) || (mm->mm == (void *)-1)) {
+        /* XXX: we shouldn't ever get here */
+        return APR_ENOENT;
     }
 
 #ifdef BEOS
     rv = delete_area(mm->area);
-
-    if (rv == 0) {
-        mm->mm = (void *)-1;
-        return APR_SUCCESS;
-    }
 #else
     rv = munmap(mm->mm, mm->size);
+#endif
+    mm->mm = (void *)-1;
 
     if (rv == 0) {
-        mm->mm = (void *)-1;
         return APR_SUCCESS;
     }
-#endif
     return errno;
 }
 
@@ -189,24 +185,21 @@
             (*new_mmap)->is_owner = 1;
             old_mmap->is_owner = 0;
             apr_pool_cleanup_kill(old_mmap->cntxt, old_mmap, mmap_cleanup);
+            apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
+                                      apr_pool_cleanup_null);
         }
         else {
             (*new_mmap)->is_owner = 0;
         }
-        apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
-                                  apr_pool_cleanup_null);
     }
     return APR_SUCCESS;
 }
 
 APR_DECLARE(apr_status_t) apr_mmap_delete(apr_mmap_t *mm)
 {
-    apr_status_t rv;
+    apr_status_t rv = APR_SUCCESS;
 
-    if (mm->mm == (void *)-1)
-        return APR_ENOENT;
-
-    if ((rv = mmap_cleanup(mm)) == APR_SUCCESS) {
+    if (mm->is_owner && ((rv = mmap_cleanup(mm)) == APR_SUCCESS)) {
         apr_pool_cleanup_kill(mm->cntxt, mm, mmap_cleanup);
         return APR_SUCCESS;
     }


--
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA