[BUG] Memory Corruption (was: RE: [Q] SIGSEGV After fork())

2002-02-15 Thread Fister, Mark


> Dear mod_perl experts:
>
> Collectively, we've been at this for more than two weeks and have searched
> various mod_perl archives, all to no avail.
>
> Symptom:
> ===
> SIGSEGV after fork().  Very reproducible.  Memory corruption gets moved
> around if the codebase changes.

[ SNIP ]

The above is the key: the corruption moves around.  Therefore, I need Purify
or a similar tool.  I'm going to have to go this route, since nobody has any
ideas.  Go go gadget purchasing! :(
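
For anyone following along without a bounds checker to hand: one idea such
tools typically rely on, guard bytes around each allocation, can be
approximated by hand.  The sketch below is only an illustration in C,
assuming suspect allocations can be routed through your own wrappers.
guarded_alloc, guarded_ok and guarded_free are hypothetical names, not part
of Perl, mod_perl, Apache or Purify, and the check is far cruder than a real
tool's.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debugging helpers; not part of Perl, Apache or Purify. */
#define HEADER_LEN 16    /* size field + front guard; keeps user data aligned */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Block layout: [size_t size | front guard][user bytes][rear guard] */
void *guarded_alloc(size_t size)
{
    assert(size > 0 && sizeof(size_t) <= HEADER_LEN - GUARD_LEN);
    unsigned char *raw = malloc(HEADER_LEN + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                       /* remember length */
    memset(raw + HEADER_LEN - GUARD_LEN, GUARD_BYTE, GUARD_LEN);
    memset(raw + HEADER_LEN + size, GUARD_BYTE, GUARD_LEN);
    return raw + HEADER_LEN;
}

/* Returns 1 if both guards are intact, 0 if something wrote past either end. */
int guarded_ok(const void *user)
{
    const unsigned char *p = user;
    size_t size;
    memcpy(&size, p - HEADER_LEN, sizeof size);
    const unsigned char *front = p - GUARD_LEN;
    const unsigned char *rear = p + size;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (front[i] != GUARD_BYTE || rear[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Releases a block obtained from guarded_alloc. */
void guarded_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - HEADER_LEN);
}
```

Calling guarded_ok() at a few checkpoints (for example just before fork())
narrows down when an overrun happened, which a crash inside free() much
later cannot do.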

The only other way I can think of to solve this is to send my module list
to this audience.  Please find it, attached, with home-grown modules
deleted.

More info:

In speaking with Ged (who is very knowledgeable, thanks!), I was led down
a path that got my server to start (setting PERL_DESTRUCT_LEVEL to 0),
but that only hides the memory corruption that perl_destruct ends up
stumbling on; it doesn't solve it.  For some reason, in my case, the
pointer handed to Perl_safesysfree below is the address of the
PL_sv_undef symbol (the xpv_pv field held 0x4046cc18, which is the
address of PL_sv_undef).
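
The failure described here, free() being handed an address that never came
from malloc(), is the kind of thing a tracking wrapper can catch at the
moment it happens rather than later inside the allocator.  A minimal sketch
in C, assuming hypothetical wrappers tracked_malloc and tracked_free (these
are not Perl's own allocation routines, only an illustration of the idea):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical debugging wrappers; not Perl's Perl_safesysfree. */
#define MAX_TRACKED 1024
static void *tracked[MAX_TRACKED];   /* live pointers handed out so far */

/* Allocates size bytes and records the pointer; NULL on failure. */
void *tracked_malloc(size_t size)
{
    assert(size > 0);
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == NULL) {
            tracked[i] = p;
            return p;
        }
    }
    free(p);   /* table full: refuse rather than hand out an untracked block */
    return NULL;
}

/* Frees p and returns 0 if it came from tracked_malloc (or is NULL).
 * Returns -1, without touching the heap, for any other address. */
int tracked_free(void *p)
{
    if (p == NULL)
        return 0;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            free(p);
            return 0;
        }
    }
    return -1;   /* e.g. the address of a static object */
}
```

A -1 from tracked_free() is the place to set a breakpoint: the caller's
stack at that moment shows which structure held the bad pointer.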

> Stack Trace:
> ===
> #0  __pthread_mutex_lock (mutex=0x8bf04999) at mutex.c:99
> #1  0x401b9cc8 in __libc_free (mem=0x4046cc18) at malloc.c:3152
> #2  0x403ce028 in Perl_safesysfree (where=0x4046cc18) at util.c:158
> #3  0x403f20d8 in Perl_sv_clear (sv=0x8198f60) at sv.c:3827
> #4  0x403f2473 in Perl_sv_free (sv=0x8198f60) at sv.c:3950
> #5  0x403f80e1 in do_clean_all (sv=0x8198f60) at sv.c:8411
> #6  0x403e9c5e in S_visit (f=0x403f8094 <do_clean_all>) at sv.c:162
> #7  0x403e9ce2 in Perl_sv_clean_all () at sv.c:193
> #8  0x4038594a in perl_destruct (my_perl=0x809a9a8) at perl.c:665
> #9  0x4035629c in perl_shutdown (s=0x0, p=0x0) at mod_perl.c:294
> #10 0x40356be6 in mp_dso_unload (data=0x808e714) at mod_perl.c:489
> #11 0x08050f34 in run_cleanups (c=0x809c8ac) at alloc.c:1713
> #12 0x0804f5fa in ap_clear_pool (a=0x808e714) at alloc.c:538
> #13 0x08062128 in standalone_main (argc=7, argv=0xb294) at http_main.c:5014
> #14 0x08062cb2 in main (argc=7, argv=0xb294) at http_main.c:5401
> #15 0x40155627 in __libc_start_main (main=0x80627d4 <main>, argc=7,
>     ubp_av=0xb294, init=0x804e3e4 <_init>, fini=0x807aa40 <_fini>,
>     rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xb28c)
>     at ../sysdeps/generic/libc-start.c:129
>
> ***NOTE***  the following gdb session was gleaned from sv.c and refers to
> the freed memory location (0x4046cc18) above:
>
> (gdb) p *((XPV*)(sv)->sv_any)
> $13 = {xpv_pv = 0x4046cc18, xpv_cur = 135562488, xpv_len = 135617180}

[ SNIP ]

-- 
\_/}  Mark P. Fister   Java, Java, everywhere, and all     \_/}
\_/}  eBay, Inc.       the cups did shrink; Java, Java     \_/}
\_/}  Austin, TX       everywhere, nor any drop to drink!  \_/}




module_list_ulist.txt
Description: Binary data


RE: [BUG] Memory Corruption (was: RE: [Q] SIGSEGV After fork())

2002-02-15 Thread Fister, Mark

> > The only other way I can think of to solve this is to send my module list
> > to this audience.  Please find it, attached, with home-grown modules
> > deleted.
>
> Have you tried debugging the old-fashioned way, i.e. remove things until it
> works?  That's your best bet.  I suspect you will find that you have some
> module doing something with XS or sockets or filehandles that can't deal
> with being forked.

That's just the thing with memory corruption:

Adding or removing random code causes the SIGSEGV signature to change
(or causes the server to suddenly start working).  I am nearly
certain that the memory corruption is happening BEFORE the fork,
anyway, which is why I modified the subject line of this thread.
Thank you VERY, VERY much for your ideas, though - I will keep
looking while I wait for the powers that be to get my bounds checking
software!

> - Perrin




RE: [BUG] Memory Corruption (was: RE: [Q] SIGSEGV After fork())

2002-02-15 Thread Fister, Mark

On Fri, Feb 15, 2002 at 12:17:07PM -0800, Paul Lindner wrote:
> On Fri, Feb 15, 2002 at 11:44:03AM -0600, Fister, Mark wrote:
> >
> > > Dear mod_perl experts:
> > >
> > > Collectively, we've been at this for more than two weeks and have searched
> > > various mod_perl archives, all to no avail.
> > >
> > > Symptom:
> > > ===
> > > SIGSEGV after fork().  Very reproducible.  Memory corruption gets moved
> > > around if the codebase changes.
> >
> > [ SNIP ]
> >
> > The above is the key: moved around.  Therefore, I need Purify or similar
> > tool.  I'm going to have to go this route, since nobody has any ideas.
> > Go go gadget purchasing! :(
> >
> > The only other way I can think of to solve this is to send my module list
> > to this audience.  Please find it, attached, with home-grown modules
> > deleted.
>
> To further diagnose this problem you might consider using the sigtrap
> module and paying careful attention to your logs...  This at least led
> me to the portion of my perl that was causing the problem.
>
> A simple
>
>   use sigtrap;
>
> The default signal handler used in this module gives you a stack trace
> before the core dump.

Unless use sigtrap; itself causes a SIGSEGV, which invokes the
signal handler, which causes a SIGSEGV, which invokes... and all of
a sudden your httpd process goes to many thousands of stack levels
deep and consumes 1GB of memory... ;)




RE: [Q] SIGSEGV After fork()

2002-02-07 Thread Fister, Mark

On Thu, Feb 07, 2002 at 01:03:29AM +, Ged Haywood wrote:
> Hi there,

Hi!  Thank you SOOO much for the reply!

[SNIP]

> You might try usemymalloc.

Tried that.  Note: you also tried to help a fellow back in November of
2001 on this VERY same stack trace.

http://groups.yahoo.com/group/modperl/message/39560

> > Compiler:
> >   optimize='-g',
>
> H...

See below.

> >   ccversion='', gccversion='2.96 2731 (Red Hat Linux 7.1 2.96-85)', gccosandvers=''
>
> You've obviously read the docs, so I take it the same compiler built
> Apache, mod_perl and Perl.  Have you tried this on RH6.2 with the
> compiler that came with that?

Yes.  Note also: the problem didn't use to happen with perl 5.00404,
mod_perl 1.08 and apache 1.3b5 (with exactly the same codebase).

See below.

> > Stack Trace:
> > ===
> > #0  __pthread_mutex_lock (mutex=0x8bf04999) at mutex.c:99
> > #1  0x401b9cc8 in __libc_free (mem=0x4046cc18) at malloc.c:3152
> > #2  0x403ce028 in Perl_safesysfree (where=0x4046cc18) at util.c:158
> > #3  0x403f20d8 in Perl_sv_clear (sv=0x8198f60) at sv.c:3827
> > #4  0x403f2473 in Perl_sv_free (sv=0x8198f60) at sv.c:3950
> > #5  0x403f80e1 in do_clean_all (sv=0x8198f60) at sv.c:8411
> > #6  0x403e9c5e in S_visit (f=0x403f8094 <do_clean_all>) at sv.c:162
> > #7  0x403e9ce2 in Perl_sv_clean_all () at sv.c:193
> > #8  0x4038594a in perl_destruct (my_perl=0x809a9a8) at perl.c:665
> > #9  0x4035629c in perl_shutdown (s=0x0, p=0x0) at mod_perl.c:294
> > #10 0x40356be6 in mp_dso_unload (data=0x808e714) at mod_perl.c:489
>
> Have you tried a statically linked mod_perl?

Yes.  See below.

> 73,
> Ged.

Here's what HAS been tried:

- -O2 vs. -O3 vs. -g
- Perl's malloc vs. system malloc
- static vs. dynamic loading of httpd modules
- Different Berkeley db in case there were discrepancies with that
- Different compilers, RedHat releases, glibc releases
- --enable-rule=EXPAT vs. --disable-rule=EXPAT

All failed.




RE: [Q] SIGSEGV After fork()

2002-02-07 Thread Fister, Mark

On Thu, Feb 07, 2002 at 09:35:18PM +, Ged Haywood wrote:
> Hi there,
>
> On Thu, 7 Feb 2002, Fister, Mark wrote:
>
> > Tried that.  Note: you also tried to help a fellow back in November of
> > 2001 on this VERY same stack trace.
> >
> > http://groups.yahoo.com/group/modperl/message/39560
>
> Heh, didn't get very far with Lynx on that URL...
> does anybody know what happened to that one?
>
> > > You've obviously read the docs, so I take it the same compiler built
> > > Apache, mod_perl and Perl.  Have you tried this on RH6.2 with the
> > > compiler that came with that?
> >
> > Yes.  Note also: the problem didn't use to happen with perl 5.00404,
> > mod_perl 1.08 and apache 1.3b5 (with exactly the same codebase).
>
> 5.00404 ?? 1.08 !?!...
>
> Ah.  Now we're getting somewhere.  Maybe.  Why not try Perl 5.7.2?
> I'm using it in development, did some pretty heavy stuff with 5.7.0
> and it was fine, then I ran into SIGSEGVs and things trying to do some
> simple profiling with Devel::DProf on some simple code (heavy data:)
> which went away when I installed 5.7.2.  (BTW thanks Stas!:)

Tried 5.7.2.  I still have core dumps.  This is why I decided to try
the mod_perl list instead of p5p.  I'm definitely nearly in tears. :(

NOTE: some of our Apache-based servers have no problem with the same
Apache/Perl/mod_perl that we're running.  However, others do... and
comparing module lists between the servers that crash and the ones
that don't turns up nothing definitive.

> 73,
> Ged.
