Hi all,

I got 3 core dumps from the server and looked for what they have in common.

Looking further into the stack traces, all the crashes happen in an
"ns_httpget" call! Strange...

For info, I have 2 machines: one running aolserver for a pure community web
site, and one running apache/php/courier-imap/qmail/ldap with a derived
version of Squirrelmail to provide webmail.

Both machines were on Solaris (no troubles) and are now on Fedora Core 4
(troubles...).

The webmail machine (apache...) has connection troubles when very busy, and
under heavy traffic the result is the following:
* php tries to connect to courier-imap, which calls the authdaemon module,
which connects to the ldap server. The ldap server (for an unknown reason)
returns an Input/Output error in the server logs, and authdaemon and
courier-imap end up in a strange state that makes php enter an infinite loop,
crashing apache!

OK, back to aolserver/tcl. All that to say that on Fedora Core 4 the network
connection may sometimes give strange results from sockets, so that the
client receives wrong data on the read connection and keeps reading
forever...

I checked all the ns_http* procs and found 2 or 3 possible infinite loops
(while(1) etc...), and "_ns_http_gets" also does not have a "return"
call at the end...

What I can do is add a counter to those loops and write a line to the log
file every 1000 iterations. Then, after the next crash, the logs will show
whether tcl was looping or not.
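
To make the idea concrete, here is a rough sketch of such a counter. It is
written in Python rather than Tcl only to keep it short and self-contained.
LoopWatch and read_until_blank are hypothetical names, not part of the
ns_http* procs; the real change would be a counter variable plus an ns_log
call inside each suspect loop.

```python
import logging

logger = logging.getLogger("loop_watch")


class LoopWatch:
    """Counts iterations of a suspect loop and logs every `every` iterations."""

    def __init__(self, name, every=1000, log=logger.warning):
        self.name = name      # label written to the log, e.g. the proc name
        self.every = every    # log once per this many iterations
        self.log = log        # logging callable (assumed printf-style)
        self.count = 0

    def tick(self):
        """Call once per loop iteration; returns the running count."""
        self.count += 1
        if self.count % self.every == 0:
            self.log("%s: %d iterations so far", self.name, self.count)
        return self.count


def read_until_blank(lines, watch):
    """Stand-in for a while(1)-style read loop: consume lines until a blank one."""
    result = []
    it = iter(lines)
    while True:
        watch.tick()
        line = next(it, None)
        if line is None or line == "":
            break
        result.append(line)
    return result
```

A loop that really is stuck would keep writing one log line per 1000
iterations until the crash, while a healthy request would normally log
nothing at all.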

What do you think? Am I completely wrong, or can ns_httpget cause
trouble like that?

Regards / Cordialement

====================
Jean-Fabrice Rabaute
CORE SERVICES :: Software/Web development & Consulting services

http://www.core-services.fr - {Enjoy the future today}
http://www.debugbar.com : The most advanced WEB development tool for
Internet Explorer


> -----Original Message-----
> From: AOLserver Discussion [mailto:[EMAIL PROTECTED]
> On behalf of Jean-Fabrice RABAUTE
> Sent: Saturday, October 15, 2005 12:11
> To: AOLSERVER@LISTSERV.AOL.COM
> Subject: Re: [AOLSERVER] [AS4.0.10/tcl 8.4.11] Fatal error signal 11
>
>
> Hi,
>
> I changed my stack size from 500000 to 2048000 and ... after five days:
> Crash! With exactly the same call trace, but from the page "view-msg.tcl"
> instead of forum-x.tcl. (I think the page called is not the problem, as I
> never crashed the server on Solaris.)
>
> Maybe I am wrong, but I don't think the stack size is the cause of the
> problem.
>
> I will try to make scripts to reproduce the bug.
>
> Regards / Cordialement
>
>
> > -----Original Message-----
> > From: AOLserver Discussion [mailto:[EMAIL PROTECTED]
> > On behalf of Dossy Shiobara
> > Sent: Tuesday, October 11, 2005 23:59
> > To: AOLSERVER@LISTSERV.AOL.COM
> > Subject: Re: [AOLSERVER] [AS4.0.10/tcl 8.4.11] Fatal error signal 11
> >
> >
> > On 2005.10.11, Jean-Fabrice RABAUTE <[EMAIL PROTECTED]> wrote:
> > > I am back... with my debug info!
> > >
> > > It looks like aolserver crashes on an adp call.
> > > Here is the complete trace (long!) with some info after it:
> >
> > This is great:
> >
> > > #0  0x00438402 in __kernel_vsyscall ()
> > > #1  0x002f81f8 in raise () from /lib/libc.so.6
> > > #2  0x002f9948 in abort () from /lib/libc.so.6
> > > #3  0x0052b153 in FatalSignalHandler (signal=11) at unix.c:78
> > > #4  <signal handler called>
> > ...
> > > #71 0x0050d427 in Ns_ConnRunRequest (conn=0x81a9558) at op.c:233
> > > #72 0x0050f329 in ConnRun (connPtr=0x81a9558) at queue.c:759
> > > #73 0x0050ee96 in NsConnThread (arg=0xbdcfed0) at queue.c:617
> > > #74 0x00449e57 in NsThreadMain (arg=0xb6ecd030) at thread.c:224
> > > #75 0x0044b7d0 in ThreadMain (arg=0xb6ecd030) at pthread.c:730
> > > #76 0x00116b80 in start_thread () from /lib/libpthread.so.0
> > > #77 0x0039adee in clone () from /lib/libc.so.6
> >
> > 77 calls deep?
> >
> > What is your stacksize configured to?
> >
> > -- Dossy
> >
> > --
> > Dossy Shiobara              [EMAIL PROTECTED] | http://dossy.org/
> > Panoptic Computer Network   http://panoptic.com/
> >   "He realized the fastest way to change is to laugh at your own
> >     folly -- then you can let go and quickly move on." (p. 70)
> >
> >
> > --
> > AOLserver - http://www.aolserver.com/
> >
> > To Remove yourself from this list, simply send an email to
> > <[EMAIL PROTECTED]> with the
> > body of "SIGNOFF AOLSERVER" in the email message. You can leave
> > the Subject: field of your email blank.