Re: [AOLSERVER] AOLserver Crash

2009-07-02 Thread Gustaf Neumann

William Scott Jordan wrote:

> Hey all!
>
> I've had a few times recently where AOLserver has crashed under high
> loads.  Each time, I see a line in the logs that looks something like
> "unable to alloc 4895393 bytes".

What was the size of AOLserver before the crash (most likely around 2GB)?
What is the footprint per connection thread (how many packages are you 
loading, are you using OpenACS, DotLrn)?

How many connection threads did you define?

> Any guesses on what's causing this?

In most cases, reducing the maximum number of connection threads helps.
One can certainly trigger this problem even with a single thread (e.g.
an infinite loop appending to a Tcl variable).
Once you are close to 2GB, an exec of an external program can push the
process across this boundary.

In general, there are the following ways to proceed:
- find the memory-wasting loop like the one above (a rather unlikely problem)
- reduce memory consumption
- compile Tcl, AOLserver and all used modules as 64-bit
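
As a rough illustration of the arithmetic behind reducing the number of connection threads, the sketch below estimates how many threads fit under a 2GB limit. All figures in it (base process size, per-thread footprint, headroom kept free for an exec) are hypothetical placeholders, not measurements from this thread; substitute values observed on your own server.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def max_threads_within_limit(limit_bytes, base_bytes, per_thread_bytes,
                             headroom_bytes=0):
    """Return the largest thread count whose estimated footprint fits.

    Estimated footprint = base + threads * per_thread, and it must stay
    at or below limit - headroom.
    """
    if per_thread_bytes <= 0:
        raise ValueError("per_thread_bytes must be positive")
    available = limit_bytes - base_bytes - headroom_bytes
    if available <= 0:
        return 0
    return available // per_thread_bytes


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the calculation.
    threads = max_threads_within_limit(
        limit_bytes=2 * GIB,        # 32-bit process limit mentioned above
        base_bytes=200 * MIB,       # assumed size of the server without threads
        per_thread_bytes=60 * MIB,  # assumed footprint of one connection thread
        headroom_bytes=100 * MIB,   # assumed reserve for exec and peaks
    )
    print("estimated safe max connection threads:", threads)
```

The per-thread footprint depends heavily on how many packages each interpreter loads, which is why the questions above ask about OpenACS and DotLrn.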

-gustaf neumann


-William


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
 with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the 
Subject: field of your email blank.



--
Univ.Prof. Dr. Gustaf Neumann
Institute of Information Systems and New Media
WU Vienna
Augasse 2-6, A-1090 Vienna, AUSTRIA




Re: [AOLSERVER] AOLserver Crash

2009-07-01 Thread Dossy Shiobara

On 7/1/09 12:54 PM, William Scott Jordan wrote:


> I've had a few times recently where AOLserver has crashed under high
> loads.  Each time, I see a line in the logs that looks something like
> "unable to alloc 4895393 bytes".
>
> Any guesses on what's causing this?


Got Tcl code that invokes [exec] anywhere?

--
Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)




[AOLSERVER] AOLserver Crash

2009-07-01 Thread William Scott Jordan

Hey all!

I've had a few times recently where AOLserver has crashed under high 
loads.  Each time, I see a line in the logs that looks something like 
"unable to alloc 4895393 bytes".


Any guesses on what's causing this?

-William




Re: [AOLSERVER] AOLserver Crash!

2008-11-02 Thread Dossy Shiobara
Rami Jadaa wrote:
> So any idea what could be there in nsmysql that would crash AOLserver?

Are you using the latest nsmysql (CVS HEAD)?

Can you get a core dump from the crash and a gdb backtrace?
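
One possible way to collect such a backtrace non-interactively is sketched below. It only builds and prints a gdb command line using gdb's batch mode; the binary and core file paths are assumptions for illustration and need to be adjusted to the actual installation. Core dumps also have to be enabled for the nsd process before the crash, which is outside the scope of this sketch.

```python
import shlex


def gdb_backtrace_command(binary_path, core_path):
    """Build a gdb invocation that prints backtraces of all threads and exits."""
    return [
        "gdb",
        "-batch",                      # run the commands, then quit
        "-ex", "thread apply all bt",  # backtrace for every thread
        binary_path,
        core_path,
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with the real nsd binary and core file.
    cmd = gdb_backtrace_command("/usr/local/aolserver/bin/nsd", "core")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can be run in a shell, with its output redirected to a file and attached to a reply.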

-- 
Dossy Shiobara  | [EMAIL PROTECTED] | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)




Re: [AOLSERVER] AOLserver Crash!

2008-11-02 Thread Rami Jadaa
Hello all,
Thanks for your support..
I am writing with the latest updates, as (I think) we now know what was
crashing AOLserver:
1- On all AOLserver machines, a procedure runs on the fly to track user
actions throughout the site (such as registrations, logins, and navigation
between links).
2- This procedure writes these operations to InnoDB tables in MySQL, and it
runs on both the crashing AOLserver and the non-crashing ones.
3- As I said before, the crashing AOLserver is the only one serving multiple
sites.
4- Bearing in mind Neumann's advice about nsmysql, we disabled the action
tracking procedure to prevent further writes to MySQL.
5- We did a reload... and... BANG... "no" crash happened!
6- Hence our conclusion is that it has _something_ to do with nsmysql,
especially since no further updates are being made to that module.

So any idea what could be there in nsmysql that would crash AOLserver?

Thank you all!

On Fri, Oct 31, 2008 at 2:58 AM, SUBSCRIBE AOLSERVER Bjoern Kiesbye <
[EMAIL PROTECTED]> wrote:

>  Hello,
>
> I discovered problems with AOLserver when Tcl (8.5) is configured with the
> option --enable-64-bit; if I configure Tcl without this option, AOLserver
> runs fine on an OpenSuse 10.3 64-bit system. If you are using a precompiled
> Tcl package for a 64-bit system, it is likely that Tcl was configured with
> this option.
>
> You wrote that you are using AOLserver 4.0.10; as far as I know, that is an
> older release of AOLserver 4. I'm using AOLserver 4.5.0; upgrading may solve
> your problem.
> I think v40_r10 (the version you are using) is the tag pointed to in the
> OpenACS installation documentation; the current release is v45_r0, I think.
>
> good luck,
>
> Bjoern

Re: [AOLSERVER] AOLserver Crash!

2008-10-30 Thread SUBSCRIBE AOLSERVER Bjoern Kiesbye

 Hello,

I discovered problems with AOLserver when Tcl (8.5) is configured with the
option --enable-64-bit; if I configure Tcl without this option, AOLserver
runs fine on an OpenSuse 10.3 64-bit system. If you are using a precompiled
Tcl package for a 64-bit system, it is likely that Tcl was configured with
this option.

You wrote that you are using AOLserver 4.0.10; as far as I know, that is an
older release of AOLserver 4. I'm using AOLserver 4.5.0; upgrading may solve
your problem.
I think v40_r10 (the version you are using) is the tag pointed to in the
OpenACS installation documentation; the current release is v45_r0, I think.


 
good luck,

Bjoern


 

-Original Message-
From: Juan José del Río <[EMAIL PROTECTED]>
To: AOLSERVER@LISTSERV.AOL.COM
Sent: Wed, Oct 29, 2008, 12:26
Subject: Re: [AOLSERVER] AOLserver Crash!

Hello,

From my experience, I think the problem may be related to 64-bit builds.

I have servers running 32-bit AOLserver and 64-bit AOLserver, and I have
seen the 64-bit one grow faster in memory (and not even shrink over
time) until it takes up a considerable amount of memory (then I have to
restart it). Also, roughly every 2-3 hours, AOLserver will go wild and eat
100% of one CPU core for around 1 minute... but it will continue serving
requests, more slowly than usual.

My 32-bit server runs FreeBSD 7, and my 64-bit server runs an up-to-date
Debian Linux. I don't know if it has something to do with the OS or with
32 vs. 64 bits, but the fact is that my 64-bit Debian Linux server has
problems that the 32-bit FreeBSD server doesn't.


Regards,

  Juan José

-  
Juan José del Río|  
(+34) 616 512 340|  [EMAIL PROTECTED]


Simple Option S.L.
  Tel: (+34) 951 930 122
  Fax: (+34) 951 930 122
  http://www.simpleoption.com



Re: [AOLSERVER] AOLserver Crash!

2008-10-29 Thread Gustaf Neumann

Rami,

it looks to me as if the problem is due to a C extension you are using
and happens after a thread exit. When a thread exits, it frees, among
other things, the associated Tcl interpreter. At this time, all C
extensions have to unload cleanly as well. Note that ns_eval also creates
and destroys a thread. The problem might be due to some leftover state,
to uninitialized memory, etc. This is inherently cumbersome to
debug in C code, since sometimes you might be "lucky" and
the server survives the real bug, and other times not. Furthermore,
the bug and the crash are often in two different parts of the code.
Maybe you are simply lucky on the other machines.

I would recommend the following:
a) try to make the crash reproducible in as simple a setup as
   possible. I would recommend stressing thread destruction
   (e.g. by setting maxconnections to e.g. 2 and testing with calls
   that do an ns_eval)
b) reduce the set of C extensions (do you have to use both nsmysql and
   nsoracle?). In the best of all possible worlds, you might not need all
   C modules to trigger the crash, so dropping some might help
   to detect the culprit.
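
Step b) can be organized as a simple elimination run. The sketch below lists, for each loaded C module, the configuration that results from dropping just that one; if the crash disappears in a run, the dropped module becomes the prime suspect. The module list is an assumption based on the environment described in this thread, and the sketch does not touch the server configuration itself.

```python
def drop_one_at_a_time(modules):
    """Yield (dropped, remaining) pairs, removing one module per test run."""
    for index, dropped in enumerate(modules):
        remaining = modules[:index] + modules[index + 1:]
        yield dropped, remaining


if __name__ == "__main__":
    # Assumed module list; use the modules actually loaded by your server.
    loaded = ["nsmysql", "nsoracle"]
    for dropped, remaining in drop_one_at_a_time(loaded):
        print("run without", dropped, "-> load only:", ", ".join(remaining))
```

With only two modules this is trivial, but the same loop scales to longer module lists.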

From a distance, my first suspicion falls on nsmysql; I am not
sure how frequently it is used.

hope this helps a little
-gustaf neumann


Rami Jadaa wrote:

> Hi Scott,
> Thanks for your reply.
>
> I don't think that I can send the log, as it would be very big: AOLserver
> initializes and loads a lot of ACS code...
>
> As for the checksum, we did the following:
> Using pound, we shifted the load going to this web server to another server
> on another machine, which uses a different local copy of the same
> application. After the reload, the server we shifted the load to crashed,
> and the old one didn't!!
>
> So I can rule out file corruption, right?





Environment :
Aolserver 4.0.10 , fetched from CVS almost 6 months back .
nsoracle Oracle Driver version 2.8a1
nsmysql CVS
Oracle 10gR2  Libraries
AMD x86_64 RHEL 4
Currently Tcl 8.4.16; also tried Tcl 8.4.11


Please help as this is driving me crazy :(

Thanks in advance



Re: [AOLSERVER] AOLserver Crash!

2008-10-29 Thread Scott Goodwin

Hi Juan,

That's a good point. I noticed Rami was hosting on 64 bit AMD systems  
and it is possible that if he were running on a 32 bit architecture  
the problems he's experiencing might not surface. This could mean a  
problem with Tcl on 64 bit or something specific to AOLserver. I  
suspect there are several 64 bit issues with AOLserver that need to be  
resolved. I'll look into purchasing some 64 bit hardware soon to test  
with.


/s.

On Oct 29, 2008, at 7:26 AM, Juan José del Río wrote:


Hello,

From my experience, I think the problem may be related to 64-bit builds.

I have servers running 32-bit AOLserver and 64-bit AOLserver, and I have
seen the 64-bit one grow faster in memory (and not even shrink over
time) until it takes up a considerable amount of memory (then I have to
restart it). Also, roughly every 2-3 hours, AOLserver will go wild and eat
100% of one CPU core for around 1 minute... but it will continue serving
requests, more slowly than usual.

My 32-bit server runs FreeBSD 7, and my 64-bit server runs an up-to-date
Debian Linux. I don't know if it has something to do with the OS or with
32 vs. 64 bits, but the fact is that my 64-bit Debian Linux server has
problems that the 32-bit FreeBSD server doesn't.


Regards,

 Juan José

-
Juan José del Río|
(+34) 616 512 340|  [EMAIL PROTECTED]


Simple Option S.L.
 Tel: (+34) 951 930 122
 Fax: (+34) 951 930 122
 http://www.simpleoption.com



Re: [AOLSERVER] AOLserver Crash!

2008-10-29 Thread Juan José del Río
Hello,

From my experience, I think the problem may be related to 64-bit builds.

I have servers running 32-bit AOLserver and 64-bit AOLserver, and I have
seen the 64-bit one grow faster in memory (and not even shrink over
time) until it takes up a considerable amount of memory (then I have to
restart it). Also, roughly every 2-3 hours, AOLserver will go wild and eat
100% of one CPU core for around 1 minute... but it will continue serving
requests, more slowly than usual.

My 32-bit server runs FreeBSD 7, and my 64-bit server runs an up-to-date
Debian Linux. I don't know if it has something to do with the OS or with
32 vs. 64 bits, but the fact is that my 64-bit Debian Linux server has
problems that the 32-bit FreeBSD server doesn't.
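
To put numbers on this kind of memory growth, a small sampler along the following lines could log the resident set size of the server process over time. It is only a sketch: it relies on the Linux /proc/<pid>/status format, so it would apply to the Debian machine but not to FreeBSD, and the way the process id is obtained is left to the reader.

```python
def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def read_rss_kb(pid):
    """Read the current resident set size of a process on Linux."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vmrss_kb(handle.read())


if __name__ == "__main__":
    import os
    # Demonstration on the current process; pass the nsd pid in real use.
    print("RSS of this process in kB:", read_rss_kb(os.getpid()))
```

Sampling this value every few minutes and plotting it would show whether the growth is steady or tied to particular events such as the periodic CPU spikes.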


Regards,

  Juan José

-  
Juan José del Río|  
(+34) 616 512 340|  [EMAIL PROTECTED]


Simple Option S.L.
  Tel: (+34) 951 930 122
  Fax: (+34) 951 930 122
  http://www.simpleoption.com



Re: [AOLSERVER] AOLserver Crash!

2008-10-29 Thread Scott Goodwin
It appears that you have the same problem in all of your servers; the  
goal is to find out what part of the code is failing and under what  
conditions. Three things stand out: failed servers are under a heavier  
load than those that don't exhibit the failure; the failure happens  
shortly after shifting a load via pound onto an already running  
aolserver instance; the failure happens after a reload of your procs  
on that server. Since you've said that none of the loads is heavy I  
don't think this problem is triggered by aolserver being overwhelmed  
with traffic. This leaves two things: the shifting of the load itself  
to an already running aolserver instance and the reloading of the  
procs on that aolserver instance. I suspect the problem is related to  
reloading the procs with ns_eval, and not due to load shifting or load  
volume, but we need to confirm that.


Is there a way you can run an aolserver instance directly answering  
queries without using pound? Maybe you could set up a test server that  
you then use http_load or apache bench on. Once running, hit it with a  
load and see if it stays up for at least 10-20 minutes. If it does, do  
a reload of your procs on that server without doing anything else --  
what I expect is that the aolserver instance will crash shortly after  
doing the proc reload. You can then restart the server and try it  
again, this time reloading the procs immediately. Then repeat, but  
reload the procs after 5 minutes or so. In each case, determine how  
long it takes the server to crash after the proc reload (make sure the  
aolserver instance has started and continues to serve connections
before, during and after the reload).
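
The "determine how long it takes the server to crash" step could be measured with a small polling helper like the one below. This is a sketch under assumptions: the probe URL is hypothetical, any HTTP error or connection failure is treated as "server down", and the timer should be started right after the proc reload.

```python
import http.client
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return a function that reports True while the URL answers successfully."""
    def probe():
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except (OSError, http.client.HTTPException):
            # Covers refused connections, timeouts, and HTTP error statuses.
            return False
    return probe


def time_until_failure(probe, interval=1.0, timeout=1200.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it fails; return elapsed seconds, or None on timeout."""
    start = clock()
    while clock() - start < timeout:
        if not probe():
            return clock() - start
        sleep(interval)
    return None


if __name__ == "__main__":
    # Hypothetical test server address; start this right after the reload.
    elapsed = time_until_failure(http_probe("http://localhost:8000/"),
                                 interval=1.0, timeout=20 * 60)
    print("server still up" if elapsed is None else "failed after %.0fs" % elapsed)
```

Running it once per scenario (reload immediately, reload after 5 minutes, and so on) gives comparable time-to-crash figures.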


If anyone else is experiencing the same problems, please post your  
information along with your configuration.


/s.

On Oct 29, 2008, at 5:29 AM, Rami Jadaa wrote:


Hi Scott,
Thanks for your reply.

I don't think that I can send the log, as it would be very big: AOLserver
initializes and loads a lot of ACS code...

As for the checksum, we did the following:
Using pound, we shifted the load going to this web server to another server
on another machine, which uses a different local copy of the same
application. After the reload, the server we shifted the load to crashed,
and the old one didn't!!

So I can rule out file corruption, right?


On Tue, Oct 28, 2008 at 7:50 PM, Scott Goodwin <[EMAIL PROTECTED]>  
wrote:

Rami,

Tcl is attempting to create a new hash table entry on a hash table  
that was either never created or was created but has ceased to exist  
-- most likely the pointer to that hash table is null or corrupted.  
This could be something in AOLserver that uses the Tcl_Hash* API.  
First steps:


1. Send a copy of the nslog output for a clean startup through to  
the point where it crashes; that might indicate where it's getting  
fouled up. If that portion of the nslog is not very long (say no  
more than 100-150 lines) you can cut and paste into the message;  
otherwise attach it as a separate file (but limit it to the smallest  
necessary size -- don't want multimegabyte files).


2. Do a checksum of all your own Tcl code files used by AOLserver on  
a known good machine and those same Tcl files on the bad one;  
compare the two outputs to see what Tcl files on the bad machine  
differ from the good one. Investigate those differences.
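
The checksum comparison in step 2 could be sketched as follows. The directory path and the `*.tcl` pattern are assumptions; the idea is to run `checksum_tree` on each machine, save the resulting mapping (for example as JSON), and then compare the two mappings with `diff_manifests`. SHA-256 is used here simply as a readily available hash.

```python
import hashlib
from pathlib import Path


def checksum_tree(root, pattern="*.tcl"):
    """Map each matching file (path relative to root) to its SHA-256 digest."""
    root = Path(root)
    sums = {}
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sums[path.relative_to(root).as_posix()] = digest
    return sums


def diff_manifests(good, bad):
    """Compare two checksum mappings and report where the bad machine differs."""
    return {
        "differing": sorted(n for n in good.keys() & bad.keys()
                            if good[n] != bad[n]),
        "only_on_good": sorted(good.keys() - bad.keys()),
        "only_on_bad": sorted(bad.keys() - good.keys()),
    }


if __name__ == "__main__":
    # Hypothetical location of the Tcl libraries on this machine.
    manifest = checksum_tree("/web/server/tcl")
    print(len(manifest), "Tcl files checksummed")
```

Files listed under "differing" or present on only one machine are the ones to investigate first.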


/s.


On Oct 28, 2008, at 10:48 AM, Rami Jadaa wrote:


Hello Everyone,

We are running multiple instances of AOLserver on different  
machines, and I am enjoying the reload functionality to reload the  
proc libraries using ns_eval source {fileName} in each one of them...


However, one of the AOLservers crashes a few minutes after the reload.


The strange thing is that this is the only AOLserver that crashes,
while others don't!!! I noticed that just before the crash, the
following error happens (which means something in the C code breaks, and
I am assuming that it could be in the Tcl interpreter, currently Tcl
8.4.16, not AOLserver... but this is only an assumption):


"called Tcl_CreateHashEntry on deleted table"

We use this server to serve multiple domains and have a pound
load balancer in front. For example, if the request comes for www.xyz.com
we serve the xyz-related site and contents, and if the request
comes for www.abc.com we serve the abc-related contents and site. In
total we are serving around 25 different sites like this. We are
not using any virtual hosting module or feature of AOLserver. The
total traffic of the server is not high.


Any ideas, anybody? Has anyone using the reload functionality
noticed that it could crash AOLserver?


Environment :
Aolserver 4.0.10 , fetched from CVS almost 6 months back .
nsoracle Oracle Driver version 2.8a1
nsmysql CVS
Oracle 10gR2  Libraries
AMD x86_64 RHEL 4
Currently Tcl 8.4.16; also tried Tcl 8.4.11


Please help as this is driving me crazy :(

Thanks in a

Re: [AOLSERVER] AOLserver Crash!

2008-10-29 Thread Rami Jadaa
Hi Scott,
Thanks for your reply.

I don't think that I can send the log, as it would be very big: AOLserver
initializes and loads a lot of ACS code...

As for the checksum, we did the following:
Using pound, we shifted the load going to this web server to another server
on another machine, which uses a different local copy of the same
application. After the reload, the server we shifted the load to crashed,
and the old one didn't!!
So I can rule out file corruption, right?


On Tue, Oct 28, 2008 at 7:50 PM, Scott Goodwin <[EMAIL PROTECTED]> wrote:

> Rami,
> Tcl is attempting to create a new hash table entry on a hash table that was
> either never created or was created but has ceased to exist -- most likely
> the pointer to that hash table is null or corrupted. This could be something
> in AOLserver that uses the Tcl_Hash* API. First steps:
>
> 1. Send a copy of the nslog output for a clean startup through to the point
> where it crashes; that might indicate where it's getting fouled up. If that
> portion of the nslog is not very long (say no more than 100-150 lines) you
> can cut and paste into the message; otherwise attach it as a separate file
> (but limit it to the smallest necessary size -- don't want multimegabyte
> files).
>
> 2. Do a checksum of all your own Tcl code files used by AOLserver on a
> known good machine and those same Tcl files on the bad one; compare the two
> outputs to see what Tcl files on the bad machine differ from the good one.
> Investigate those differences.
>
> /s.


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> 
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.


Re: [AOLSERVER] AOLserver Crash!

2008-10-28 Thread Scott Goodwin

Rami,

Tcl is attempting to create a new hash table entry on a hash table  
that was either never created or was created but has ceased to exist  
-- most likely the pointer to that hash table is null or corrupted.  
This could be something in AOLserver that uses the Tcl_Hash* API.  
First steps:


1. Send a copy of the nslog output for a clean startup through to the  
point where it crashes; that might indicate where it's getting fouled  
up. If that portion of the nslog is not very long (say no more than  
100-150 lines) you can cut and paste into the message; otherwise  
attach it as a separate file (but limit it to the smallest necessary  
size -- don't want multimegabyte files).


2. Do a checksum of all your own Tcl code files used by AOLserver on a  
known good machine and those same Tcl files on the bad one; compare  
the two outputs to see what Tcl files on the bad machine differ from  
the good one. Investigate those differences.
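
Step 2 can be done with any checksum tool. Purely as a minimal sketch, the C helper below hashes a file with 64-bit FNV-1a, which is enough to spot files that differ but is not a cryptographic digest. The names file_checksum and print_checksum are made up for illustration; they are not part of AOLserver or Tcl.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: 64-bit FNV-1a hash over the bytes of a file.
 * Returns 0 and stores the hash in *out on success, -1 if the file
 * cannot be opened or read.
 */
int file_checksum(const char *path, uint64_t *out)
{
    unsigned char buf[4096];
    uint64_t hash = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
    size_t n, i;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; i++) {
            hash ^= buf[i];
            hash *= 0x100000001b3ULL;        /* FNV-1a prime */
        }
    }
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *out = hash;
    return 0;
}

/*
 * Hypothetical helper: write one "checksum  path" line per file so the
 * listings from the good and the bad machine can be compared.
 */
int print_checksum(const char *path, FILE *out)
{
    uint64_t sum;

    if (file_checksum(path, &sum) != 0) {
        fprintf(out, "ERROR  %s\n", path);
        return -1;
    }
    fprintf(out, "%016llx  %s\n", (unsigned long long) sum, path);
    return 0;
}
```

Calling print_checksum for every Tcl file from a small main() on each machine, with the same relative paths, gives two listings that can be compared line by line; any line that differs points at a file worth investigating.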


/s.




[AOLSERVER] AOLserver Crash!

2008-10-28 Thread Rami Jadaa
Hello Everyone,

We are running multiple instances of AOLserver on different machines, and I
am enjoying the reload functionality to reload the proc libraries using
ns_eval source {fileName} in each one of them...

However, one of the AOLservers crashes a few minutes after the reload.

The strange thing is that this is the only AOLserver that crashes, while the
others don't!!! I noticed that just before the crash the following error
appears, which means something breaks in the C code. I am assuming it could
be in the Tcl interpreter (currently Tcl 8.4.16) rather than in AOLserver,
but this is only an assumption:

"called Tcl_CreateHashEntry on deleted table"

We use this server to serve multiple domains, with a Pound load
balancer in front. For example, if a request comes in for www.xyz.com, we
serve the xyz site and its content; if a request comes in for
www.abc.com, we serve the abc site and its content. In total we serve
around 25 different sites this way. We are not using any virtual hosting
module or feature of AOLserver. The total traffic on the server is not
high.

Any ideas? Has anyone using the reload functionality noticed that
it can crash AOLserver?

Environment:
AOLserver 4.0.10, fetched from CVS almost 6 months ago
nsoracle Oracle driver version 2.8a1
nsmysql from CVS
Oracle 10gR2 libraries
AMD x86_64, RHEL 4
Currently Tcl 8.4.16; also tried Tcl 8.4.11


Please help as this is driving me crazy :(

Thanks in advance




Re: [AOLSERVER] AOLserver crash related to ns_atclose and namespace commands

2007-01-22 Thread Tom Jackson
On Sunday 21 January 2007 23:20, Brett Schwarz wrote:
> That's funny actually...I just changed a bunch of these cases in a Tcl
> extension I help maintain, just earlier today. I happened upon this post
> that talks about it:
> http://sourceforge.net/mailarchive/forum.php?thread_id=30611212&forum_id=43966
>
> Might be worthwhile doing an audit of the rest of the aolserver code for
> these occurrences.

I only found a few in the AOLserver code; I changed about half before I found 
the one that stopped the bug. 

I even changed one in the Tcl codebase that uses this while checking if a 
namespace exists. 

I have a feeling that the bug shows up for some other reason. ns_atclose 
stores scripts and uses a hash array. I'm guessing that two identical scripts 
might appear as one at some point. This could change the reference count for 
the object, somehow leading to the problem.

tom jackson




Re: [AOLSERVER] AOLserver crash related to ns_atclose and namespace commands

2007-01-21 Thread Brett Schwarz
That's funny actually...I just changed a bunch of these cases in a Tcl 
extension I help maintain, just earlier today. I happened upon this post that 
talks about it: 
http://sourceforge.net/mailarchive/forum.php?thread_id=30611212&forum_id=43966

Might be worthwhile doing an audit of the rest of the aolserver code for these 
occurrences.

--brett

----- Original Message -----
From: Tom Jackson <[EMAIL PROTECTED]>
To: AOLSERVER@LISTSERV.AOL.COM
Sent: Sunday, January 21, 2007 7:17:41 PM
Subject: Re: [AOLSERVER] AOLserver crash related to ns_atclose and namespace 
commands

I found the following change fixes the bug:

in nsd/tclresp.c, line 840:

static int
Result(Tcl_Interp *interp, int result)
{
    /* Tcl_SetBooleanObj(Tcl_GetObjResult(interp), result == NS_OK ? 1 : 0); */
    Tcl_SetObjResult(interp, Tcl_NewBooleanObj((result == NS_OK ? 1 : 0)));
    return TCL_OK;
}

I'll commit the change.

tom jackson


On Sunday 21 January 2007 17:06, Tom Jackson wrote:
> Okay, some more info on this.
>
> ns_atclose has been changed in some strange ways.
>
> First it now requires that you are in an open connection to invoke
> ns_atclose.
>
> ns_atclose used to execute in scheduled procs, which makes sense so that
> you can use one method to clean up stuff in case of errors.
>
> It is easy to re-enable adding ns_atclose to scheduled procs by removing a
> few lines of code. Now I can call ns_atclose everywhere, but in scheduled
> procs, the cleanup scripts don't run.
>
> Question is: why the (silent) change, and
> is there something to replace this?
>
> The old description of the command is here:
> <http://rmadilo.com/files/nsapi/ns_atclose.html>
>
> I still haven't figured out where exactly the crash is coming from, but _it
> is not in the NsAtCloseObjCmd or NsRunAtClose... code.
>
> tom jackson

Re: [AOLSERVER] AOLserver crash related to ns_atclose and namespace commands

2007-01-21 Thread Tom Jackson
I found the following change fixes the bug:

in nsd/tclresp.c, line 840:

static int
Result(Tcl_Interp *interp, int result)
{
    /* Tcl_SetBooleanObj(Tcl_GetObjResult(interp), result == NS_OK ? 1 : 0); */
    Tcl_SetObjResult(interp, Tcl_NewBooleanObj((result == NS_OK ? 1 : 0)));
    return TCL_OK;
}

I'll commit the change.
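
For readers wondering why this change helps: Tcl_SetBooleanObj writes into an existing object and panics if that object is shared (reference count above one), while Tcl_SetObjResult with a fresh Tcl_NewBooleanObj never touches the old object. The toy model below is only a sketch of that rule and is not Tcl's implementation; Obj, Interp, obj_set_bool_in_place, interp_set_result and run_demo are hypothetical stand-ins, and the "second holder" is an assumption about how the result object might have become shared.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for Tcl_Obj and Tcl_Interp (hypothetical, not Tcl's types). */
typedef struct Obj {
    int refCount;
    int boolValue;
} Obj;

typedef struct Interp {
    Obj *result;
} Interp;

Obj *obj_new_bool(int value)
{
    Obj *obj = malloc(sizeof *obj);
    if (obj == NULL) {
        abort();
    }
    obj->refCount = 0;
    obj->boolValue = (value != 0);
    return obj;
}

void obj_incr(Obj *obj) { obj->refCount++; }

void obj_decr(Obj *obj)
{
    if (--obj->refCount <= 0) {
        free(obj);
    }
}

/* Same rule as Tcl_IsShared: more than one reference means shared. */
int obj_is_shared(const Obj *obj) { return obj->refCount > 1; }

/* In-place update: returns -1 where the real Tcl call would panic. */
int obj_set_bool_in_place(Obj *obj, int value)
{
    if (obj_is_shared(obj)) {
        return -1;
    }
    obj->boolValue = (value != 0);
    return 0;
}

/* Replace the result object; the previous one is released, not modified. */
void interp_set_result(Interp *interp, Obj *obj)
{
    obj_incr(obj);
    if (interp->result != NULL) {
        obj_decr(interp->result);
    }
    interp->result = obj;
}

int run_demo(void)
{
    Interp interp = { NULL };
    Obj *held;

    interp_set_result(&interp, obj_new_bool(0));
    held = interp.result;          /* assumed second holder of the result */
    obj_incr(held);

    /* Old pattern: writing into the shared result object is refused. */
    assert(obj_set_bool_in_place(interp.result, 1) == -1);
    assert(held->boolValue == 0);

    /* New pattern: install a fresh object; the other holder is unaffected. */
    interp_set_result(&interp, obj_new_bool(1));
    assert(interp.result->boolValue == 1);
    assert(held->boolValue == 0 && held->refCount == 1);

    obj_decr(held);
    obj_decr(interp.result);
    return 0;
}
```

Calling run_demo() from a main() exercises both patterns. In the model, as in the patch, the new-object path works no matter who else still holds a reference to the previous result.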

tom jackson



Re: [AOLSERVER] AOLserver crash related to ns_atclose and namespace commands

2007-01-21 Thread Tom Jackson
Okay, some more info on this.

ns_atclose has been changed in some strange ways.

First it now requires that you are in an open connection to invoke ns_atclose.

ns_atclose used to execute in scheduled procs, which makes sense so that you 
can use one method to clean up stuff in case of errors. 

It is easy to re-enable adding ns_atclose to scheduled procs by removing a few 
lines of code. Now I can call ns_atclose everywhere, but in scheduled procs, 
the cleanup scripts don't run.

Question is: why the (silent) change, and
is there something to replace this?

The old description of the command is here:
<http://rmadilo.com/files/nsapi/ns_atclose.html>

I still haven't figured out where exactly the crash is coming from, but it is 
not in the NsAtCloseObjCmd or NsRunAtClose... code.

tom jackson


[AOLSERVER] AOLserver crash related to ns_atclose and namespace commands

2007-01-21 Thread Tom Jackson
I have been getting some crashes in AOLserver (current cvs version).
AOLserver doesn't exit, but prints the following and stops responding:

'Tcl_SetBooleanObj called with shared object'

Here is a tcl page which exposes the behavior:

---
# Script to expose bug with ns_atclose/namespace commands
set store "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnop"
namespace eval ::bug { }

# Commenting out this line leads to bug: 'Tcl_SetBooleanObj called with shared object'
#namespace eval ::bug::$store { }

proc ::bug::atClose { store } {
    ns_log Debug "checking if namespace ::bug::$store exists"
    if {[namespace exists ::bug::${store}]} {
        ns_log Debug "Deleting namespace ::bug::$store"
        namespace delete ::bug::${store}
        #log Notice "Closed store (memory delete) $store"
        return $store
    } else {
        ns_log Debug "namespace ::bug::$store does not exist"
    }
}

# Comment out one of these and things work fine:
ns_atclose ::bug::atClose $store
#ns_atclose ::bug::atClose $store

ns_return 200 text/plain "ns_atclose bug"

---

The bug doesn't show up under all conditions. If the namespace exists, or had 
existed and was deleted, things work as expected. Also, even if the namespace 
never existed, if ns_atclose is only called once, things work as expected.

However, if the namespace to be deleted never existed, and ns_atclose is 
called twice with the same args, none of the ns_log Debug statements print, 
and the crash occurs. (But the page is returned)

Not sure what is the cause.

tom jackson

On Friday 03 November 2006 10:31, Alex wrote:
> Oh, well
>
> so I guess it was too early to celebrate. Now I am getting the same
> crashes again, even without "exit" command in the tcl code executed in
> thread.
>
> Seems to me that the same problem is now discussed in
> bug 1589968
> https://sourceforge.net/tracker/?func=detail&atid=103152&aid=1589968&group_id=3152
>
> and
>
> bug 1582671
> http://sourceforge.net/tracker/?func=detail&atid=110894&aid=1582671&group_id=10894
>
>
> Thanks,
> ~ Alex.
>
> On 11/1/06, Alex <[EMAIL PROTECTED]> wrote:
> > Zoran, Jim
> >
> > thanks very much for suggestions!
> > I think I figured it out.
> > The code which was executing in the thread concluded with "exit" tcl
> > command. I got it replaced with "return" and it seems not to be crashing
> > anymore.
> >
> > However, it would be probably a good idea to disable/rename "exit" for
> > the code executed in threads created by ns_thread. Not sure if this
> > shall be submitted as an "enhancement"-level bug.
> >
> > Thanks,
> > ~ Alex.
> >
> > On 11/1/06, Alex <[EMAIL PROTECTED]> wrote:
> > > Jim,
> > >
> > > I tried it on the command line, seems to be my case :)
> > >
> > > However, I run aolserver on debian, via /etc/init.d/aolserver,
> > > Which basically invokes /usr/lib/aolserver4/bin/nsd.
> > > How do I make it use nstclsh instead of tclsh ?
> > > I don't see any options for that.
> > >
> > > Thanks,
> > > ~ Alex.
> > >
> > > On 11/1/06, Jim Davidson <[EMAIL PROTECTED]> wrote:
> > > > Hi,
> > > >
> > > > I think this is related to the comment I added to the RELEASE notes:
> > > >
> > > > * Loading libnsd into a tclsh and then creating new threads with
> > > > the ns_thread command will result in a crash when those threads
> > > > exit. The issues has to do with finalization of the async-cancel
> > > > context used to support the new "ns_ictl cancel" feature.  This bug
> > > > is not present when using the "nstclsh" binary.
> > > >
> > > >
> > > > The issue above, where Tcl is initialized before AOLserver by loading
> > > > libnsd into tclsh, results in Tcl thread local storage being
> > > > finalized before AOLserver's context which includes a pointer to an
> > > > async handler.
> > > >
> > > > Now, that's not what you're doing here but perhaps TclX is having the
> > > > same effect.  I haven't looked at TclX for some time so I can't recall
> > > > what it would be using an async handler for -- perhaps you could dig
> > > > through the code and comment it out as the async handler stuff was
> > > > really designed for Unix signal-related things which aren't common in
> > > > multi-threaded AOLserver.
> > > >
> > > > Alternatively, Tcl could be fixed to avoid freeing itself before
> > > > AOLserver or any other extension.  Unfortunately, that could be a big
> > > > job -- the Tcl core is already riddled with a lot of code to try to
> > > > manage the order of finalization.
> > > >
> > > > -Jim
> > > >
> > > > On Nov 1, 2006, at 5:35 PM, Zoran Vasiljevic wrote:
> > > > > On 01.11.2006, at 23:27, Alex wrote:
> > > > >> Hi,
> > > > >>
> > > > >> I am getting yet another crash in AOLServer 4.5.0.
> > > > >> This time it crashes after exiting from threads started with
> > > > >> "ns_thread begin" or "ns_thread begindetached".
> > > > >>
> > > > >> Any Suggestions?
> > > > >>
> > > > >> Thanks,
> > > > >> ~ Alex.
> > > > >>
> > > > >> Program received signal SIGSEGV, Segmentati