Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to hung

2003-01-24 Thread Seena Kasmai
Thanks Andrew for your input.


We use Solaris as well, and AOLserver seems to work fine in every other situation except when ns_mutex comes into play. Here are more details on how we are using it.

We use ns_mutex inside a scheduled proc, which writes a cached array of numbers (counters) to the database. The proc is scheduled to run every 5 minutes. It locks the array with ns_mutex lock, so that no other process can manipulate the array while it is being written to the db, writes the numbers to the db, resets the counters, and then unlocks the array with ns_mutex unlock.

Note that this array is ns_share'd. Everything seems to function happily at first, but once the webserver gets more traffic, we start seeing that all the processes that have attempted to access that array are waiting in the queue. At this stage the nsd process takes most of the CPU and the webserver almost stops responding to HTTP requests. If we stop the traffic, the server eventually (sometimes after a long time) comes back to normal operation and the queue becomes empty.

I modified that scheduled proc so that it does not lock the array (no ns_mutex use), and after making this change the webserver never got into trouble. That's why I'm almost certain that ns_mutex is causing problems.

I suspect the combination of ns_share and ns_mutex on that array might be the cause. I also noticed that doing upvar on an ns_share'd variable doesn't work!

Any more inputs regarding this matter will greatly be appreciated.


Thanks
Seena



-----Original Message-----
From: Andrew Piskorski
To: [EMAIL PROTECTED]
Sent: 1/23/03 7:11 PM
Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to hung


On Thu, Jan 23, 2003 at 07:23:28PM -0500, Seena wrote:


 After setting up a new server (AOLserver 3.3.1 w/ Tcl 8), it seems
 that using ns_mutex to lock an array/list while the server is running
 brings our site down. The same setup and code/application with
 AOLserver 2.3.3 w/ Tcl 7 works fine. Any comment on why/how this is
 happening?

 I've heard we can use ns_rwlock instead of ns_mutex. Would anyone
 recommend replacing ns_mutex with ns_rwlock?


I've used ns_mutex pretty heavily with AOLserver 3.3+ad13 and Tcl
8.3.2 on Solaris, and I've never had any problems. If your nsd
process is dying, you must have something broken in your AOLserver,
although I've no idea what. Perhaps someone else here will, so you
should probably post a lot more details: Where you got your AOLserver
code, how you compiled it, what operating system, etc.


I've never used ns_rwlock, so I don't know about that. What exactly
are you using ns_mutex for? Are you using ns_share? Perhaps you
could avoid having to use ns_mutex at all by using nsv? Or are you
doing something that you REALLY need to use ns_mutex for, like using
ns_cond, or making several separate nsv operations atomic?


Also, you said this problem brings your site down, but in the
subject you said AOLserver is hung? What exactly is the failure
mode? Is your nsd process segfaulting? Or are you just deadlocking
threads such that AOLserver hangs there doing nothing?


--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com





Re: [AOLSERVER] AOLserver 4.0 Beta available!

2003-01-24 Thread Brett Schwarz
I noticed this was not under the NEWS section at www.aolserver.com ... I
didn't know if that was on purpose, or not...

--brett

On Fri, 2003-01-24 at 05:36, Elizabeth Thomas wrote:
 The first beta release of AOLserver 4.0 is now available with CVS tag
 aolserver_v4_r0_beta_1.  It is also available via the Source Forge
 download page:
 https://sourceforge.net/project/showfiles.php?group_id=3152&release_id=135781

 Please report bugs via SourceForge at:
 https://sourceforge.net/tracker/?group_id=3152&atid=103152

 with group aolserver_v4_r0_beta_1

 Test On!


 Elizabeth Thomas
 Principal Software Engineer
 America OnLine, Inc.
--
Brett Schwarz
brett_schwarz AT yahoo.com



Re: [AOLSERVER] AOLserver 4.0 Beta available!

2003-01-24 Thread Nathan Folkman
In a message dated 1/24/2003 5:55:22 PM Eastern Standard Time, [EMAIL PROTECTED] writes:

I noticed this was not under the NEWS section at www.aolserver.com ... I
didn't know if that was on purpose, or not...

 --brett

Nope - just forgot to update it there. I'll add the info there as well. Thanks!

- Nathan


Re: [AOLSERVER] DB Pool and closed connections

2003-01-24 Thread Andrew Piskorski
On Fri, Jan 24, 2003 at 11:24:52PM +, Steve Manning wrote:

 Could someone who knows about these things say what happens to a
 Postgres DB connection if a request closes prematurely i.e if the user
 hits stop or closes the browser before a query has finished is the DB
 conn returned to the pool before the query returns and can this have any
 nasty effects?

As far as I've ever known, nothing special happens, the query runs to
completion as normal.  That's certainly what I've always seen happen
with Oracle.  I am not familiar with the nsdb internals, but AFAIK the
ns_db driver and thus Postgres (or Oracle) doesn't even have any way
of KNOWING whether or not the web browser is still connected to the
web server or not.

But...  I suspect Steve M. is wondering about the second of these
threads:

  http://openacs.org/forums/message-view?message_id=74443
  http://openacs.org/forums/message-view?message_id=74451

where Dan Lieberman said:

 Having said that - there is a curious problem with PG. When the user
 disconnects by STOP or closing the browser - the PG process running
 the SELECT NEVER finishes.  That's why I called it a zombie (even
 tho' that's an incorrect definition by unix terms) - I would have
 thought that the process WOULD complete the SELECT and drop its
 results into the bit bucket.

 This is what's really troubling me. It smells like a bug in the
 AOLServer PG driver

Now that I think of it, I'm wondering about that too.  Is the behavior
Dan's describing in fact possible?  Is there any possible interaction
between the conn and the db driver in AOLserver?

--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com



Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to...

2003-01-24 Thread Nathan Folkman
In a message dated 1/24/2003 4:47:20 PM Eastern Standard Time, [EMAIL PROTECTED] writes:

Any more inputs regarding this matter will greatly be appreciated. 

Any chance you could provide a few snippets of code showing where you are locking and unlocking, and the work you are doing in between? It's hard to tell what the problem is. If I had to guess, however, it sounds like you are deadlocked. Perhaps you are locking, throwing an uncaught error, and never unlocking? Or maybe you are just experiencing contention around your database, which is causing other requests to back up waiting for that resource... If you can provide some more detailed information, including anything odd you see in the server log, that would be great! You also might want to check the SYSLOG for any database errors which could point to the problem.
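
To illustrate the "locking, throwing an uncaught error, and never unlocking" case, here is a minimal sketch of a wrapper that always releases the mutex. The names with_mutex and flush_counters, and the idea of keeping the mutex handle in an nsv array called "locks", are assumptions for illustration only, not code from the original poster:

```tcl
# Hypothetical helper: run a script while holding a mutex and always
# release the mutex, even if the script raises an error.
proc with_mutex {mutex script} {
    ns_mutex lock $mutex
    # catch traps any error so that the unlock below cannot be skipped.
    set code [catch {uplevel 1 $script} result]
    ns_mutex unlock $mutex
    if {$code == 1} {
        # Re-raise the original error now that the lock is released.
        error $result $::errorInfo $::errorCode
    }
    return $result
}

# Hypothetical scheduled proc. Assumes the mutex was created once at
# server startup with:  nsv_set locks counters [ns_mutex create]
# and that the proc is scheduled with:
#   ns_schedule_proc 300 flush_counters
proc flush_counters {} {
    with_mutex [nsv_get locks counters] {
        # write the counters to the database and reset them here
    }
}
```

If the database write is slow, every thread that touches the counters waits for it while the lock is held. Copying and resetting the counters under the lock, and doing the write after the unlock, keeps that wait short.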

Also, have you considered upgrading to at least AOLserver 3.4.2 or even better 3.5.1? Would need more information to know exactly what you are trying to do, but you might be able to use the nsv_incr command for your counters. 

The nsv data structure is similar to ns_share variables in that you can share variables between multiple threads/interps. The nsv implementation is a lot cleaner, and handles all the synchronization for you. Plus, as I mentioned before, there's a nifty nsv_incr command specifically for things like counters. ns_share is not recommended, especially when running Tcl 8.x.
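
As a rough sketch of what the nsv_incr approach could look like for counters (the "counters" array name and the count_hit / drain_counters procs are made up for illustration, and the database write is left to the caller):

```tcl
# Called from request threads. nsv_incr does its own locking, so no
# ns_mutex is needed here.
proc count_hit {name} {
    nsv_incr counters $name
}

# Called from the scheduled proc. Returns a flat list of
# {name count name count ...} for the caller to write to the database.
# Assumes only one drain runs at a time.
proc drain_counters {} {
    set drained {}
    if {![nsv_array exists counters]} {
        return $drained
    }
    foreach name [nsv_array names counters] {
        set n [nsv_get counters $name]
        if {$n != 0} {
            # Subtract exactly what was read, so hits that arrive
            # between the read and the decrement are kept for the
            # next run instead of being lost by a reset to zero.
            nsv_incr counters $name [expr {-$n}]
            lappend drained $name $n
        }
    }
    return $drained
}
```

The scheduled proc would then call drain_counters and write the returned pairs to the database, without holding any lock during the write.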

- Nathan