Re: [AOLSERVER] Handling threads the right way
I've been poking around in the aolserver gdb traces I've collected in the past, and it looks like there's a high turnaround of threads turning into zombie threads. Personally, I think this is in line with either bad Tcl code or a configuration issue. I'm going to keep poking around. I have yet to go through the links Andrew provided, but I thought I'd throw this update out there.

On Jul 1, 11:13 am, Sep Ng thejackschm...@gmail.com wrote:
> On Jul 1, 10:42 am, Andrew Piskorski a...@piskorski.com wrote:
> > On Wed, Jun 30, 2010 at 05:21:40PM -0400, Andrew Piskorski wrote:
> > > If you manage to find a list somewhere of what MS Windows library calls are or are not thread-safe, then you could use various tools to find ALL the calls in your AOLserver binaries, and compare the two lists to see if AOLserver seems to be calling anything unsafe.
> >
> > Hm, I thought you were running AOLserver on MS Windows (which is possible but certainly unusual), but later you mention using ulimit, so in hindsight my assumption was almost certainly incorrect. Are you using some flavor of Linux like most people?
>
> I'm handling quite a few different flavours and it's kind of maddening. I've got a Lenny, an Etch and a few old RHEL ones as well. I brought up Windows because I read here http://aolserver.am.net/docs/tuning.adpx that there were instability issues with Windows and AOLserver with certain settings. And I was wondering if the same problems extend to other OSes like Linux.
>
> > As for lists of thread-unsafe functions for various OSs, it seems some progress has been made since I last looked into it c. 2002 or 3. Some brief googling suggests:
> >
> > http://blog.josefsson.org/2009/06/23/thread-safe-functions/
> > http://etbe.coker.com.au/2009/06/14/finding-thread-unsafe-code/
> > http://developers.sun.com/solaris/articles/multithreaded.html
> > http://www.devx.com/cplus/Article/4/1763/page/3
> > http://valgrind.org/info/tools.html#helgrind
> >
> > Those three guys' various lists of functions are of course unlikely to be conclusive. But it's a lot better than nothing.
>
> That's a great bunch of links and I will pore over them for sure!
>
> > Personally, I think it's pretty crazy that shared libraries shipped with OSs don't provide some sort of simple list noting the thread-safety status of every single public function they provide.
>
> That sounds like common sense to me. Would have been nice.

-- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to lists...@listserv.aol.com with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank.
Re: [AOLSERVER] Handling threads the right way
On Tue, Jun 29, 2010 at 03:23:38PM -0700, Sep Ng wrote:
> Basically we had aolservers running and while serving pages, it's also doing some heavy load processing from a ton of scheduled custom written procedures.

Scheduled using AOLserver built-in scheduler, ns_schedule_proc, ns_schedule_daily, or the like?

> Aolserver crashes and segmentation faults are fairly frequent and the logs at the time pointed to these running threads as a probable cause.

Then the first place to look is in your custom code, it's the most likely place for the bug. Is your scheduled code purely Tcl or does it use any C code? If you turn off your scheduled procs, does the crashing go away?

This is a debugging problem, you need to find the bug before you decide how to fix it. After the crash look at the core file's stack trace in a debugger and see if that gives you any clues. Can you reproduce the problem by hitting your development AOLserver with a particular load-testing script? If the problem is non-obvious, you'll probably need that to track it down.

Your focus on AOLserver's thread creation and scheduling mechanisms seems misplaced. You're speculating about ways to fix some imagined problem, but you don't know yet whether your actual problem has any similarity at all to your speculations.

> So basically, what I'm currently beating my head over is to build a much cleaner and better way of handling all the load

It's not clear that building any such thing will help you. If the crash-inducing bugs are in your custom scheduled code, it's fairly likely that they're still going to crash no matter what thread you run them in or how you go about scheduling those threads.

If after lots of looking you REALLY can't find the crash-causing bug(s), THEN I'd start thinking about ways to live with and ameliorate the problem. The simplest one of course, which you've probably already done, is to just let your AOLserver crash and make sure that it's always able to come back up quickly and pick up as close to where it left off as possible.

Better, is to isolate your custom scheduled code in an entirely separate process, with communication between your AOLserver and that helper process. AOLserver 4.5 definitely includes a mechanism for doing that, but I forget what it's called. That way, your code may well still crash, but it will only take down the helper process rather than your entire AOLserver.

-- Andrew Piskorski a...@piskorski.com http://www.piskorski.com/
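The isolation idea in this message (a crash takes down only a helper process, not the whole server) can be illustrated outside AOLserver. The following is a minimal Python sketch of the pattern only; it is not the AOLserver 4.5 mechanism mentioned above, and the function name `run_isolated` and the use of a fresh Python interpreter as the helper are assumptions made for the example.

```python
import subprocess
import sys


def run_isolated(job_source, timeout=30):
    """Run a job in a separate helper process and report how it ended.

    job_source is Python source text standing in for a scheduled job
    (hypothetical; a real setup would hand the helper a job description).
    """
    proc = subprocess.run(
        [sys.executable, "-c", job_source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "ok": proc.returncode == 0,
        # On POSIX a negative return code means the helper died from a signal.
        "returncode": proc.returncode,
        "output": proc.stdout.strip(),
    }


if __name__ == "__main__":
    print(run_isolated("print('job finished')"))
    # os.abort() makes the helper die abnormally; the caller keeps running.
    print(run_isolated("import os; os.abort()"))
```

The caller only sees a non-zero return code when the helper dies, so it can log the failure and schedule a retry instead of going down with the job.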
Re: [AOLSERVER] Handling threads the right way
On Tue, Jun 29, 2010 at 06:19:06PM -0700, Sep Ng wrote:
> How can I tell which ones are thread safe? This sounds like something I will need to look into before I start writing code.

*All* AOLserver modules must be thread safe. If they have any parts that are not, then that's clearly a bug which their developer overlooked, and needs to be fixed.

Keep in mind however, that it's entirely possible for a module to call some external C library which happens to be completely thread-safe on one operating system, but horribly unsafe on another. All part of the fun of cross-platform programming...

If you manage to find a list somewhere of what MS Windows library calls are or are not thread-safe, then you could use various tools to find ALL the calls in your AOLserver binaries, and compare the two lists to see if AOLserver seems to be calling anything unsafe.

Unfortunately very few operating systems provide any such clear, consolidated documentation of what function calls are thread-safe vs. not. You're lucky if the docs on each individual function call even tell you, and of course as Gustaf mentioned, occasionally those docs are wrong.

My general impression though, is that historically MS Windows has tended to have a lot FEWER non-thread-safe library calls in use than Unix. This is probably because Win32 was first written in an era when threads were very popular, while most versions of Unix have roots stretching back well before then. (Supposedly that's also why the multi-process support in Win32 has always been said to be lousy, but the success of multi-process Google Chrome on Win32 suggests that those problems have either been fixed, or can be effectively worked around.)

> > I prefer libthread, since all such threads run in an event loop.
>
> I don't think I've ever heard of this on Aolserver... I always thought Aolserver's threads would eventually end up using the tcl+libthread but it seems that there's a real difference in this.

The Tcl Threads Extension's libthread was written AFTER AOLserver; in fact AOLserver is what inspired Zoran to write libthread in the first place. Generally speaking, libthread is backwards compatible with AOLserver, but also includes some newer stuff that AOLserver does not. libthread is designed so that you can easily use it from inside AOLserver, including as a drop-in replacement for AOLserver's older nsv_* code.

It should be technically feasible to change AOLserver to use libthread directly, but no one has done the work. (And anyway, none of that has much of anything to do with your debugging problem.)

> So typically the config file has no connection whatsoever to the threads of aolserver and that it only pertains to the connection threads, or am I confusing this even further?

Your "the threads of aolserver" terminology above is certainly confused. A connection thread is one of the various flavors of threads used in AOLserver.

-- Andrew Piskorski a...@piskorski.com http://www.piskorski.com/
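The "compare the two lists" idea from this message can be sketched as follows. This is a minimal Python sketch, assuming the undefined (imported) symbols of a binary have already been dumped with a tool such as `nm -D --undefined-only`; the sample output below is made up, and `KNOWN_UNSAFE` is only a small illustrative sample of functions that POSIX does not require to be thread-safe, not a complete list for any OS.

```python
# Illustrative sample only: a few functions POSIX does not require to be
# thread-safe. A real check needs a list appropriate to the target OS.
KNOWN_UNSAFE = {"strtok", "localtime", "gmtime", "asctime", "ctime", "getpwnam"}


def imported_symbols(nm_output):
    """Extract symbol names from `nm -D --undefined-only` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The symbol name is the last field; drop any "@VERSION" suffix.
        symbols.add(fields[-1].split("@", 1)[0])
    return symbols


def suspicious_calls(nm_output, unsafe=KNOWN_UNSAFE):
    """Return the imported symbols that appear on the unsafe list."""
    return sorted(imported_symbols(nm_output) & set(unsafe))


if __name__ == "__main__":
    # Made-up sample standing in for real nm output.
    sample = """\
                 U localtime@GLIBC_2.2.5
                 U pthread_create@GLIBC_2.2.5
                 U strtok_r@GLIBC_2.2.5
                 U strtok@GLIBC_2.2.5
"""
    print(suspicious_calls(sample))
```

A hit only means the call is worth a closer look: the caller may well protect it with a lock, and a clean result says nothing about libraries loaded indirectly.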
Re: [AOLSERVER] Handling threads the right way
On Jul 1, 4:59 am, Andrew Piskorski a...@piskorski.com wrote:
> On Tue, Jun 29, 2010 at 03:23:38PM -0700, Sep Ng wrote:
> > Basically we had aolservers running and while serving pages, it's also doing some heavy load processing from a ton of scheduled custom written procedures.
>
> Scheduled using AOLserver built-in scheduler, ns_schedule_proc, ns_schedule_daily, or the like?

Yeah. A lot of ns_schedule_* calls. We also have a thread that retrieves jobs from the database and spawns more threads to execute those jobs.

> > Aolserver crashes and segmentation faults are fairly frequent and the logs at the time pointed to these running threads as a probable cause.
>
> Then the first place to look is in your custom code, it's the most likely place for the bug. Is your scheduled code purely Tcl or does it use any C code? If you turn off your scheduled procs, does the crashing go away?

It's purely Tcl code. When we did turn off the scheduled procs, aolserver would definitely not crash, but with the scheduled tasks not running, aolserver does not seem to be under any burden at all and serving pages is pretty fast.

> This is a debugging problem, you need to find the bug before you decide how to fix it. After the crash look at the core file's stack trace in a debugger and see if that gives you any clues. Can you reproduce the problem by hitting your development AOLserver with a particular load-testing script? If the problem is non-obvious, you'll probably need that to track it down.

I suppose this is as you describe it. I've been meaning to set the servers up to create core dump files, but they never seem to produce one. That's a separate issue altogether: even though ulimit reports the core file size as unlimited, no core file gets created.

> Your focus on AOLserver's thread creation and scheduling mechanisms seems misplaced. You're speculating about ways to fix some imagined problem, but you don't know yet whether your actual problem has any similarity at all to your speculations.
>
> > So basically, what I'm currently beating my head over is to build a much cleaner and better way of handling all the load
>
> It's not clear that building any such thing will help you. If the crash-inducing bugs are in your custom scheduled code, it's fairly likely that they're still going to crash no matter what thread you run them in or how you go about scheduling those threads.

That's a good point.

> If after lots of looking you REALLY can't find the crash-causing bug(s), THEN I'd start thinking about ways to live with and ameliorate the problem. The simplest one of course, which you've probably already done, is to just let your AOLserver crash and make sure that it's always able to come back up quickly and pick up as close to where it left off as possible.

Yeah, we've been surviving on that for a while now.

> Better, is to isolate your custom scheduled code in an entirely separate process, with communication between your AOLserver and that helper process. AOLserver 4.5 definitely includes a mechanism for doing that, but I forget what it's called. That way, your code may well still crash, but it will only take down the helper process rather than your entire AOLserver.
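One thing that can be checked for the missing core files is the limit a process itself sees, which may differ from what `ulimit` prints in an interactive shell. The following is a minimal Python sketch, assuming a Unix-like system (the `resource` module is not available on Windows). It only reads the limits of the process it runs in, so it is useful mainly when started from the same init script or environment that launches the server, and it is not a diagnosis of this particular setup.

```python
import resource


def describe(limit):
    """Render an rlimit value as text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def core_limits():
    """Return the soft and hard core-file size limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return {"soft": describe(soft), "hard": describe(hard)}


if __name__ == "__main__":
    limits = core_limits()
    print(f"core file size: soft={limits['soft']} hard={limits['hard']}")
```

On reasonably recent Linux kernels the limits of an already running process can also be read from `/proc/<pid>/limits`, which avoids guessing about the launch environment.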
Re: [AOLSERVER] Handling threads the right way
On Jul 1, 5:21 am, Andrew Piskorski a...@piskorski.com wrote:
> On Tue, Jun 29, 2010 at 06:19:06PM -0700, Sep Ng wrote:
> > How can I tell which ones are thread safe? This sounds like something I will need to look into before I start writing code.
>
> *All* AOLserver modules must be thread safe. If they have any parts that are not, then that's clearly a bug which their developer overlooked, and needs to be fixed. Keep in mind however, that it's entirely possible for a module to call some external C library which happens to be completely thread-safe on one operating system, but horribly unsafe on another. All part of the fun of cross-platform programming...

This likely means that whatever issues I'm hitting will still happen even with a rewrite. Alright, I will try and take a closer look at all of this.

> If you manage to find a list somewhere of what MS Windows library calls are or are not thread-safe, then you could use various tools to find ALL the calls in your AOLserver binaries, and compare the two lists to see if AOLserver seems to be calling anything unsafe.
>
> Unfortunately very few operating systems provide any such clear, consolidated documentation of what function calls are thread-safe vs. not. You're lucky if the docs on each individual function call even tell you, and of course as Gustaf mentioned, occasionally those docs are wrong.
>
> My general impression though, is that historically MS Windows has tended to have a lot FEWER non-thread-safe library calls in use than Unix. This is probably because Win32 was first written in an era when threads were very popular, while most versions of Unix have roots stretching back well before then. (Supposedly that's also why the multi-process support in Win32 has always been said to be lousy, but the success of multi-process Google Chrome on Win32 suggests that those problems have either been fixed, or can be effectively worked around.)
>
> > > I prefer libthread, since all such threads run in an event loop.
> >
> > I don't think I've ever heard of this on Aolserver... I always thought Aolserver's threads would eventually end up using the tcl+libthread but it seems that there's a real difference in this.
>
> The Tcl Threads Extension's libthread was written AFTER AOLserver; in fact AOLserver is what inspired Zoran to write libthread in the first place. Generally speaking, libthread is backwards compatible with AOLserver, but also includes some newer stuff that AOLserver does not. libthread is designed so that you can easily use it from inside AOLserver, including as a drop-in replacement for AOLserver's older nsv_* code.
>
> It should be technically feasible to change AOLserver to use libthread directly, but no one has done the work. (And anyway, none of that has much of anything to do with your debugging problem.)
>
> > So typically the config file has no connection whatsoever to the threads of aolserver and that it only pertains to the connection threads, or am I confusing this even further?
>
> Your "the threads of aolserver" terminology above is certainly confused. A connection thread is one of the various flavors of threads used in AOLserver.

Right. Thanks for that clarification. You guys have been tremendously helpful in giving me such immensely informative posts, so this is all much appreciated.
Re: [AOLSERVER] Handling threads the right way
On Wed, Jun 30, 2010 at 05:21:40PM -0400, Andrew Piskorski wrote:
> If you manage to find a list somewhere of what MS Windows library calls are or are not thread-safe, then you could use various tools to find ALL the calls in your AOLserver binaries, and compare the two lists to see if AOLserver seems to be calling anything unsafe.

Hm, I thought you were running AOLserver on MS Windows (which is possible but certainly unusual), but later you mention using ulimit, so in hindsight my assumption was almost certainly incorrect. Are you using some flavor of Linux like most people?

As for lists of thread-unsafe functions for various OSs, it seems some progress has been made since I last looked into it c. 2002 or 3. Some brief googling suggests:

http://blog.josefsson.org/2009/06/23/thread-safe-functions/
http://etbe.coker.com.au/2009/06/14/finding-thread-unsafe-code/
http://developers.sun.com/solaris/articles/multithreaded.html
http://www.devx.com/cplus/Article/4/1763/page/3
http://valgrind.org/info/tools.html#helgrind

Those three guys' various lists of functions are of course unlikely to be conclusive. But it's a lot better than nothing.

Personally, I think it's pretty crazy that shared libraries shipped with OSs don't provide some sort of simple list noting the thread-safety status of every single public function they provide.

-- Andrew Piskorski a...@piskorski.com http://www.piskorski.com/
Re: [AOLSERVER] Handling threads the right way
On Jul 1, 10:42 am, Andrew Piskorski a...@piskorski.com wrote:
> On Wed, Jun 30, 2010 at 05:21:40PM -0400, Andrew Piskorski wrote:
> > If you manage to find a list somewhere of what MS Windows library calls are or are not thread-safe, then you could use various tools to find ALL the calls in your AOLserver binaries, and compare the two lists to see if AOLserver seems to be calling anything unsafe.
>
> Hm, I thought you were running AOLserver on MS Windows (which is possible but certainly unusual), but later you mention using ulimit, so in hindsight my assumption was almost certainly incorrect. Are you using some flavor of Linux like most people?

I'm handling quite a few different flavours and it's kind of maddening. I've got a Lenny, an Etch and a few old RHEL ones as well. I brought up Windows because I read here http://aolserver.am.net/docs/tuning.adpx that there were instability issues with Windows and AOLserver with certain settings. And I was wondering if the same problems extend to other OSes like Linux.

> As for lists of thread-unsafe functions for various OSs, it seems some progress has been made since I last looked into it c. 2002 or 3. Some brief googling suggests:
>
> http://blog.josefsson.org/2009/06/23/thread-safe-functions/
> http://etbe.coker.com.au/2009/06/14/finding-thread-unsafe-code/
> http://developers.sun.com/solaris/articles/multithreaded.html
> http://www.devx.com/cplus/Article/4/1763/page/3
> http://valgrind.org/info/tools.html#helgrind
>
> Those three guys' various lists of functions are of course unlikely to be conclusive. But it's a lot better than nothing.

That's a great bunch of links and I will pore over them for sure!

> Personally, I think it's pretty crazy that shared libraries shipped with OSs don't provide some sort of simple list noting the thread-safety status of every single public function they provide.

That sounds like common sense to me. Would have been nice.
[AOLSERVER] Handling threads the right way
Hi,

I've been looking into building some thread-intensive applications on top of aolserver (both on 4.0.10 and 4.5.1) and from experience, it seems that this may be one of the easier points to get wrong and crash aolserver. I've looked for documentation before, but there do not seem to be any clear answers, so I'm hoping someone can help me with a couple of things.

1. Is it advisable for aolserver to run a detached ns thread that is supposed to run for the duration of the server? I was wondering about memory consumption and whether it's better to let the thread perform the task and die, after which the server spawns another thread for a separate execution.

2. I read that in Windows, thread destruction can cause instability and possible memory leaks. Does this extend to other OS platforms?

3. I understand the general idea behind spawning threads with aolserver, but ideally, I'd like to avoid the taboos on them, so any ideas about this are much appreciated.

Thanks for reading this and hope to hear back.

Regards.
Re: [AOLSERVER] Handling threads the right way
On Mon, Jun 28, 2010 at 11:25:03PM -0700, Sep Ng wrote:
> I've been looking into building some thread intensive applications on top of aolserver (both on 4.0.10 and 4.5.1) and from experience, it

Interesting. Perhaps tell us more about your applications, what led you here, etc.

> seems that this may be one of the easier points to get wrong and crash aolserver.

One of the great things about AOLserver is that it, in conjunction with Tcl, has a set of APIs that are MUCH better for doing practical multi-threaded programming than plain POSIX threads. This applies both when programming in Tcl and in C.

So basically, I'm not sure what you mean. Anytime you write C code you certainly can 'do something wrong and crash the server', but that's not particularly difficult to avoid or fix. If for some reason that's still a severe concern, AOLserver, and especially C, probably aren't the right tools. (It is of course much harder to accidentally crash AOLserver from Tcl.)

> 1. Is it advisable for aolserver to run a detached ns thread that is supposed to run for the duration of the server?

I don't see any reason why not. I've done it with ns_thread begindetached, which worked just fine on both Solaris and Windows XP.

> I was wondering about memory consumption and whether it's better to let the thread perform the task and die after which the server spawns another thread for a separate execution.

That should of course work, but it's the exact opposite of standard performance-tuning advice for AOLserver.

Every Tcl interpreter in AOLserver has its own entire copy of all Tcl library code. For that and probably (I forget exactly) other reasons, most (but not all) AOLserver threads tend to be pretty heavyweight, significantly more so than the underlying OS threads. So the standard advice is to reuse threads where feasible, and AOLserver is in fact specifically designed to automatically reuse connection threads. Rapidly creating and destroying many fat threads has a lot of overhead and will tend to give you lousy performance.

> 2. I read that in Windows, thread destruction can cause instability and possible memory leaks. Does this extend to other OS platforms?

No. ;) FWIW, in my long-term but low-load use of AOLserver on Windows, I never had any particular instability problems. Given my code and how I had AOLserver's thread settings configured, there was probably very little thread creation and destruction going on, but that was essentially coincidental, I didn't design for it.

> 3. I understand the general idea behind spawning threads with aolserver, but ideally, I'd like to avoid the taboos on them, so any idea about this is well-appreciated.

Huh? What are you trying to ask here, and why? Spawning a new thread in AOLserver is easy, and I've never heard of taboos related to them.

-- Andrew Piskorski a...@piskorski.com http://www.piskorski.com/
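Thread reuse, as recommended in this message, usually means a small set of long-lived workers pulling jobs from a queue instead of one new thread per job. The following Python sketch only illustrates that pattern; it is not AOLserver's API, and `run_jobs` and the worker names are invented for the example.

```python
import queue
import threading


def run_jobs(jobs, num_workers=2):
    """Run callables on a fixed set of long-lived worker threads.

    Returns a list of (worker_name, result) pairs, in completion order.
    """
    pending = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this worker
                break
            value = job()
            with results_lock:
                results.append((threading.current_thread().name, value))

    workers = [
        threading.Thread(target=worker, name=f"worker-{i}")
        for i in range(num_workers)
    ]
    for thread in workers:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in workers:  # one sentinel per worker
        pending.put(None)
    for thread in workers:
        thread.join()
    return results


if __name__ == "__main__":
    squares = [lambda n=n: n * n for n in range(10)]
    for name, value in run_jobs(squares):
        print(name, value)
```

Ten jobs run here on only two threads, so the startup cost of a thread is paid twice rather than ten times. The sketch does not handle a job that raises an exception; a real worker loop would need to catch and report that.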
Re: [AOLSERVER] Handling threads the right way
On 28/06/2010 11:25 PM, Sep Ng wrote:
> 2. I read that in Windows, thread destruction can cause instability and possible memory leaks. Does this extend to other OS platforms?

Just to highlight this point - this is partially true. For some versions of msvcrt, the stock, documented thread calls actually would end up leaking memory. This is why Tcl does not use those, resorting to lower-level calls. This odd quirk of Windows msvcrt has become common knowledge over time. If you are not using Tcl threads (which layer over native threads), make sure you read up on _beginthreadex/_endthreadex.

Jeff
Re: [AOLSERVER] Handling threads the right way
Thanks for the input.

Basically we had aolservers running and while serving pages, it's also doing some heavy load processing from a ton of scheduled custom written procedures. Aolserver crashes and segmentation faults are fairly frequent and the logs at the time pointed to these running threads as a probable cause. Possibly a configuration issue, but I remember we tried fiddling with those numbers and it never really helped.

So basically, what I'm currently beating my head over is to build a much cleaner and better way of handling all the load, but in so doing, I'm not entirely sure whether to spawn a lot of threads for the jobs or basically keep it to a minimum. Judging from Andrew's post, it would likely be better to reuse threads, but I'm not entirely sure how that happens. I mean, every time you invoke ns_thread begin/begindetached, you are creating a new thread already, no? How do you reuse them?

This is probably a stupid question, but is there a distinction between the threads and connection threads in aolserver? I know the connection threads are probably the connections to the database (I think).

Thank you all for your insights and hope to hear more!

On Jun 30, 12:59 am, Jeff Hobbs je...@activestate.com wrote:
> On 28/06/2010 11:25 PM, Sep Ng wrote:
> > 2. I read that in Windows, thread destruction can cause instability and possible memory leaks. Does this extend to other OS platforms?
>
> Just to highlight this point - this is partially true. For some versions of msvcrt, the stock, documented thread calls actually would end up leaking memory. This is why Tcl does not use those, resorting to lower-level calls. This odd quirk of Windows msvcrt has become common knowledge over time. If you are not using Tcl threads (which layer over native threads), make sure you read up on _beginthreadex/_endthreadex.
>
> Jeff
Re: [AOLSERVER] Handling threads the right way
Am 30.06.10 00:23, schrieb Sep Ng: Basically we had aolservers running and while serving pages, it's also doing some heavy load processing from a ton of scheduled custom written procedures. Aolserver crashes and segmentation faults are fairly frequent and the logs at the time pointed to these running threads as a probable cause. Possibly a configuration issue, but I remember we tried fiddling with those numbers and it never really helped. From my experience, the two most common causes for crashes in aolserver are a) c-based extensions (tcl extensions and/or aolserver modules) which are not thread safe (e.g. using non thread-safe libraries) or which have bugs (e.g. double frees of memory), or b) running out of memory. The aolserver threads are typically heavy-weight, the zippy memory allocator over-allocates memory and does not free it. All threads use the same memory block, which is limited on most 32bit machines to 2GB. When you hit the limit, the server crashes. A solution for (b) is to compile everything (tcl, aolserver, modules) with 64 bit. If you are using just Tcl + Aolserver commands, the server should never crash with recent versions of aolserver and tcl (e.g. tcl 8.4.19, 8.5.8). What C extensions or aolserver modules are you using? The following heuristic might help in the search of the problem. If your crashes are random and independent of the load (e.g. happen with already a little load) then bugs in the C code of the extensions are likely. If crashes happens during thread end, bugs in the memory management of the c-extensions are likely (tcl deletes an interpreter with all associated memory). If the crashes happen under heavy load, thread-safety of some component is likely to be the problem (e.g. we had a long time problems with our aolserver, which ran stable up to about 600 concurrent users, then it started to crash. One of the problem was the kerberos library, which was not reentrant, although it claimed to be so). 
So basically, what I'm currently beating my head over is to build a much cleaner and better way of handling all the load but in so doing, I'm not entirely sure whether or not to spawn a lot of threads for the jobs, or basically keep it to a minimum. Judging from Andrew's post, it would likely be better to reuse threads but I'm not entirely sure how that happens. I mean, everytime you'd invoke ns_thread begin/begindetached, you are creating a new thread already, no? How do you reuse them? In general, you have two options: you can use aolserver-threads and threads created by the tcl thread library (libthread), which can be loaded as a module into the aolserver (when compiled with the appropriate flags). I prefer libthread, since all such threads run in an event loop. This makes it very easy to send to these threads commands which are implemented via event-loops. One can as well to implement this way easy communication between the threads. It is essentially the same programming model that can be used by using tcl+libthread outside of the aolserver. This is probably a stupid question, but is there a distinction between the threads and connection threads in aolserver? I know the connection threads are probably the connections to the database (I think). no, the connection threads handle incoming HTTP requests, they have a connection to the web-client. These are the threads controlled in the config file via e.g, maxthreads. Typically, a connection thread is created on demand (incoming request) and lives, until either it times out (it received no requests for a certain time) or it has served a maximum number of requests (all configurable in the config file). When it reaches the end of its live-cycle, it is terminated (and maybe recreated afterwards by new demand). The threads of aolserver are typically heavyweight, since every thread contains a full tcl interpreter. 
The weight depends on the size of the blueprint, which is determined essentially by all tcl-procs available at runtime. If you are using e.g. OpenACS with many OpenACS packages defined, these might be several thousand procs. When a thread starts, it reads the blueprint with all procs, which might take a while. If you are using just Aolserver with a few tcl-modules, the cost of starting a new thread (essentially loading the blueprint) might not be so bad, and you might start a new thread for every job. But this certainly depends on your needs and the kind of performance you want to reach.

-gustaf neumann

--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to lists...@listserv.aol.com with the body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: field of your email blank.
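
As a rough illustration of the connection-thread settings described above, here is a minimal sketch of the relevant section of an AOLserver config file. Only maxthreads is named in this thread. The names minthreads and threadtimeout are assumptions based on the 4.0-series sample config, the server name and all values are placeholders, and later versions may set some of this elsewhere, so check the documentation for the version in use.

```tcl
# Sketch only: connection-thread settings in an AOLserver config file.
# "server1" and all values are placeholders, not recommendations.
set servername server1

ns_section "ns/server/${servername}"
ns_param   maxthreads     20   ;# upper bound on connection threads (named in this thread)
ns_param   minthreads     5    ;# assumed name: threads kept alive while idle
ns_param   threadtimeout  120  ;# assumed name: seconds an idle thread waits before exiting
# The per-thread request limit mentioned above is also configurable; its
# parameter name differs between versions, so it is left out here.
```

Since each new connection thread has to load the full blueprint, a low timeout on a server with a large blueprint means paying that startup cost more often.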
Re: [AOLSERVER] Handling threads the right way
On Jun 30, 8:27 am, Gustaf Neumann neum...@wu-wien.ac.at wrote:

> On 30.06.10 00:23, Sep Ng wrote:

> > Basically we had aolservers running and while serving pages, it's also doing some heavy load processing from a ton of scheduled custom-written procedures. Aolserver crashes and segmentation faults are fairly frequent and the logs at the time pointed to these running threads as a probable cause. Possibly a configuration issue, but I remember we tried fiddling with those numbers and it never really helped.

> From my experience, the two most common causes for crashes in aolserver are
> a) C-based extensions (tcl extensions and/or aolserver modules) which are not thread safe (e.g. using non-thread-safe libraries) or which have bugs (e.g. double frees of memory), or
> b) running out of memory. The aolserver threads are typically heavyweight, and the zippy memory allocator over-allocates memory and does not free it. All threads share the memory of the single server process, which is limited on most 32-bit machines to 2GB. When you hit the limit, the server crashes.
> A solution for (b) is to compile everything (tcl, aolserver, modules) as 64-bit.

That's a good point, but I'm not entirely sure that aolserver ever really hit the 2GB limit. I guess I couldn't really speculate on it.

> If you are using just Tcl + Aolserver commands, the server should never crash with recent versions of aolserver and tcl (e.g. tcl 8.4.19, 8.5.8). What C extensions or aolserver modules are you using?

I think we tend to use a lot of aolserver modules, and I would suppose a little haphazardly too. These are the extra ones that I remember:

nsoracle
nscache
nsclamav
nsaspell (not sure)
nssha1 (not sure)
nszlib
nsopenssl (not sure)
nsreturnz on 4.0.10 - basically this was a port I found some time back, but I know we're removing it when we fully move to 4.5.1.

How can I tell which ones are thread safe? This sounds like something I will need to look into before I start writing code.
> The following heuristic might help in the search for the problem. If your crashes are random and independent of the load (e.g. they already happen under light load), then bugs in the C code of the extensions are likely. If crashes happen at thread end, bugs in the memory management of the C extensions are likely (tcl deletes an interpreter with all associated memory). If the crashes happen under heavy load, thread-safety of some component is likely to be the problem (e.g. for a long time we had problems with our aolserver, which ran stable up to about 600 concurrent users and then started to crash. One of the problems was the kerberos library, which was not reentrant, although it claimed to be so).

> > So basically, what I'm currently beating my head over is to build a much cleaner and better way of handling all the load but in so doing, I'm not entirely sure whether or not to spawn a lot of threads for the jobs, or basically keep it to a minimum. Judging from Andrew's post, it would likely be better to reuse threads but I'm not entirely sure how that happens. I mean, every time you'd invoke ns_thread begin/begindetached, you are creating a new thread already, no? How do you reuse them?

> In general, you have two options: you can use aolserver threads, or threads created by the tcl thread library (libthread), which can be loaded as a module into the aolserver (when compiled with the appropriate flags). I prefer libthread, since all such threads run in an event loop. This makes it very easy to send commands to these threads, since sending is implemented via the event loops. It also makes communication between the threads easy to implement. It is essentially the same programming model as using tcl+libthread outside of the aolserver.

I don't think I've ever heard of this on Aolserver... I always thought Aolserver's threads would eventually end up using tcl+libthread, but it seems that there's a real difference here.
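
For readers who have not seen the libthread model mentioned above, the following is a minimal sketch of reusing one event-loop thread for several jobs instead of creating a thread per job. It assumes the Tcl Thread package is installed and runs under plain tclsh; inside AOLserver it presumes libthread has been built and loaded as a module. The proc run_job is a hypothetical stand-in for a real scheduled procedure.

```tcl
package require Thread

# Create ONE long-lived worker. thread::wait parks it in its event loop,
# so it stays alive between jobs instead of exiting after the first script.
set worker [thread::create {
    # Hypothetical job procedure; stands in for a scheduled custom proc.
    proc run_job {n} {
        return [expr {$n * $n}]
    }
    thread::wait
}]

# Reuse the same worker for several jobs. Without -async, thread::send
# blocks until the worker has evaluated the script and returns its result.
set results {}
foreach n {1 2 3} {
    lappend results [thread::send $worker [list run_job $n]]
}
puts $results

# Release the worker once no more jobs are expected; it leaves its event
# loop and exits when its reference count drops to zero.
thread::release $worker
```

The worker pays its startup cost once, which matters when thread creation is expensive. For fire-and-forget jobs, thread::send -async queues the script and returns immediately instead of waiting for the result.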
> > This is probably a stupid question, but is there a distinction between the threads and connection threads in aolserver? I know the connection threads are probably the connections to the database (I think).

> No, the connection threads handle incoming HTTP requests; they have a connection to the web client. These are the threads controlled in the config file via e.g. maxthreads. Typically, a connection thread is created on demand (incoming request) and lives until either it times out (it received no requests for a certain time) or it has served a maximum number of requests (all configurable in the config file). When it reaches the end of its life cycle, it is terminated (and maybe recreated afterwards by new demand). The threads of aolserver are typically heavyweight, since every thread contains a full tcl interpreter. The weight depends on the size of the blueprint, which is determined essentially by all tcl-procs available