>Number:         5485
>Category:       mod_jserv
>Synopsis:       Servlets stop responding after working correctly for an extended period
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    jserv
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Thu Dec 16 14:10:01 PST 1999
>Last-Modified:
>Originator:     [EMAIL PROTECTED]
>Organization:
apache
>Release:        apache 1.3.6 and jserv 1.0
>Environment:
SunOS dat-chi 5.6 Generic_105181-13 sun4u sparc SUNW,Ultra-4
JDBC access to an oracle listener
Sun javac compiler
>Description:
We are using apache and jserv as an oracle database front end for a web-based application. There are approximately 40 servlets which work in conjunction to produce the correct HTML displays and control database activities.
The web server and jserv run for extended periods with no problems, and then the servlets stop responding. The web server itself is still running. Each servlet logs its transactions and status to log files, but there are no problem indications in those files. The files in /usr/local/apache/jserv/logs give no problem indications either.

/usr/local/apache/logs/access_log shows this access in progress when the servlets stopped responding:

[15/Dec/1999:22:29:49 -0600] "GET /mdex/DirectorySelectForAdd?. . . . . .
(deleted this user's personal info)

By checking the logfile of a different servlet, which has activity about once a second, I was able to see that it stopped responding at the same time. The next lines in the access_log show the same request repeated 5 more times at intervals of 5 minutes 2 seconds. I'm guessing these come from the web server itself, since our user applications do not automatically send retries.

/usr/local/apache/logs/error_log has no entry timestamped at the moment the servlets stopped responding, but the following lines appear between the last entry before the hang and the restart of the web server:

thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0740(5367200) failed: 3 = ESRCH.
thr_continue of 0xeabc0748(1124094120) failed: 3 = ESRCH.
thr_continue of 0xeabc07f8(16) failed: 3 = ESRCH.
thr_continue of 0xeabc06d0(-280320148) failed: 3 = ESRCH.
thr_continue of 0xeabc06e0(0) failed: 3 = ESRCH.

The servlets did not respond until the problem was discovered about 12 hours later. The web server was then restarted with "apachectl graceful" and the servlets started working again.
>How-To-Repeat:
This problem occurs randomly as far as we can tell. We have not been able to cause it to happen. Because the web server is restarted frequently as we make changes to the servlets (the product is still in beta), it is not clear whether the problem is related to the amount of time or the number of transactions since start time. This has occurred about once a week. Our application is a 7x24 service used by major cell phone carriers, and even short outages are not permitted.
>Fix:
Don't have the slightest idea, but it has the feel of a breakdown in communications between the web server and jserv, or of jserv getting locked up on an internal error.
>Audit-Trail:
>Unformatted:
 [In order for any reply to be added to the PR database, you need]
 [to include <[EMAIL PROTECTED]> in the Cc line and make sure the]
 [subject line starts with the report component and number, with]
 [or without any 'Re:' prefixes (such as "general/1098:" or]
 ["Re: general/1098:"). If the subject doesn't match this]
 [pattern, your message will be misfiled and ignored. The]
 ["apbugs" address is not added to the Cc line of messages from]
 [the database automatically because of the potential for mail]
 [loops. If you do not include this Cc, your reply may be]
 [ignored unless you are responding to an explicit request from a]
 [developer. Reply only with text; DO NOT SEND ATTACHMENTS!]