>Number:         3911
>Category:       os-linux
>Synopsis:       Under high load, server hangs in "flock or fcntl".
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Wed Feb 17 13:40:01 PST 1999
>Last-Modified:
>Originator:     [EMAIL PROTECTED]
>Organization:   apache
>Release:        1.3.4
>Environment:
Linux crappy.zko.dec.com 2.2.1 #5 Fri Feb 12 09:07:00 EST 1999 i686 unknown
gcc version egcs-2.91.60 19981201 (egcs-1.1.1 release)
OS version: Redhat 5.2
Intel PII/266 with FDDI interface card.
Kernel compiled with the following values changed in
/usr/src/linux/include/net/tcp.h:

    #define TCP_HTABLE_SIZE   2048   (was 512)
    #define TCP_LHTABLE_SIZE   128   (was 32)
    #define TCP_BHTABLE_SIZE  2048   (was 512)

Everything is on a local filesystem. The lock files are NOT on NFS.
>Description:
I can repeatably get the server to stop responding after significantly
stressing the system.

Initially, I had Apache compiled with flock() serialization. After a while,
a large number of the httpd processes were stuck in the following state:

    #0  0x400d49c1 in flock ()
    #1  0x805aaa9 in accept_mutex_on ()
    #2  0x805d6a5 in child_main ()
    #3  0x805dc68 in make_child ()
    #4  0x805dfe1 in perform_idle_server_maintenance ()
    #5  0x805e4e9 in standalone_main ()
    #6  0x805ea7b in main ()

A few were in the following state (which is what they SHOULD be):

    #0  0x400de5c2 in __libc_accept ()
    #1  0x805d7bc in child_main ()
    #2  0x805dc68 in make_child ()
    #3  0x805dd17 in startup_children ()
    #4  0x805e328 in standalone_main ()
    #5  0x805ea7b in main ()

When I tried to connect to the server (lynx http://127.0.0.1), it would just
hang. Normally, the response is instantaneous.

I then recompiled Apache with fcntl() serialization, and the same thing
occurs. This time the stack trace is:

    #0  0x400d4974 in __libc_fcntl ()
    #1  0x1 in ?? ()
    #2  0x805d66d in child_main ()
    #3  0x805dc30 in make_child ()
    #4  0x805dcdf in startup_children ()
    #5  0x805e2f0 in standalone_main ()
    #6  0x805ea43 in main ()

There is some kind of race condition that occurs under very heavy load. I am
not sure whether it is a Linux, Apache, or even glibc bug, but I really want
to get a good result here.
>How-To-Repeat:
The load is SPECweb96. When I push my system above 60 ops/sec, this occurs.
I don't have an easy way for an external site to reproduce it, but for the
next week and a half it is all I will be working on, so I can easily try out
any patches that anyone may have.
>Fix:
None.
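For reference when reading the traces, here is a minimal C sketch of a serialized-accept mutex built on the two calls the stuck children are blocked in: fcntl(F_SETLKW) and flock(LOCK_EX). The function names (lockfile_open, fcntl_mutex_on, flock_mutex_on, and so on) are hypothetical; this is an illustration under those assumptions, not Apache 1.3.4's http_main.c. Both variants simply wait for whichever process currently holds the lock and retry if the wait is interrupted by a signal (EINTR).

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical sketch of an accept() mutex; the names are illustrative
 * and are not taken from Apache 1.3.4's http_main.c.
 */

/* Open (creating if needed) the lock file. For flock(), call this in each
 * child: flock() locks belong to the open file description, so a
 * descriptor inherited across fork() would be shared with the parent. */
static int lockfile_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Fill in a whole-file fcntl() lock request of the given type. */
static void whole_file(struct flock *lk, short type)
{
    memset(lk, 0, sizeof *lk);
    lk->l_type = type;
    lk->l_whence = SEEK_SET;
    lk->l_start = 0;
    lk->l_len = 0;              /* 0 means "through end of file" */
}

/* fcntl() serialization: block in F_SETLKW until we own a write lock. */
static int fcntl_mutex_on(int fd)
{
    struct flock lk;
    whole_file(&lk, F_WRLCK);
    while (fcntl(fd, F_SETLKW, &lk) < 0) {
        if (errno != EINTR)
            return -1;          /* genuine failure; caller should log it */
        /* interrupted by a signal: go back to waiting */
    }
    return 0;
}

/* Release the fcntl() lock; F_SETLK is enough since unlocking never waits. */
static int fcntl_mutex_off(int fd)
{
    struct flock lk;
    whole_file(&lk, F_UNLCK);
    while (fcntl(fd, F_SETLK, &lk) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* flock() serialization: block in LOCK_EX until we own the lock. */
static int flock_mutex_on(int fd)
{
    while (flock(fd, LOCK_EX) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Release the flock() lock. */
static int flock_mutex_off(int fd)
{
    while (flock(fd, LOCK_UN) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

In this scheme a child calls one of the _on functions, accept()s a single connection, and calls the matching _off function before handling the request, so at most one child sits in accept() at a time. If the lock holder never reaches the _off call, every other child stays blocked in fcntl() or flock() indefinitely. fcntl() record locks are owned by the process and are not inherited across fork(), so the per-child reopen matters mainly for the flock() variant.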
>Audit-Trail:
>Unformatted: