Hi,

I've got a problem with a mod_perl-enabled site. I've looked through the
documentation, the troubleshooting guide and the mailing list archives, but
to no avail, so I'm kindly asking the experts for help.

Symptom:
--------
Some users' browsers stall forever while trying to load something from
the server, without ever getting a response. Hitting stop and reload
gets the resource immediately.
It seems to happen only when multiple users request the same Perl
resource at the same time: one of them may not get the response.
It doesn't occur when requesting static content from Apache.
No error or warning shows up in the Apache error.log file (PerlWarn is On).

Context:
--------
Server Version: Apache/2.0.54 (Win32) mod_ssl/2.0.54 OpenSSL/0.9.7g
mod_auth_sspi/1.0.1 mod_perl/1.999.21 Perl/v5.8.6 (compiled on Win32 by
myself to have it SSL-enabled)
     (problem also occurs with the official Win32 apache binary distro
2.0.54 from apache.org)
ActiveState Perl 5.8, perl -v shows: perl, v5.8.6 built for
MSWin32-x86-multi-thread (with 3 registered patches, see perl -V for more
detail), Built Dec 13 2004 09:52:01
OS is: Windows XP Pro SP1

Investigations:
---------------
To reproduce the problem, I use a basic Perl script (see below), which I
request from another host using ApacheBench with 10 concurrent agents,
requesting the script 100 times. Some of these requests (about 3 out of 100
each time) just stall and time out after a few tens of seconds (see the
trace below).
Note that the longer the script takes to execute, the easier the problem is
to reproduce and the fewer concurrent requests are needed.

I also monitored the Apache process, and it turns out that the child Apache
process has a number of sockets in CLOSE_WAIT state, which is not normal.
The number of CLOSE_WAIT sockets increases each time the problem occurs, and
matches the number of clients that stalled.
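
For reference, the CLOSE_WAIT count can be taken from "netstat -an" output
with a few lines of Perl. This is only a minimal sketch: the helper name
count_close_wait, port 80 and the sample addresses are made up for
illustration and are not copied from my actual trace.

```perl
use strict;
use warnings;

# Count the sockets in CLOSE_WAIT state whose local port is $port,
# given the lines printed by "netstat -an".
# (Hypothetical helper; its name and the port used below are assumptions.)
sub count_close_wait {
    my ($port, @lines) = @_;
    my $count = 0;
    for my $line (@lines) {
        # Expected columns: Proto  Local-Address  Foreign-Address  State
        my ($proto, $local, $foreign, $state) = split ' ', $line;
        next unless defined $state && $proto eq 'TCP';
        next unless $local =~ /:\Q$port\E$/;
        $count++ if $state eq 'CLOSE_WAIT';
    }
    return $count;
}

# Made-up sample lines in the column layout printed by netstat on Windows.
my @sample = (
    '  TCP    10.0.0.1:80      10.0.0.2:1471    CLOSE_WAIT',
    '  TCP    10.0.0.1:80      10.0.0.2:1475    ESTABLISHED',
    '  TCP    10.0.0.1:80      10.0.0.2:1480    CLOSE_WAIT',
    '  TCP    0.0.0.0:80       0.0.0.0:0        LISTENING',
);
print count_close_wait(80, @sample), "\n";
```

On the live server, the sample lines would be replaced by the real output,
for instance by capturing it with backticks around "netstat -an".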

Apache mod_status shows this problem too, as it shows a non-zero number of
requests being processed while the server is idle. This number corresponds
to the number of CLOSE_WAIT sockets, plus one (for the mod_status request
itself).
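
The same check could be scripted against the machine-readable mod_status
output (the "?auto" query string). A minimal sketch, assuming the status
text contains a "BusyWorkers:" line as Apache 2.0 prints it; the helper
name busy_workers and the sample text are made up and not taken from my
server.

```perl
use strict;
use warnings;

# Extract the BusyWorkers value from mod_status "?auto" output.
# Returns undef if the line is missing. (Hypothetical helper name.)
sub busy_workers {
    my ($status_text) = @_;
    return $status_text =~ /^BusyWorkers:\s*(\d+)/m ? $1 : undef;
}

# Made-up sample of the "?auto" output, for illustration only.
my $sample = "BusyWorkers: 4\nIdleWorkers: 246\n";

# One busy worker is the status request itself, so the remainder
# should match the number of CLOSE_WAIT sockets on an idle server.
my $busy = busy_workers($sample);
print "stuck requests: ", $busy - 1, "\n" if defined $busy;
```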

Finally, I sniffed the network and saw some very strange behaviour. A first
ApacheBench run shows that client socket 1471 hangs (the server sends no
response back on this socket), the timeout occurs and the socket becomes
CLOSE_WAIT. A second run shows that the Apache server sends a response back
to client socket 1471 (which is in CLOSE_WAIT!), so it detects that the
socket is closed and cleans it up, but leaves another client socket, 1475,
hanging... as if it had confused socket 1475 with the old socket 1471! And
the scenario repeats over and over again.

Any ideas?
Thanks in advance,

                              Pascal Davoust.


----Begin C:/Program Files/Apache Group/Apache2ssl/cgi-bin/mytest.pl----------
##  printenv -- demo CGI program which just prints its environment
  ##
  use strict;
  print "Content-type: text/html\n\n";
  print "<HTML><BODY><H3>Environment variables</H3><UL>";
  foreach (sort keys %ENV) {
    my $val = $ENV{$_};
    $val =~ s|\n|\\n|g;
    $val =~ s|"|\\"|g;
    print "<LI>$_ = \"${val}\"</LI>\n";
  }
  #sleep(10);
  print "</UL></BODY></HTML>";
----End mytest.pl----------

----Begin httpd.conf excerpt-------------
PerlWarn On
<Directory "C:/Program Files/Apache Group/Apache2ssl/cgi-bin">
     SetHandler perl-script
     PerlResponseHandler ModPerl::Registry
     Options +ExecCGI
     PerlOptions +ParseHeaders
</Directory>
----End httpd.conf excerpt-------------

----Begin ApacheBench invocation-------------
> ab -c 10 -n 100 http://frcold0061086.col.bsf.alcatel.fr/cgi-bin/mytest.pl
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.121.2.12 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation,
http://www.apache.org/

Benchmarking frcold0061086.col.bsf.alcatel.fr (be patient)...apr_poll: The
timeout specified has expired (70007)
Total of 97 requests completed

>
----End ApacheBench invocation-------------
