Re: mystery solved... perhaps.

2004-01-27 Thread Joe Orton
On Mon, Jan 26, 2004 at 04:55:34PM -0600, Ben Collins-Sussman wrote:
> On Fri, 2004-01-23 at 08:07, Joe Orton wrote:
> 
> > Nice, this is easy enough to reproduce.  It only fills up because the
> > httpd children all have the read end of the pipe open, which is a bug in
> > itself.  Applying below ensures that the pipe gets closed when the piped
> > logger exits, and so writes() fail with ETERM rather than blocking up in
> > the leftover children.
> 
> Yup, it fixes the immediate hanging-httpd problem, but then the httpd
> child is left unable to log *anything* after the 'graceful' for the
> remainder of the svn commit.  If the 'graceful' happens early on, you
> could potentially lose all logging for most of the commit.  It's not a
> real fix.

Yeah, that naive patch also broke reviving a piped logger which dies
unexpectedly, and probably more.  Please do file a bug on this issue.

joe


Re: mystery solved... perhaps.

2004-01-26 Thread Ben Collins-Sussman
On Fri, 2004-01-23 at 08:07, Joe Orton wrote:

> Nice, this is easy enough to reproduce.  It only fills up because the
> httpd children all have the read end of the pipe open, which is a bug in
> itself.  Applying below ensures that the pipe gets closed when the piped
> logger exits, and so writes() fail with ETERM rather than blocking up in
> the leftover children.

Yup, it fixes the immediate hanging-httpd problem, but then the httpd
child is left unable to log *anything* after the 'graceful' for the
remainder of the svn commit.  If the 'graceful' happens early on, you
could potentially lose all logging for most of the commit.  It's not a
real fix.

> To fix this properly, I suppose piped loggers should not get SIGTERMed
> during a graceful restart, they should read till EOF then exit(0): then
> when the last child attached to the piped logger for a particular
> generation quits, the pipe is closed and the piped logger terminates
> gracefully too, without losing log messages.

Yah, that sounds nice.

If the apache developers are OK, I'd like to file an issue about this in
the public httpd issuetracker.  What we've really got here is a bug in
the piped logger code;  it just can't deal with long-lived httpd
children.  

Anyone mind if I file an issue?  Anyone using Subversion (mod_dav_svn)
with 'rotatelogs' is likely to be burned by this problem.  I'd like to
be able to point them to the issue, at least.







Re: mystery solved... perhaps.

2004-01-23 Thread Joe Orton
On Thu, Jan 22, 2004 at 05:02:50PM -0600, Ben Collins-Sussman wrote:
...
>   * for some reason, the 'rotatelogs' process dies.  It's not clear
> whether it's responding to a signal, or if the httpd parent is
> killing it, or what.  A new 'rotatelogs' takes its place, with new
> httpd children connecting to it.  Meanwhile, the "old" httpd child
> continues to service svn, and continues to write logdata to a dead
> pipe... there's nobody reading data from the pipe on the other end
> anymore!

It is SIGTERMed by the parent on a restart, yes.

>   * Eventually the pipe fills up, and the httpd child just hangs
> trying to write to it.

Nice, this is easy enough to reproduce.  It only fills up because the
httpd children all have the read end of the pipe open, which is a bug in
itself.  Applying below ensures that the pipe gets closed when the piped
logger exits, and so writes() fail with ETERM rather than blocking up in
the leftover children.

To fix this properly, I suppose piped loggers should not get SIGTERMed
during a graceful restart, they should read till EOF then exit(0): then
when the last child attached to the piped logger for a particular
generation quits, the pipe is closed and the piped logger terminates
gracefully too, without losing log messages.

Index: server/log.c
===
RCS file: /home/cvs/httpd-2.0/server/log.c,v
retrieving revision 1.140
diff -u -r1.140 log.c
--- server/log.c12 Jan 2004 22:48:08 -  1.140
+++ server/log.c23 Jan 2004 14:02:38 -
@@ -865,8 +865,6 @@
 static apr_status_t piped_log_cleanup_for_exec(void *data)
 {
 piped_log *pl = data;
-
-apr_file_close(ap_piped_log_read_fd(pl));
 apr_file_close(ap_piped_log_write_fd(pl));
 return APR_SUCCESS;
 }
@@ -905,6 +903,7 @@
 errno = save_errno;
 return NULL;
 }
+apr_file_close(ap_piped_log_read_fd(pl));
 return pl;
 }
 


RE: mystery solved... perhaps.

2004-01-22 Thread Paul White
A suggestion:

I'm not very sure what platform your on, but I'm assuming it's a unix
type of platform.  You can use 'netstat -ap' to find the socket (by
searching for its PID) that the httpd child is using to write its log
data, and see if its full.  The output will give you two counters: The
current send buffer size, and the current receive buffer size.  If your
send buffer size is maxed out, then that's definitely the issue.

Hope this helps...
-Pw

-Original Message-
From: Ben Collins-Sussman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 22, 2004 3:03 PM
To: [EMAIL PROTECTED]
Subject: mystery solved... perhaps.

After a couple days of research (and talking with Ed Korthof), I think
the mysterious bug is solved.  Recall the reproduction recipe:

  * start a long running 'svn import'
  * run 'apachectl graceful'
  * a few seconds later, httpd just hangs.

It looks like the main bug is with the 'rotatelogs' process that comes
with apache.  It has nothing to do with SSL at all -- it's
reproducible over HTTP, and even without subversion.

The theory (as I understand it) is this: if you set up your httpd.conf
appropriately, the httpd parent process launches a 'rotatelogs' child
process, along with N other httpd child processes.  All of the httpd
children keep write-pipes open to the rotatelogs process, and write log
data into their pipes.  'rotatelogs' has the job of reading these pipes
and spewing the data into appropriate files, creating new logfiles when
necessary.

Here's what Ed Korthof thinks is happening:

  * the svn client (using neon) opens a long-lived connection to do a
commit.  Using 'keepalive', it sends a huge number of PUT and
PROPPATCH request over one connection to a single httpd child.

  * when the 'graceful' signal hits, httpd children wait for their
current connection to close, then exit.  Meanwhile, the httpd
parent spawns a new "generation" of httpd children.  Obviously,
the httpd child servicing the svn commit sticks around a very long
time, because svn doesn't hang up until it's done sending
everything.

  * for some reason, the 'rotatelogs' process dies.  It's not clear
whether it's responding to a signal, or if the httpd parent is
killing it, or what.  A new 'rotatelogs' takes its place, with new
httpd children connecting to it.  Meanwhile, the "old" httpd child
continues to service svn, and continues to write logdata to a dead
pipe... there's nobody reading data from the pipe on the other end
anymore!

  * Eventually the pipe fills up, and the httpd child just hangs
trying to write to it.


I think this theory is true, for a few reasons:

  * every time I run 'strace -p PID' on the frozen httpd, it claims to
writing logdata.   gdb confirms this as well.

  * edk is able to reproduce the problem without subversion, simply by
hand-typing HTTP requests chained together by a Keep-Alive header.

  * 'svn import' claims to have received 'success' repsonses on about
20 more files beyond what accesslog shows, implying a pipe-backup.

  * The clincher: in all my testing on different platforms (7 or 8
different setups) this bug is reproducible *every* time httpd.conf
is using 'rotatelogs', and the bug vanishes when I stop using
'rotatelogs'.


Final analysis: 

This looks like some kind of bug in Apache itself, not
related to SSL or Subversion at all... it looks like a bug in the
interaction between 'rotatelogs' process and clients that use
Keep-Alive.

Any comments or thoughts?




mystery solved... perhaps.

2004-01-22 Thread Ben Collins-Sussman
After a couple days of research (and talking with Ed Korthof), I think
the mysterious bug is solved.  Recall the reproduction recipe:

  * start a long running 'svn import'
  * run 'apachectl graceful'
  * a few seconds later, httpd just hangs.

It looks like the main bug is with the 'rotatelogs' process that comes
with apache.  It has nothing to do with SSL at all -- it's
reproducible over HTTP, and even without subversion.

The theory (as I understand it) is this: if you set up your httpd.conf
appropriately, the httpd parent process launches a 'rotatelogs' child
process, along with N other httpd child processes.  All of the httpd
children keep write-pipes open to the rotatelogs process, and write log
data into their pipes.  'rotatelogs' has the job of reading these pipes
and spewing the data into appropriate files, creating new logfiles when
necessary.

Here's what Ed Korthof thinks is happening:

  * the svn client (using neon) opens a long-lived connection to do a
commit.  Using 'keepalive', it sends a huge number of PUT and
PROPPATCH request over one connection to a single httpd child.

  * when the 'graceful' signal hits, httpd children wait for their
current connection to close, then exit.  Meanwhile, the httpd
parent spawns a new "generation" of httpd children.  Obviously,
the httpd child servicing the svn commit sticks around a very long
time, because svn doesn't hang up until it's done sending
everything.

  * for some reason, the 'rotatelogs' process dies.  It's not clear
whether it's responding to a signal, or if the httpd parent is
killing it, or what.  A new 'rotatelogs' takes its place, with new
httpd children connecting to it.  Meanwhile, the "old" httpd child
continues to service svn, and continues to write logdata to a dead
pipe... there's nobody reading data from the pipe on the other end
anymore!

  * Eventually the pipe fills up, and the httpd child just hangs
trying to write to it.


I think this theory is true, for a few reasons:

  * every time I run 'strace -p PID' on the frozen httpd, it claims to
writing logdata.   gdb confirms this as well.

  * edk is able to reproduce the problem without subversion, simply by
hand-typing HTTP requests chained together by a Keep-Alive header.

  * 'svn import' claims to have received 'success' repsonses on about
20 more files beyond what accesslog shows, implying a pipe-backup.

  * The clincher: in all my testing on different platforms (7 or 8
different setups) this bug is reproducible *every* time httpd.conf
is using 'rotatelogs', and the bug vanishes when I stop using
'rotatelogs'.


Final analysis: 

This looks like some kind of bug in Apache itself, not
related to SSL or Subversion at all... it looks like a bug in the
interaction between 'rotatelogs' process and clients that use
Keep-Alive.

Any comments or thoughts?