Bug report for Apache httpd-1.3 [2009/01/04]
Status:   UNC=Unconfirmed, NEW=New, ASS=Assigned, OPN=Reopened, VER=Verified (Closed/Resolved skipped)
Severity: BLK=Blocker, CRI=Critical, REG=Regression, MAJ=Major, MIN=Minor, NOR=Normal, ENH=Enhancement, TRV=Trivial

|Bug ID|Status|Severity|Date Posted|Description|
|10744|New|Nor|2002-07-12|suexec might fail to open log file|
|10747|New|Maj|2002-07-12|ftp SIZE command and 'smart' ftp servers results i|
|10760|New|Maj|2002-07-12|empty ftp directory listings from cached ftp direc|
|14518|Opn|Reg|2002-11-13|QUERY_STRING parts not incorporated by mod_rewrite|
|16013|Opn|Nor|2003-01-13|Fooling mod_autoindex + IndexIgnore|
|16631|Inf|Min|2003-01-31|.htaccess errors logged outside the virtual host l|
|17318|Inf|Cri|2003-02-23|Abend on deleting a temporary cache file if proxy|
|19279|Inf|Min|2003-04-24|Invalid chmod options in solaris build|
|21637|Inf|Nor|2003-07-16|Timeout causes a status code of 200 to be logged|
|21777|Inf|Min|2003-07-21|mod_mime_magic doesn't handle little gif files|
|22618|New|Maj|2003-08-21|MultiViews invalidates PATH_TRANSLATED if cgi-wrap|
|25057|Inf|Maj|2003-11-27|Empty PUT access control in .htaccess overrides co|
|26126|New|Nor|2004-01-14|mod_include hangs with request body|
|26152|Ass|Nor|2004-01-15|Apache 1.3.29 and below directory traversal vulner|
|26790|New|Maj|2004-02-09|error deleting old cache file|
|29257|Opn|Nor|2004-05-27|Problem with apache-1.3.31 and mod_frontpage (dso,|
|29498|New|Maj|2004-06-10|non-anonymous ftp broken in mod_proxy|
|29538|Ass|Enh|2004-06-12|No facility used in ErrorLog to syslog|
|30207|New|Nor|2004-07-20|Piped logs don't close read end of pipe|
|30877|New|Nor|2004-08-26|htpasswd clears passwd file on Sun when /var/tmp i|
|30909|New|Cri|2004-08-28|sporadic segfault resulting in broken connections|
|31975|New|Nor|2004-10-29|httpd-1.3.33: buffer overflow in htpasswd if calle|
|32078|New|Enh|2004-11-05|clean up some compiler warnings|
|32539|New|Trv|2004-12-06|[PATCH] configure --enable-shared= brocken on SuSE|
|32974|Inf|Maj|2005-01-06|Client IP not set|
|33086|New|Nor|2005-01-13|unconsistency betwen 404 displayed path and server|
|33495|Inf|Cri|2005-02-10|Apache crashes with "WSADuplicateSocket failed for|
|33772|New|Nor|2005-02-28|inconsistency in manual and error reporting by sue|
|33875|New|Enh|2005-03-07|Apache processes consuming CPU|
|34108|New|Nor|2005-03-21|mod_negotiation changes mtime to mtime of Document|
|34114|New|Nor|2005-03-21|Apache could interleave log entries when writing t|
|34404|Inf|Blk|2005-04-11|RewriteMap prg can not handle fpout|
|34571|Inf|Maj|2005-04-22|Apache 1.3.33 stops logging vhost|
|34573|Inf|Maj|2005-04-22|.htaccess not working / mod_auth_mysql|
|35424|New|Nor|2005-06-20|httpd disconnect in Timeout on CGI|
|35439|New|Nor|2005-06-21|Problem with remove "/../" in util.c and mod_rewri|
|35547|Inf|Maj|2005-06-29|Problems with libapreq 1.2 and Apache::Cookie|
|3|New|Nor|2005-06-30|Can't find DBM on Debian Sarge|
|36375|Opn|Nor|2005-08-26|Cannot include http_config.h from C++ file|
|37166|New|Nor|2005-10-19|Under certain conditions, mod_cgi delivers an empt|
|37185|New|Enh|2005-10-20|AddIcon, AddIconByType for OpenDocument format|
|37252|New|Reg|2005-10-26|gen_test_char reject NLS string|
|38989|New|Nor|2006-03-15|restart + piped logs stalls httpd for 24 minutes (|
|39104|New|Enh|2006-03-25|[FR] fix build with -Wl,--as-needed|
|39287|New|Nor|2006-04-12|Incorrect If-Modified-Since validation (due to syn|
|39937|New|Nor|2006-06-30|Garbage output if README.html is gzipped or compre|
|40176|New|Nor|2006-08-03|magic and mime|
|40224|Ver|Nor|2006-08-10|System time crashes Apache @year 2038 (win32 only?|
|41279|New|Nor|2007-01-02|Apache 1.3.37 htpasswd is vulnerable to buffer ove|
|42355|New|Maj|2007-05-08|Apache 1.3 permits non-rfc HTTP error code >= 600|
|43626|New|Maj|2007-10-15|r->path_info returning invalid value|
|44768|
Issues with mod_disk_cache and htcacheclean
I posted this on the users list, but was advised to post it to dev as well, since it seemed relevant to developers. Hope that's ok...

I am using Apache 2.2.9 on Linux AMD64, built from source. There is one server running two builds of Apache - a lightweight front-end caching reverse proxy using mod_disk_cache, and a heavyweight mod_perl back end. I use caching to relieve load on the server when many people request the same page at once. The website is dynamic and contains millions of page permutations, so the cache tends to get fairly large unless it is pruned. I have been trying to use htcacheclean to achieve this, and there have been some issues, which I will outline below.

First, I found that htcacheclean was not able to keep up with pruning the cache. It just kept growing. I initially ran htcacheclean in daemon mode, thus:

  htcacheclean -i -t -n -d60 -p/var/cache/www -l1000M

CacheDirLevels was 3 and CacheDirLength 1. The cache would just keep getting bigger, to multiple GB, and even doing a du on the cache could take hours to complete. I also noticed that iowait would spike when I tried running htcacheclean in non-daemon mode. It would not keep up at all using the -n ("nice") option; when I took that off, the iowait would go through the roof and the process would take hours to complete. This was on a quad-core AMD64 server with 4 x 10k SCSI drives in hardware RAID0.

Upon investigation, I discovered that the cache was a lot deeper than I expected. In addition to the three levels specified by CacheDirLevels, there were additional levels of subdirectories beneath ".vary" subdirs: for each .header file, there was a .vary subdir with three more levels of directory below that. Simply traversing this tree with du could take a long time - hours sometimes, depending on how long the server had been running without a cache clear.

The .vary subdirs were caused by my configuration, which was introducing a Vary HTTP header from two sources. The first was mod_deflate; I found this out from this helpful page: http://www.digitalsanctuary.com/tech-blog/general/apache-mod_deflate-and-mod_cache-issues.html So I disabled mod_deflate, since it seemed to be producing a huge number of cache entries for each file - a different one for every browser. But after disabling mod_deflate, the .vary subdirs were still there. I also had this line in my config:

  Header add Vary "Cookie"

This is necessary because users on my site set options for how the site is displayed. When I tried disabling this cookie Vary header, the number of directories went down substantially, to the expected three levels. The cache structure was much simpler, and it seemed that htcacheclean could keep up with it. However, the site was broken, since the same page for users with different options would be cached only once: someone who had "no ads" or "no pics" would request a page that someone else had recently requested (with different options), and would get that other person's options. Not good. So I had to switch the Vary header for cookies back on, so that pages would get differentiated in the cache based on the cookie.

But now I was back to square one - six effective levels of subdirectory, which htcacheclean could not keep up with. After some thought, I ended up changing CacheDirLevels to 2, to try to reduce the depth of the tree. Now I have fewer subdirs, but more files in each one.
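[Editor's note: a minimal sketch of the kind of front-end cache configuration described above; the directives are standard mod_cache/mod_disk_cache/mod_headers directives, but the paths and values are illustrative, not the poster's exact config.]

  # Front-end reverse proxy disk cache (illustrative values)
  CacheRoot        /var/cache/www
  CacheEnable      disk /
  CacheDirLevels   2
  CacheDirLength   1

  # Differentiate cached pages by the user's option cookie; this is what
  # introduces the .vary subdirectory tree under each .header file
  Header add Vary "Cookie"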
Also, the size of the cache, as reported by du, always seems to be much higher than the limit given to htcacheclean. I lowered the limit to 100M, but the cache is still regularly up at 180MB or 200MB. This seems counter-intuitive; htcacheclean doesn't appear to take the true size of the cache into account (i.e. including all the subdirectories, which also take up space and presumably cause the discrepancy).

I also noticed something else: htcacheclean was leaving behind .header files. When it cleaned the .vary subdirs, it seemed to leave behind the corresponding .header files. These would accumulate, causing the iowait to gradually increase, presumably due to the size of the directories. I would rotate (clear) the cache manually at midnight; the behaviour I would see (via the munin monitoring tool) was that iowait would remain at zero for about 12 hours, and would then gradually become visible as the number of .header files accumulated.

So I wrote a Perl script which goes through the cache looking for .header files and, for each one found, checks whether a corresponding .vary subdir still exists for it. If not, the .header file is deleted. I then run another script to prune empty subdirectories. Currently I run this combination every 10 minutes - first a non-daemon invocation of htcacheclean, followed by the header prune script, followed by the empty-subdirs pruning script. This seems to keep the cache small, and iowait is not noticeable any more, since the "junk" .header files are now disposed of.
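[Editor's note: the header-pruning step described above was a Perl script that was not posted; the following is a rough sketch of the same idea in C using APR. The "<name>.header" / "<name>.vary" suffix pairing is assumed from the description above and should be checked against the actual mod_disk_cache layout before use.]

  /* prune_orphan_headers.c -- sketch: walk the disk cache and delete any
   * "<name>.header" file that no longer has a matching "<name>.vary"
   * directory next to it.  Suffix pairing is an assumption, see above.
   *
   * Build (example): gcc prune_orphan_headers.c \
   *     $(apr-1-config --cflags --cppflags --includes --link-ld)
   */
  #include <string.h>
  #include "apr_general.h"
  #include "apr_pools.h"
  #include "apr_strings.h"
  #include "apr_file_info.h"
  #include "apr_file_io.h"

  static void prune_dir(const char *dir, apr_pool_t *pool)
  {
      apr_dir_t *d;
      apr_finfo_t fi;

      if (apr_dir_open(&d, dir, pool) != APR_SUCCESS)
          return;

      while (apr_dir_read(&fi, APR_FINFO_NAME | APR_FINFO_TYPE, d) == APR_SUCCESS) {
          const char *path;
          apr_size_t len;

          if (fi.name[0] == '.' && (fi.name[1] == '\0'
              || (fi.name[1] == '.' && fi.name[2] == '\0')))
              continue;                      /* skip "." and ".." */

          path = apr_pstrcat(pool, dir, "/", fi.name, NULL);
          len  = strlen(fi.name);

          if (fi.filetype == APR_DIR) {
              prune_dir(path, pool);         /* recurse into subdirectories */
          }
          else if (len > 7 && strcmp(fi.name + len - 7, ".header") == 0) {
              /* Look for the corresponding ".vary" directory; if it is
               * gone, the .header file is an orphan and can be removed. */
              apr_finfo_t vinfo;
              const char *base = apr_pstrndup(pool, path, strlen(path) - 7);
              const char *vary = apr_pstrcat(pool, base, ".vary", NULL);

              if (apr_stat(&vinfo, vary, APR_FINFO_TYPE, pool) != APR_SUCCESS)
                  apr_file_remove(path, pool);
          }
      }
      apr_dir_close(d);
  }

  int main(int argc, char *argv[])
  {
      apr_pool_t *pool;

      apr_initialize();
      apr_pool_create(&pool, NULL);
      if (argc > 1)
          prune_dir(argv[1], pool);          /* e.g. /var/cache/www */
      apr_terminate();
      return 0;
  }

For a cache with millions of entries you would want per-directory subpools (or an occasional apr_pool_clear) to keep memory bounded; the sketch favours brevity.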
Re: Problem with file descriptor handling in httpd 2.3.1
On 01/04/2009 06:28 PM, Rainer Jung wrote:
> On 04.01.2009 17:57, Rainer Jung wrote:
>> When the content file gets opened, its cleanup is correctly registered
>> with the request pool. Later in core_filters.c at the end of function
>> ap_core_output_filter() line 528 we call setaside_remaining_output().
>
> ...
>
>> 2.2.x has a different structure, although I can also see two calls to
>> ap_save_brigade() in ap_core_output_filter(), but they use different
>> pools as new targets, namely a deferred_write_pool resp. input_pool.
>
> And the code already contains the appropriate hint:
>
>     static void setaside_remaining_output(...)
>     {
>         ...
>         if (make_a_copy) {
>             /* XXX should this use a separate deferred write pool, like
>              * the original ap_core_output_filter?
>              */
>             ap_save_brigade(f, &(ctx->buffered_bb), &bb, c->pool);
>             ...
>         }

Thanks for the analysis and good catch. Maybe I'll have a look into this by tomorrow.

Regards

Rüdiger
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 17:57, Rainer Jung wrote:

When the content file gets opened, its cleanup is correctly registered with the request pool. Later in core_filters.c at the end of function ap_core_output_filter() line 528 we call setaside_remaining_output().

...

2.2.x has a different structure, although I can also see two calls to ap_save_brigade() in ap_core_output_filter(), but they use different pools as new targets, namely a deferred_write_pool resp. input_pool.

And the code already contains the appropriate hint:

    static void setaside_remaining_output(...)
    {
        ...
        if (make_a_copy) {
            /* XXX should this use a separate deferred write pool, like
             * the original ap_core_output_filter?
             */
            ap_save_brigade(f, &(ctx->buffered_bb), &bb, c->pool);
            ...
        }
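[Editor's note: as a sketch only, not a committed fix, the XXX hint above could translate into setting the data aside into a dedicated sub-pool of the connection pool and clearing that sub-pool once the buffered brigade has been sent. "deferred_write_pool" below is a hypothetical field added to the output filter context for illustration.]

    /* Hypothetical variant: put set-aside data into a dedicated sub-pool
     * of c->pool instead of c->pool itself, so the file-close cleanups can
     * be released as soon as the buffered data has been written, rather
     * than accumulating for the whole keep-alive connection.
     * ctx->deferred_write_pool is an assumed new field. */
    if (make_a_copy) {
        if (ctx->deferred_write_pool == NULL) {
            apr_pool_create(&ctx->deferred_write_pool, c->pool);
            apr_pool_tag(ctx->deferred_write_pool, "deferred_write");
        }
        ap_save_brigade(f, &(ctx->buffered_bb), &bb, ctx->deferred_write_pool);
    }

    /* ...and after send_brigade_blocking() has flushed the buffered brigade: */
    apr_pool_clear(ctx->deferred_write_pool);   /* runs the file-close cleanups */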
Re: Why is r->handler a garbled string?
On Jan 2, 2009, at 12:14 PM, John David Duncan wrote:

> Yes, in fact the package does contain config.nice, which begins:
>
>   CC="/opt/SUNWspro/bin/cc"; export CC
>   CFLAGS="-xO3 -xarch=386 -xchip=pentium -xspace -Xa -xildoff -xc99=all -DSSL_EXPERIMENTAL -DSSL_ENGINE -xO4"; export CFLAGS

That's what was passed in by the builders; the configure script will make some stuff up itself.

> Those flags alone don't do the trick for me, but the ones from apr-config make it all work.

Good to hear. Yes, we've been struggling with documentation and/or a reusable build environment for out-of-tree modules. I think there was some initiative in that direction, but I don't know what came of it.

S.

--
Sander Temme
scte...@apache.org
PGP FP: 51B4 8727 466A 0BC3 69F4 B7B8 B2BE BC40 1529 24AF
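[Editor's note: for out-of-tree modules, one approach along the lines mentioned above is to take compile flags from the installed apr/apu config scripts, or to let apxs drive the build entirely. Script names assume an APR 1.x install on the PATH; mod_example.c is a placeholder.]

  # Flags matching the installed APR/APR-util build
  CFLAGS="$(apr-1-config --cflags --cppflags)"
  INCLUDES="$(apr-1-config --includes) $(apu-1-config --includes)"

  # Or simply compile and install the module with apxs
  apxs -c -i mod_example.c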
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 15:04, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second. MaxKeepAliveRequests=100 (Default)

- the file leading to EMFILE is the static content file, which can be observed to be open more than 1000 times in parallel although ab concurrency is only 20

- From looking at the code it seems the file is closed during a cleanup function associated to the request pool, which is triggered by an EOR bucket

Now what happens under KeepAlive is that the content files are kept open longer than the handling of the request, more precisely until the closing of the connection. So when MaxKeepAliveRequests * Concurrency > MaxNumberOfFDs we run out of file descriptors.

I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with Event, Worker and Prefork. I didn't yet have the time to retest with 2.2.

It should only happen in 2.3.x/trunk because the EOR bucket is a new feature to let MPMs do async writes once the handler has finished running.

And yes, this sounds like a nasty bug.

I verified I can't reproduce with the same platform and 2.2.11.

Not sure I understand the EOR asynchronicity good enough to analyze the root cause.

Can you try the following patch please?

Here's the gdb story:

When the content file gets opened, its cleanup is correctly registered with the request pool. Later in core_filters.c at the end of function ap_core_output_filter() line 528 we call setaside_remaining_output(). This goes down the stack via ap_save_brigade(), file_bucket_setaside() to apr_file_setaside(). This kills the cleanup for the request pool and adds it instead to the transaction (=connection) pool. There we are.

2.2.x has a different structure, although I can also see two calls to ap_save_brigade() in ap_core_output_filter(), but they use different pools as new targets, namely a deferred_write_pool resp. input_pool.

So now we know how it happens, but I don't have an immediate idea how to solve it.

Regards,

Rainer
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 16:22, Rainer Jung wrote:

On 04.01.2009 15:56, Ruediger Pluem wrote:

On 01/04/2009 03:48 PM, Rainer Jung wrote:

On 04.01.2009 15:40, Ruediger Pluem wrote:

On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM and the number of concurrent requests from ab.

I increased the maximum KeepAlive Requests and the KeepAlive timeout a lot and during a longer running test I see always exactly as many open FDs for the content file in /proc/PID/fd as I had concurrency in ab. So it seems the FDs always get closed before handling the next request in the connection.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with prefork or worker.

IMHO this cannot happen with prefork on 2.2.x. So I guess it is not worth testing. It still confuses me that this happens on trunk as it looks like that ab does not do pipelining.

The strace log shows that the sequence really is

- new connection
- read request
- open file
- send response
- log request

repeat this triplet a lot of times (maybe as long as KeepAlive is active), and then there are a lot of close() for the content files. Not sure about the exact thing that triggers the close. So I don't necessarily see pipelining (in the sense of sending more requests before responses return) being necessary.

I tested your patch (worker, trunk): It does not help. I then added an error log statement directly after the requests++ and it shows this number is always "1".

I can now even reproduce without load. Simply open a connection and send hand-crafted KeepAlive requests via telnet. The file descriptors are kept open as long as the connection is alive.

I'll run under the debugger to see how the stack looks when the file gets closed. Since the logging is done much earlier (directly after each request), the problem does not seem to be directly related to EOR. It looks like somehow the close-file cleanup does not run when the request pool is destroyed, or maybe it is registered with the connection pool. gdb should help. More later.

Rainer
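[Editor's note: the hand-crafted keep-alive test described above is just a telnet session along these lines, while watching /proc/<PID>/fd of the single httpd process in a second terminal; host and port are examples taken from the Listen line quoted later in this thread, and each request leaves another open descriptor on the content file until the connection closes.]

  $ telnet myhost 8000
  GET / HTTP/1.1
  Host: myhost
  Connection: keep-alive

  GET / HTTP/1.1
  Host: myhost
  Connection: keep-alive
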
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 15:56, Ruediger Pluem wrote:

On 01/04/2009 03:48 PM, Rainer Jung wrote:

On 04.01.2009 15:40, Ruediger Pluem wrote:

On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM and the number of concurrent requests from ab.

I increased the maximum KeepAlive Requests and the KeepAlive timeout a lot and during a longer running test I see always exactly as many open FDs for the content file in /proc/PID/fd as I had concurrency in ab. So it seems the FDs always get closed before handling the next request in the connection.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with prefork or worker.

IMHO this cannot happen with prefork on 2.2.x. So I guess it is not worth testing. It still confuses me that this happens on trunk as it looks like that ab does not do pipelining.

The strace log shows that the sequence really is

- new connection
- read request
- open file
- send response
- log request

repeat this triplet a lot of times (maybe as long as KeepAlive is active), and then there are a lot of close() for the content files. Not sure about the exact thing that triggers the close. So I don't necessarily see pipelining (in the sense of sending more requests before responses return) being necessary.

I tested your patch (worker, trunk): It does not help. I then added an error log statement directly after the requests++ and it shows this number is always "1".

Regards,

Rainer
Re: Problem with file descriptor handling in httpd 2.3.1
On 01/04/2009 03:48 PM, Rainer Jung wrote:

On 04.01.2009 15:40, Ruediger Pluem wrote:

On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM and the number of concurrent requests from ab.

I increased the maximum KeepAlive Requests and the KeepAlive timeout a lot and during a longer running test I see always exactly as many open FDs for the content file in /proc/PID/fd as I had concurrency in ab. So it seems the FDs always get closed before handling the next request in the connection.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with prefork or worker.

IMHO this cannot happen with prefork on 2.2.x. So I guess it is not worth testing.

It still confuses me that this happens on trunk as it looks like that ab does not do pipelining.

Regards

Rüdiger
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 15:40, Ruediger Pluem wrote:

On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM and the number of concurrent requests from ab.

I increased the maximum KeepAlive Requests and the KeepAlive timeout a lot and during a longer running test I see always exactly as many open FDs for the content file in /proc/PID/fd as I had concurrency in ab. So it seems the FDs always get closed before handling the next request in the connection.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with prefork or worker.

Regards,

Rainer
Re: Problem with file descriptor handling in httpd 2.3.1
On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM and the number of concurrent requests from ab.

Regards

Rüdiger
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't reproduce even with content file of size 257 bytes. The same file with trunk immediately reproduces the problem.

Will try your patch/hack next.

Thanks

Rainer
Re: Problem with file descriptor handling in httpd 2.3.1
On 01/04/2009 12:49 AM, Rainer Jung wrote:
> On 04.01.2009 00:36, Paul Querna wrote:
>> Rainer Jung wrote:
>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>> many open files". I used strace and the problem looks like this:
>>>
>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>>   small file, so doing about 2000 requests per second.
>>>   MaxKeepAliveRequests=100 (Default)
>>>
>>> - the file leading to EMFILE is the static content file, which can be
>>>   observed to be open more than 1000 times in parallel although ab
>>>   concurrency is only 20
>>>
>>> - From looking at the code it seems the file is closed during a
>>>   cleanup function associated to the request pool, which is triggered
>>>   by an EOR bucket
>>>
>>> Now what happens under KeepAlive is that the content files are kept
>>> open longer than the handling of the request, more precisely until the
>>> closing of the connection. So when MaxKeepAliveRequests*Concurrency >
>>> MaxNumberOfFDs we run out of file descriptors.
>>>
>>> I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
>>> Event, Worker and Prefork. I didn't yet have the time to retest with
>>> 2.2.
>>
>> It should only happen in 2.3.x/trunk because the EOR bucket is a new
>> feature to let MPMs do async writes once the handler has finished
>> running.
>>
>> And yes, this sounds like a nasty bug.
>
> I verified I can't reproduce with the same platform and 2.2.11.
>
> Not sure I understand the EOR asynchronicity good enough to analyze the
> root cause.

Can you try the following patch please?

Index: server/core_filters.c
===================================================================
--- server/core_filters.c    (Revision 731238)
+++ server/core_filters.c    (Arbeitskopie)
@@ -367,6 +367,7 @@

 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define MAX_REQUESTS_QUEUED 10

 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -381,6 +382,7 @@
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
     apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    int requests;

     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +468,7 @@

     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    requests = 0;
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -501,11 +504,22 @@
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
+        else if (bucket->type == &ap_bucket_type_eor) {
+            /*
+             * Count the number of requests still queued in the brigade.
+             * Pipelining of a high number of small files can cause
+             * a high number of open file descriptors, which if it happens
+             * on many threads in parallel can cause us to hit the OS limits.
+             */
+            requests++;
+        }
     }

-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if ((non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER)
+        || (requests > MAX_REQUESTS_QUEUED)) {
         /* ### Writing the entire brigade may be excessive; we really just
-         * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
+         * ### need to send enough data to be under THRESHOLD_MAX_BUFFER or
+         * ### under MAX_REQUESTS_QUEUED
          */
         apr_status_t rv = send_brigade_blocking(net->client_socket, bb,
                                                 &(ctx->bytes_written), c);

This is still some sort of a hack, but maybe helpful to understand if this is the problem.
Regards

Rüdiger
Re: Problem with file descriptor handling in httpd 2.3.1
On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

Regards

Rüdiger
Re: Where to initialize a global pool/hash - server create or child_init?
On Sun, Jan 4, 2009 at 05:03, Jacques Amar wrote:
> Where to begin
>
> I am creating a global pool/hash in which I save cached, hard to calculate
> data (pre-compiled regex expressions etc.). I also store pages I've created
> using this data in memcached (using the APR interface). I can't save the
> calculated regex data in memcached. I initially followed the advice of
> creating a private pool, hash and mutex inside a child_init hook in my
> server config structure, protecting all access with the mutex. The module
> works well enough in regular httpd. However, when I tried this in worker MPM
> I got constant:
>   [notice] child pid x exit signal Segmentation fault (11)
> after a few page loads.
>
> On a whim, I moved the whole creation of these structures into a server
> config hook. All these issues seem to have vanished. I have "theorized" that
> the pool I am creating at every child creation does not properly work.
>
> I am looking for a discussion of the pros and cons of creating this in a per
> child hook, versus the one-time server create hook, and any other pointers
> to help decide (and debug) where/when I should use one or the other.

pre/post_config are run as the user who starts the server (root, typically). child_init is run as the httpd user (configured with the User config directive).

If the hash is read-only (it is never changed after initialisation) I guess you don't even need the mutex. In this case, I guess there's no difference between child_init and post_config (except the effective UID that runs the code, as said above). If the data is changed by one worker and you want the change to be visible to all other workers, you have to put your initialisation in post_config.

Regarding debugging, run your Apache in debug (single-process) mode (apache2 -X) and check if you still get the segfaults. If yes, load apache2 in a debugger (I use gdb):

  gdb $HOME/usr/sbin/apache2
  set args -f $HOME/etc/apache2/apache2.conf -X
  break my_handler
  run

Issue a request and check where it segfaults.

Sorin

--
A: Because it reverses the logical flow of conversation.
Q: Why is top-posting frowned upon?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
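[Editor's note: to make the post_config option concrete, a minimal module skeleton along those lines might look like the following; names are placeholders and this is a sketch, not the poster's module.]

  #include "httpd.h"
  #include "http_config.h"
  #include "apr_hash.h"
  #include "apr_thread_mutex.h"

  static apr_pool_t         *g_pool  = NULL;
  static apr_hash_t         *g_cache = NULL;
  static apr_thread_mutex_t *g_lock  = NULL;

  static int example_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                                 apr_pool_t *ptemp, server_rec *s)
  {
      /* pconf lives as long as the configuration, so data allocated here
       * survives until restart.  (post_config runs twice at startup; for a
       * simple hash that is harmless, it is just built twice.) */
      apr_pool_create(&g_pool, pconf);
      g_cache = apr_hash_make(g_pool);

      /* Only needed if request-time code may modify the hash; a hash that
       * is read-only after this point can be used without any locking. */
      apr_thread_mutex_create(&g_lock, APR_THREAD_MUTEX_DEFAULT, g_pool);
      return OK;
  }

  static void example_register_hooks(apr_pool_t *p)
  {
      ap_hook_post_config(example_post_config, NULL, NULL, APR_HOOK_MIDDLE);
  }

  module AP_MODULE_DECLARE_DATA example_module = {
      STANDARD20_MODULE_STUFF,
      NULL,                       /* per-directory config creator */
      NULL,                       /* per-directory config merger  */
      NULL,                       /* per-server config creator    */
      NULL,                       /* per-server config merger     */
      NULL,                       /* command table                */
      example_register_hooks
  };

One caveat: with multi-process MPMs, each child receives a copy-on-write copy of whatever post_config builds, so runtime changes are shared only between the threads of a single process; sharing updates across processes requires shared memory instead.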
Re: Problem with file descriptor handling in httpd 2.3.1
On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Configuration is very close to the original, except for:

  40c40
  < Listen myhost:8000
  ---
  > Listen 80
  455,456c455,456
  < EnableMMAP off
  < EnableSendfile off
  ---
  > #EnableMMAP off
  > #EnableSendfile off

(because the installation is on NFS, but the problem also occurs with those switches on)

The following modules are loaded:

  LoadModule authn_file_module modules/mod_authn_file.so
  LoadModule authn_anon_module modules/mod_authn_anon.so
  LoadModule authn_core_module modules/mod_authn_core.so
  LoadModule authz_host_module modules/mod_authz_host.so
  LoadModule authz_groupfile_module modules/mod_authz_groupfile.so
  LoadModule authz_user_module modules/mod_authz_user.so
  LoadModule authz_owner_module modules/mod_authz_owner.so
  LoadModule authz_core_module modules/mod_authz_core.so
  LoadModule access_compat_module modules/mod_access_compat.so
  LoadModule auth_basic_module modules/mod_auth_basic.so
  LoadModule auth_digest_module modules/mod_auth_digest.so
  LoadModule log_config_module modules/mod_log_config.so
  LoadModule env_module modules/mod_env.so
  LoadModule mime_magic_module modules/mod_mime_magic.so
  LoadModule cern_meta_module modules/mod_cern_meta.so
  LoadModule expires_module modules/mod_expires.so
  LoadModule headers_module modules/mod_headers.so
  LoadModule ident_module modules/mod_ident.so
  LoadModule usertrack_module modules/mod_usertrack.so
  LoadModule unique_id_module modules/mod_unique_id.so
  LoadModule setenvif_module modules/mod_setenvif.so
  LoadModule version_module modules/mod_version.so
  LoadModule mime_module modules/mod_mime.so
  LoadModule unixd_module modules/mod_unixd.so
  LoadModule status_module modules/mod_status.so
  LoadModule autoindex_module modules/mod_autoindex.so
  LoadModule asis_module modules/mod_asis.so
  LoadModule info_module modules/mod_info.so
  LoadModule suexec_module modules/mod_suexec.so
  LoadModule vhost_alias_module modules/mod_vhost_alias.so
  LoadModule negotiation_module modules/mod_negotiation.so
  LoadModule dir_module modules/mod_dir.so
  LoadModule imagemap_module modules/mod_imagemap.so
  LoadModule actions_module modules/mod_actions.so
  LoadModule speling_module modules/mod_speling.so
  LoadModule userdir_module modules/mod_userdir.so
  LoadModule alias_module modules/mod_alias.so
  LoadModule rewrite_module modules/mod_rewrite.so

To reproduce you must use KeepAlive, and your MaxKeepAliveRequests (default: 100) times concurrency must exceed the maximum number of FDs. Even without exceeding it, you can use "httpd -X" and look at /proc/PID/fd during the test run. You should be able to notice a huge number of fds, all pointing to the index.html.

Regards,

Rainer
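[Editor's note: put together as commands, the reproduction described above looks roughly like this; the URL matches the Listen line quoted above, the PID is that of the single "httpd -X" process, and the request count is illustrative.]

  $ ./httpd -X &
  $ ab -k -c 20 -n 50000 http://myhost:8000/
  # while ab runs, in another shell:
  $ ls -l /proc/<PID>/fd | grep -c index.html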