3.2.8 - Memory leaks with util.FieldStorage
Hello,

I'm using mod_python 3.2.8 (from the binary distribution) with Apache 2.0.55 under Windows XP Pro.

I encounter memory leaks (~16 KB per request) with a very basic handler like:

    import mod_python
    from mod_python import util

    def handler(req):
        F = util.FieldStorage(req)
        return mod_python.apache.OK

when sending an HTTP request like this (output from TCPWATCH):

    POST http://localhost:80/python/Alertes.py HTTP/1.0
    Content-Type: multipart/form-data; boundary=061006144341906
    Content-Length: 209
    Proxy-Connection: keep-alive
    Host: www.tx2-localhost
    Accept: text/html, */*
    User-Agent: Mozilla/3.0 (compatible; Indy Library)
    Proxy-Authorization: Basic Og==

    --061006144341906
    Content-Disposition: form-data; name="TYPE"

    LAST_ALERTS
    --061006144341906
    Content-Disposition: form-data; name="FILEAGE"

    180
    --061006144341906--

Has anybody encountered the same problem? Is there a workaround, or can a fix be expected?

Best regards,
Laurent.
Re: 3.2.8 - Memory leaks with util.FieldStorage
Jim Gallacher wrote:
> Laurent Blanquet wrote:
>> Hello,
>>
>> I'm using mod_python 3.2.8 (from the binary distribution) with Apache 2.0.55 under Windows XP Pro.
>>
>> I encounter memory leaks (~16 KB per request) with a very basic handler like:
>>
>>     import mod_python
>>     from mod_python import util
>>
>>     def handler(req):
>>         F = util.FieldStorage(req)
>>         return mod_python.apache.OK
>>
>> when sending an HTTP request like this (output from TCPWATCH):
>>
>>     POST http://localhost:80/python/Alertes.py HTTP/1.0
>>     Content-Type: multipart/form-data; boundary=061006144341906
>>     Content-Length: 209
>>     Proxy-Connection: keep-alive
>>     Host: www.tx2-localhost
>>     Accept: text/html, */*
>>     User-Agent: Mozilla/3.0 (compatible; Indy Library)
>>     Proxy-Authorization: Basic Og==
>>
>>     --061006144341906
>>     Content-Disposition: form-data; name=TYPE
>>
>>     LAST_ALERTS
>>     --061006144341906
>>     Content-Disposition: form-data; name=FILEAGE
>>
>>     180
>>     --061006144341906--
>>
>> Has anybody encountered the same problem? Is there a workaround, or can a fix be expected?
>
> This is the first time this has been reported, so forecasting a fix is difficult. It would be helpful if you created a JIRA issue at http://issues.apache.org/jira/browse/MODPYTHON
>
> util.FieldStorage is straight Python code, so I'm surprised that it would be leaking, particularly 16K per request.
>
> Can you offer any more details? How are you testing, which other mod_python Apache directives are you using, and so on.

Laurent,

Also, could you confirm that it *is* FieldStorage that is leaking (if you haven't already) by just reading the request body, bypassing FieldStorage completely, e.g.:

    import mod_python
    from mod_python import util

    def handler(req):
        data = req.read()
        return mod_python.apache.OK

Thanks,
Jim
[jira] Created: (MODPYTHON-172) Memory leak with util.fieldstorage using mod_python 3.2.8 on apache 2.0.55
Memory leak with util.fieldstorage using mod_python 3.2.8 on apache 2.0.55
--------------------------------------------------------------------------

         Key: MODPYTHON-172
         URL: http://issues.apache.org/jira/browse/MODPYTHON-172
     Project: mod_python
        Type: Bug
  Components: core
    Versions: 3.2.8
 Environment: Win32 XP SP1 / SP2
              Apache 2.0.55 installed from binary (.MSI)
              Python 2.4.2 or 2.4.3 installed from binary from www.python.org
    Reporter: Laurent Blanquet

I encounter memory leaks (~16 K per request) using the configuration described below.

= Python configuration from httpd.conf: =

    Alias /python/ d:/python24/B2B/
    <Directory d:/python24/B2B>
        AddHandler mod_python .py
        PythonHandler pyHandlerHTTP
        PythonDebug On
    </Directory>

= Test handler - pyHandlerHTTP.py: =

    import mod_python
    from mod_python import util

    def handler(req):
        # Removing this line solves the problem.
        F = util.FieldStorage(req)
        return mod_python.apache.OK

= HTTP request (dump using TCPWATCH): =

    POST http://localhost:80/python/Alertes.py HTTP/1.0
    Content-Type: multipart/form-data; boundary=061006144341906
    Content-Length: 209
    Proxy-Connection: keep-alive
    Host: www.tx2-localhost
    Accept: text/html, */*
    User-Agent: Mozilla/3.0 (compatible; Indy Library)
    Proxy-Authorization: Basic Og==

    --061006144341906
    Content-Disposition: form-data; name=TYPE

    LAST_ALERTS
    --061006144341906
    Content-Disposition: form-data; name=FILEAGE

    180
    --061006144341906
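
For anyone trying to reproduce the report, here is a minimal client sketch that replays the same form in a loop while the memory of the Apache process is watched externally. It is an illustration only, written for a current Python 3 rather than the Python 2.4 of the report; the function names `build_body` and `replay` and the default request count are assumptions, and the generated body is not guaranteed to be byte-for-byte identical to the TCPWATCH dump.

```python
import http.client

BOUNDARY = "061006144341906"  # boundary string taken from the dump in the report


def build_body(fields, boundary=BOUNDARY):
    """Encode simple (name, value) form fields as a multipart/form-data body."""
    lines = []
    for name, value in fields:
        lines += [
            "--" + boundary,
            'Content-Disposition: form-data; name="%s"' % name,
            "",          # blank line separates part headers from the value
            value,
        ]
    lines += ["--" + boundary + "--", ""]  # closing boundary
    return "\r\n".join(lines).encode("ascii")


def replay(host="localhost", port=80, path="/python/Alertes.py", count=1000):
    """POST the same form `count` times (hypothetical helper, not run here)."""
    body = build_body([("TYPE", "LAST_ALERTS"), ("FILEAGE", "180")])
    headers = {"Content-Type": "multipart/form-data; boundary=" + BOUNDARY}
    for _ in range(count):
        conn = http.client.HTTPConnection(host, port)
        conn.request("POST", path, body=body, headers=headers)
        conn.getresponse().read()  # drain the response before closing
        conn.close()
```

Running `replay()` once against the FieldStorage handler and once against a handler that only calls `req.read()` should show whether the growth follows FieldStorage or the request body handling in general.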
Re: 3.2.8 - Memory leaks with util.FieldStorage
Jim Gallacher wrote:
> Laurent Blanquet wrote:
>> Hello,
>>
>> I'm using mod_python 3.2.8 (from the binary distribution) with Apache 2.0.55 under Windows XP Pro.
>>
>> I encounter memory leaks (~16 KB per request) with a very basic handler like:
>>
>>     import mod_python
>>     from mod_python import util
>>
>>     def handler(req):
>>         F = util.FieldStorage(req)
>>         return mod_python.apache.OK
>>
>> Has anybody encountered the same problem? Is there a workaround, or can a fix be expected?
>
> This is the first time this has been reported, so forecasting a fix is difficult. It would be helpful if you created a JIRA issue at http://issues.apache.org/jira/browse/MODPYTHON
>
> util.FieldStorage is straight Python code, so I'm surprised that it would be leaking, particularly 16K per request.

That was a stupid statement on my part, since FieldStorage does call some mod_python code that *is* written in C and could very well leak. It is, however, correct to say that we have not had a leak reported against FieldStorage in particular. There were some leaks in 3.1.4, but I thought we caught them for the 3.2.7 release.

Back to the code.

Jim
Re: Knocking items off the plate, one by one
> http://archives.apache.org/dist/httpd is always out there ;-)

Not strictly a dev subject, but:

Speaking of archives, I noticed there are no pre-1.3 sources there. For a real archive, it'd be nice to have them there. I have placed Apache httpd 1.1.1 and 1.1.3 on http://sanguis.xs4all.nl/apache/ and I'm sure there are people here who have other old (pre-1.3) stuff too.

Joost
Re: Knocking items off the plate, one by one
Joost de Heer wrote:
>> http://archives.apache.org/dist/httpd is always out there ;-)
>
> Not strictly a dev subject, but:
>
> Speaking of archives, I noticed there are no pre-1.3 sources there. For a real archive, it'd be nice to have them there.

Okay, I just noticed that there are a few 1.2 sources in the 'beta' subdirectory.

Joost
mod_rewrite performance proposal
I've noticed a performance issue on a large site that makes heavy use of txt RewriteMaps. I'd like to propose an alternative implementation of txt maps to deal with the issue.

This is how the current implementation works:

- lookup_map() is responsible for extracting data from maps
- For a txt map, it first attempts to look up the key in the in-memory cache, using get_cache_value()
- get_cache_value() acquires a mutex M before looking in the cache.
- If get_cache_value() returned a value, lookup_map() returns that.
- Otherwise, lookup_map() will:
  - Call lookup_map_txtfile() to find the value
  - Store the value in the cache (or an empty string, if no value was found) by calling set_cache_value()
- set_cache_value() acquires M. It then stores the value in the cache, clearing the cache first if there's an existing cache which is outdated.
- lookup_map_txtfile() opens the map file, and does a linear search on its contents to find the desired value.

The behaviour of this implementation can cause problems at server startup. This is particularly apparent on a busy server that makes substantial use of txt RewriteMaps.

At server startup, no map data has been cached. Suppose we're using the worker MPM, and that thread T1 in process P1 receives a request which requires use of a map. This causes lookup_map_txtfile() to read the value out of the file, taking time linear in the length of the file. (It's possible for the server administrator to optimise this by ordering the map so that more probable items are earlier in the file; but that's rather beside the point, and isn't documented anyway.)

Meanwhile, thread T2 in process P1 also needs a value from the map. T1 and T2 serialize their attempts to use the cache -- even just to determine whether the value is present and up-to-date. This is already potentially problematic. The more threads you have per process, the more likely it is that threads will contend for the mutex protecting a given map's cache.
Worse, T1 and T2 will each open the file, and scan through it for the values they're interested in. For a sufficiently long-lived server process, we will ultimately reach a steady state in which all the data from all maps is present in the memory cache. But reaching that steady state involves one linear-time scan through each file for each line it contains. That is, the time needed to cache the data in a single map file is quadratic in the number of lines in that file.

It's also possible for multiple threads to simultaneously read the same value from a given file. While the mutex used by get_cache_value() and set_cache_value() ensures that the cache isn't damaged by concurrent accesses, it can happen that two threads will set a given key to the same cached value immediately after each other:

    T1 checks for the existence of key K, and finds it absent
    T2 checks for the existence of K, and finds it absent
    T1 reads K's value V from the file
    T2 reads V from the file
    T1 stores V in the cache
    T2 stores the same V in the cache

That doesn't affect the asymptotic complexity, but it's obviously wasted effort.

Beyond a single process, note that during server startup, all processes are doing this simultaneously. This is an enormous amount of I/O to be doing merely to build an in-memory hash table of the contents of some text files.

If you have enough maps in use and a map-lookup workload which is sufficiently high, server startup can be enormously costly. We have observed machines demonstrating the symptoms of thrash death during graceful restart, apparently because of this effect.

I propose the following alternative implementation:

- On server startup, read all txt (and rnd) maps into memory in their entirety, recording the mtime of the map file. This should be done in post_config, so that the cached data is available to all child processes.
- On map lookup:
  - Determine the mtime of the map file
  - If the cache is up-to-date:
    - Acquire a thread rwlock L for this file's cache, in shared mode
    - Read the value from the cache
    - Release L
  - If the cache is outdated, attempt to acquire a thread mutex M for this file, without blocking
    - If M was acquired, it's this thread's responsibility to refresh the cache:
      - Read the entire file into a new hash
      - Acquire L in exclusive mode
      - Replace the existing cache with the hash just read
      - Release L
      - Release M
    - If M wasn't acquired, some other thread must be busy refreshing the cache for this file, so read from the existing cache as if it were up-to-date, using L as normal

I have convinced myself that this scheme is thread-safe, and that where possible it avoids serializing threads' access to the cache. It should scale well to servers with long-lived processes that use many large maps:
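
The lookup scheme described above can be sketched in Python to make the locking order concrete. This is an illustration only, not httpd code: a real patch would be written in C inside mod_rewrite, and the names `RWLock`, `parse_map`, `TxtMapCache` and `lookup` are all hypothetical. Python's standard library has no reader-writer lock, so a minimal one is included; the map parser assumes one whitespace-separated key/value pair per line, `#` comments, and that the first occurrence of a key wins.

```python
import os
import threading


class RWLock:
    """Minimal reader-writer lock for illustration (no fairness guarantees)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def parse_map(path):
    """Read a whole txt map into a dict (assumed format: 'key value' per line)."""
    table = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and not fields[0].startswith("#"):
                table.setdefault(fields[0], fields[1])  # first occurrence wins
    return table


class TxtMapCache:
    def __init__(self, path):
        self.path = path
        self.rwlock = RWLock()           # "L" in the proposal
        self.refresh = threading.Lock()  # "M" in the proposal
        self.mtime = os.stat(path).st_mtime
        self.table = parse_map(path)     # whole file loaded up front

    def lookup(self, key):
        mtime = os.stat(self.path).st_mtime
        # Outdated cache: only the thread that wins M (without blocking) refreshes.
        if mtime != self.mtime and self.refresh.acquire(blocking=False):
            try:
                fresh = parse_map(self.path)  # file is read without holding L
                self.rwlock.acquire_exclusive()
                try:
                    self.table, self.mtime = fresh, mtime
                finally:
                    self.rwlock.release_exclusive()
            finally:
                self.refresh.release()
        # Up to date, just refreshed, or being refreshed elsewhere: read under L.
        self.rwlock.acquire_shared()
        try:
            return self.table.get(key)
        finally:
            self.rwlock.release_shared()
```

The point of the design is that a lookup against an up-to-date cache only ever takes L in shared mode, so readers do not serialize against each other, and a stale cache triggers one full read of the file by one thread rather than one linear scan per key.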
Re: mod_rewrite performance proposal
Aaron Crane wrote:
> I've noticed a performance issue on a large site that makes heavy use of txt RewriteMaps. I'd like to propose an alternative implementation of txt maps to deal with the issue.

I think most of the proposal makes sense, but have you considered using DBM files for the RewriteMaps?

-Paul
Re: mod_rewrite performance proposal
On Saturday 10 June 2006 20:53, Aaron Crane wrote:
> I've noticed a performance issue on a large site that makes heavy use of txt RewriteMaps.

El Reg, by any chance[1]?

> I'd like to propose an alternative implementation of txt maps to deal with the issue.

What you propose makes sense if we accept that the current behaviour is a problem. Indeed, it's arguably OTT to look up anything per-request if you're caching in memory. But it raises the question: if lookup is a performance issue, wouldn't a switch to DBM RewriteMaps be the obvious fix?

> I'd welcome any feedback anyone could offer on this proposal. I'm happy to do the work myself; is there a reasonable chance something along these lines would be accepted if I do? If not, what changes would be needed to make this suitable for httpd?

Speaking from a position of ignorance on the workings of RewriteMap, I think I like it. And if you're offering to do the work, it sounds like a good offer to me. So that's a cautious +1.

[1] a guess based on your talk at last year's ApacheCon.

--
Nick Kew
Re: mod_rewrite performance proposal
Aaron Crane wrote:
> I've noticed a performance issue on a large site that makes heavy use of txt RewriteMaps. I'd like to propose an alternative implementation of txt maps to deal with the issue.

It sounds like everybody wins. Yes, folks should switch to dbm, but many won't. And if you are going to implement this, then I say go for it. +1.

--
Rich Bowen