3.2.8 - Memory leaks with util.FieldStorage

2006-06-10 Thread Laurent Blanquet



Hello,

I'm using mod_python 3.2.8 (from the binary distribution)
with Apache 2.0.55 under Windows XP Pro.
I encounter memory leaks (~16 KB per request)
with a very basic handler like:

import mod_python
from mod_python import util

def handler(req):
    F = util.FieldStorage(req)
    return mod_python.apache.OK

And sending an HTTP request like:

Output from TCPWATCH:
POST http://localhost:80/python/Alertes.py HTTP/1.0
Content-Type: multipart/form-data; boundary=061006144341906
Content-Length: 209
Proxy-Connection: keep-alive
Host: www.tx2-localhost
Accept: text/html, */*
User-Agent: Mozilla/3.0 (compatible; Indy Library)
Proxy-Authorization: Basic Og==

--061006144341906
Content-Disposition: form-data; name="TYPE"

LAST_ALERTS
--061006144341906
Content-Disposition: form-data; name="FILEAGE"

180

--061006144341906--

Has anybody encountered the same problem?
Is there a workaround, or can a fix be expected?

Best regards,
Laurent.


Re: 3.2.8 - Memory leaks with util.FieldStorage

2006-06-10 Thread Jim Gallacher

Jim Gallacher wrote:

> Laurent Blanquet wrote:
>
>> Hello,
>>
>> I'm using mod_python 3.2.8 (from the binary distribution) with Apache 2.0.55
>> under Windows XP Pro.
>> I encounter memory leaks (~16 KB per request) with a very basic
>> handler like:
>>
>> import mod_python
>> from mod_python import util
>>
>> def handler(req):
>>     F = util.FieldStorage(req)
>>     return mod_python.apache.OK
>>
>> And sending an HTTP request like:
>>
>> Output from TCPWATCH:
>> POST http://localhost:80/python/Alertes.py HTTP/1.0
>> Content-Type: multipart/form-data; boundary=061006144341906
>> Content-Length: 209
>> Proxy-Connection: keep-alive
>> Host: www.tx2-localhost
>> Accept: text/html, */*
>> User-Agent: Mozilla/3.0 (compatible; Indy Library)
>> Proxy-Authorization: Basic Og==
>>
>> --061006144341906
>> Content-Disposition: form-data; name="TYPE"
>>
>> LAST_ALERTS
>> --061006144341906
>> Content-Disposition: form-data; name="FILEAGE"
>>
>> 180
>>
>> --061006144341906--
>>
>> Has anybody encountered the same problem?
>> Is there a workaround, or can a fix be expected?
>
> This is the first time this has been reported, so forecasting a fix is
> difficult. It would be helpful if you created a JIRA issue at
>
> http://issues.apache.org/jira/browse/MODPYTHON
>
> util.FieldStorage is straight Python code, so I'm surprised that it
> would be leaking, particularly 16K per request. Can you offer any more
> details? For example, how are you testing, and what other mod_python
> Apache directives are you using?


Laurent,

Also, could you confirm that it *is* FieldStorage that is leaking (if 
you haven't already) by just reading the request body, bypassing 
FieldStorage completely, e.g.:

import mod_python
from mod_python import util

def handler(req):
    data = req.read()
    return mod_python.apache.OK

Thanks,
Jim
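
A minimal way to run both tests under identical conditions is a small client that replays the POST from the TCPWATCH dump many times while httpd's memory use is watched (for example in the Windows Task Manager), once with the FieldStorage handler and once with the req.read() handler. This sketch assumes Python 3 and only the standard library; build_multipart() and replay() are hypothetical helpers, and the body it builds is equivalent to, but not byte-for-byte identical with, the dumped request (so its Content-Length will not be exactly 209).

```python
import urllib.request

# Boundary and field values copied from the TCPWATCH dump.
BOUNDARY = "061006144341906"
FIELDS = [("TYPE", "LAST_ALERTS"), ("FILEAGE", "180")]


def build_multipart(fields, boundary):
    """Encode (name, value) pairs as a multipart/form-data body with CRLF line ends."""
    lines = []
    for name, value in fields:
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")          # blank line separates part headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")
    return "\r\n".join(lines).encode("ascii")


def replay(url, n, fields=FIELDS, boundary=BOUNDARY):
    """POST the same form n times; watch the httpd process's memory while this runs."""
    body = build_multipart(fields, boundary)
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    for _ in range(n):
        req = urllib.request.Request(url, data=body, headers=headers, method="POST")
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

For example, replay("http://localhost:80/python/Alertes.py", 1000) against each handler in turn; the URL is the one from the dump and may need adjusting locally.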


[jira] Created: (MODPYTHON-172) Memory leak with util.fieldstorage using mod_python 3.2.8 on apache 2.0.55

2006-06-10 Thread Laurent Blanquet (JIRA)
Memory leak with util.fieldstorage using mod_python 3.2.8 on apache 2.0.55
--

         Key: MODPYTHON-172
         URL: http://issues.apache.org/jira/browse/MODPYTHON-172
     Project: mod_python
        Type: Bug
  Components: core
    Versions: 3.2.8
 Environment: Win32 XP SP1 / SP2
              Apache 2.0.55 installed from binary (.MSI)
              Python 2.4.2 or 2.4.3 installed from binary from www.python.org
    Reporter: Laurent Blanquet


I encounter memory leaks (~16 KB per request) using the configuration described
below.

=
Python configuration from httpd.conf:
=
Alias /python/ d:/python24/B2B/
<Directory d:/python24/B2B>
    AddHandler mod_python .py
    PythonHandler pyHandlerHTTP
    PythonDebug On
</Directory>
=
Test handler -  pyHandlerHTTP.py :
=
import mod_python
from mod_python import util

def handler(req):
    # Removing this line solves the problem.
    F = util.FieldStorage(req)
    return mod_python.apache.OK
=
HTTP Request (dump using TCPWATCH):
=
POST http://localhost:80/python/Alertes.py HTTP/1.0
Content-Type: multipart/form-data; boundary=061006144341906
Content-Length: 209
Proxy-Connection: keep-alive
Host: www.tx2-localhost
Accept: text/html, */*
User-Agent: Mozilla/3.0 (compatible; Indy Library)
Proxy-Authorization: Basic Og==
 
--061006144341906
Content-Disposition: form-data; name="TYPE"

LAST_ALERTS
--061006144341906
Content-Disposition: form-data; name="FILEAGE"

180

--061006144341906--






Re: 3.2.8 - Memory leaks with util.FieldStorage

2006-06-10 Thread Jim Gallacher

Jim Gallacher wrote:

> Laurent Blanquet wrote:
>
>> Hello,
>>
>> I'm using mod_python 3.2.8 (from the binary distribution) with Apache 2.0.55
>> under Windows XP Pro.
>> I encounter memory leaks (~16 KB per request) with a very basic
>> handler like:
>>
>> import mod_python
>> from mod_python import util
>>
>> def handler(req):
>>     F = util.FieldStorage(req)
>>     return mod_python.apache.OK
>>
>> Has anybody encountered the same problem?
>> Is there a workaround, or can a fix be expected?
>
> This is the first time this has been reported, so forecasting a fix is
> difficult. It would be helpful if you created a JIRA issue at
>
> http://issues.apache.org/jira/browse/MODPYTHON
>
> util.FieldStorage is straight Python code, so I'm surprised that it
> would be leaking, particularly 16K per request.


That was a stupid statement on my part since FieldStorage does call some 
mod_python code that *is* written in C, and could very well leak.  It is 
however correct to say that we have not had a leak reported against 
FieldStorage in particular. There were some leaks in 3.1.4, but I 
thought we caught them for the 3.2.7 release.
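
One way to separate these two cases is to measure how much memory the Python allocator itself retains across repeated calls and compare that with the growth of the whole httpd process: roughly 16 KB per request of process growth with no matching Python-level growth would point at C code. The sketch uses tracemalloc, which only exists in Python 3.4 and later (not the Python 2.4 used in this thread), so treat it as an illustration of the technique rather than something that runs under mod_python 3.2.8; python_heap_growth() is a hypothetical helper name.

```python
import tracemalloc


def python_heap_growth(fn, iterations=1000):
    """Call fn repeatedly and return net growth of Python-tracked memory, in bytes.

    tracemalloc only sees allocations made through Python's allocators, so
    memory obtained directly with malloc() (or from Apache's pools) is invisible.
    """
    fn()  # warm-up call so one-time caches and imports are not counted
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

Near-zero growth here, while the process keeps growing, is exactly the signature of a leak below the Python layer.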


Back to the code.

Jim


Re: Knocking items off the plate, one by one

2006-06-10 Thread Joost de Heer

> http://archives.apache.org/dist/httpd is always out there ;-)


Not strictly a dev subject, but:

Speaking of archives, I noticed there are no pre-1.3 sources there. For a real 
archive, it'd be nice to have them there.


I have placed Apache httpd 1.1.1 and 1.1.3 on http://sanguis.xs4all.nl/apache/ 
and I'm sure there are people here who have other old (pre-1.3) stuff too.


Joost


Re: Knocking items off the plate, one by one

2006-06-10 Thread Joost de Heer

Joost de Heer wrote:

>> http://archives.apache.org/dist/httpd is always out there ;-)
>
> Not strictly a dev subject, but:
>
> Speaking of archives, I noticed there are no pre-1.3 sources there. For
> a real archive, it'd be nice to have them there.

Okay, I just noticed that there are a few 1.2 sources in the 'beta'
subdirectory.


Joost


mod_rewrite performance proposal

2006-06-10 Thread Aaron Crane
I've noticed a performance issue on a large site that makes heavy use of txt
RewriteMaps.  I'd like to propose an alternative implementation of txt maps
to deal with the issue.

This is how the current implementation works:

  - lookup_map() is responsible for extracting data from maps

  - For a txt map, it first attempts to look up the key in the in-memory
    cache, using get_cache_value()

    - get_cache_value() acquires a mutex M before looking in the cache.

  - If get_cache_value() returned a value, lookup_map() returns that.

  - Otherwise, lookup_map() will:

    - Call lookup_map_txtfile() to find the value

    - Store the value in the cache (or an empty string, if no value was
      found) by calling set_cache_value()

      - set_cache_value() acquires M.  It then stores the value in the
        cache, clearing the cache first if there's an existing cache which
        is outdated.

  - lookup_map_txtfile() opens the map file, and does a linear search on its
    contents to find the desired value.

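The flow described in the list can be modelled in a few lines. This is a toy Python model written for illustration, not mod_rewrite's C code: the map file is a list of "key value" strings, a single threading.Lock stands in for M, and the mtime check that clears an outdated cache is left out. CurrentTxtMap and its lines_scanned counter are hypothetical; the counter makes the cost of each cache miss visible.

```python
import threading


class CurrentTxtMap:
    """Toy model of today's txt RewriteMap lookup path (not the httpd C code)."""

    def __init__(self, lines):
        self.lines = lines              # stands in for the map file: "key value" lines
        self.cache = {}
        self.mutex = threading.Lock()   # M, taken by get_ and set_cache_value()
        self.lines_scanned = 0          # lines read by lookup_map_txtfile() (unsynchronized)

    def get_cache_value(self, key):
        with self.mutex:
            return self.cache.get(key)

    def set_cache_value(self, key, value):
        with self.mutex:
            self.cache[key] = value

    def lookup_map_txtfile(self, key):
        # Linear search from the top of the "file" on every cache miss.
        for line in self.lines:
            self.lines_scanned += 1
            parts = line.split()
            if len(parts) >= 2 and parts[0] == key:
                return parts[1]
        return None

    def lookup_map(self, key):
        value = self.get_cache_value(key)
        if value is not None:
            return value or None        # "" is a cached "not in map"
        value = self.lookup_map_txtfile(key)
        self.set_cache_value(key, value if value is not None else "")
        return value
```

Every call, hit or miss, takes M at least once, which is the serialization point discussed next.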
The behaviour of this implementation can cause problems at server startup.
This is particularly apparent on a busy server that makes substantial use of
txt RewriteMaps.

At server startup, no map data has been cached.  Suppose we're using the
worker MPM, and that thread T1 in process P1 receives a request which
requires use of a map.  This causes lookup_map_txtfile() to read the value
out of the file, taking time linear in the length of the file.  (It's
possible for the server administrator to optimise this by ordering the map
so that more probable items are earlier in the file; but that's rather
beside the point, and isn't documented anyway.)

Meanwhile, thread T2 in process P1 also needs a value from the map.  T1 and
T2 serialize their attempts to use the cache -- even just to determine
whether the value is present and up-to-date.  This is already potentially
problematic.  The more threads you have per process, the more likely it is
that threads will contend for the mutex protecting a given map's cache.

Worse, T1 and T2 will each open the file, and scan through it for the values
they're interested in.  For a sufficiently long-lived server process, we
will ultimately reach a steady state in which all the data from all maps is
present in the memory cache.  But reaching that steady state involves one
linear-time scan through each file for each line it contains.  That is, the
time needed to cache the data in a single map file is quadratic in the
number of lines in that file.
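
The quadratic claim follows from a short sum. Assuming each of the n keys in a map is requested once while the cache is cold, and each miss scans from the top of the file, the key on line i costs i line reads, so warming the cache costs 1 + 2 + ... + n = n(n + 1)/2 reads per process. A quick check (warmup_line_reads() is a hypothetical name):

```python
def warmup_line_reads(n):
    """Total lines read to cache all n keys of a txt map, one cold miss per key."""
    # The key on line i (1-based) is found after scanning i lines.
    return sum(i for i in range(1, n + 1))


def warmup_line_reads_closed_form(n):
    return n * (n + 1) // 2


# A 10,000-line map costs 50,005,000 line reads in each child process.
assert warmup_line_reads(10_000) == warmup_line_reads_closed_form(10_000) == 50_005_000
```

Lookups for keys that are absent from the map add to this, since each one scans all n lines before its empty result is cached.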

It's also possible for multiple threads to simultaneously read the same
value from a given file.  While the mutex used by get_cache_value() and
set_cache_value() ensures that the cache isn't damaged by concurrent
accesses, it can happen that two threads will set a given key to the same
cached value immediately after each other:

  T1 checks for the existence of key K, and finds it absent
  T2 checks for the existence of K, and finds it absent
  T1 reads K's value V from the file
  T2 reads V from the file
  T1 stores V in the cache
  T2 stores the same V in the cache

That doesn't affect the asymptotic complexity, but it's obviously wasted
effort.
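
The interleaving above can be forced deterministically with a barrier, which makes the redundant second store easy to observe. In this illustrative Python sketch (run_race() is a hypothetical name), both threads check the cache under the mutex, wait until the other has also seen the miss, then each scan the "file" and store the same value: the cache is never corrupted, but the work is done twice.

```python
import threading


def run_race(lines, key):
    """Replay the T1/T2 interleaving; return the final cache and every store made."""
    cache = {}
    mutex = threading.Lock()             # M, as used by get/set_cache_value()
    both_missed = threading.Barrier(2)   # hold both threads until each has seen the miss
    stores = []

    def worker():
        with mutex:                      # get_cache_value(): key absent for both threads
            hit = key in cache
        both_missed.wait()
        if hit:
            return
        value = ""                       # lookup_map_txtfile(): linear scan
        for line in lines:
            parts = line.split()
            if len(parts) >= 2 and parts[0] == key:
                value = parts[1]
                break
        with mutex:                      # set_cache_value(): the second store is redundant
            cache[key] = value
            stores.append((threading.current_thread().name, value))

    threads = [threading.Thread(target=worker, name=name) for name in ("T1", "T2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache, stores
```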

Beyond a single process, note that during server startup, all processes are
doing this simultaneously.

This is an enormous amount of I/O to be doing merely to build an in-memory
hash table of the contents of some text files.  If you have enough maps in
use and a map-lookup workload which is sufficiently high, server startup can
be enormously costly.  We have observed machines demonstrating the symptoms
of thrash death during graceful restart, apparently because of this effect.

I propose the following alternative implementation:

  - On server startup, read all txt (and rnd) maps into memory in their
    entirety, recording the mtime of the map file.  This should be done in
    post_config, so that the cached data is available to all child
    processes.

  - On map lookup:

    - Determine the mtime of the map file

    - If the cache is up-to-date:

      - Acquire a thread rwlock L for this file's cache, in shared mode

      - Read the value from the cache

      - Release L

    - If the cache is outdated, attempt to acquire a thread mutex M for this
      file, without blocking

      - If M was acquired, it's this thread's responsibility to refresh the
        cache:

        - Read the entire file into a new hash

        - Acquire L in exclusive mode

        - Replace the existing cache with the hash just read

        - Release L

        - Release M

      - If M wasn't acquired, some other thread must be busy refreshing the
        cache for this file, so read from the existing cache as if it were
        up-to-date, using L as normal

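A minimal single-process sketch of the proposed scheme, in Python for illustration only. Python's standard library has no reader-writer lock, so RWLock is a small hypothetical helper built on threading.Condition (in httpd the natural equivalents would presumably be APR's apr_thread_rwlock_t and apr_thread_mutex_trylock()); threading.Lock.acquire(blocking=False) plays the role of the non-blocking attempt on M, and the constructor's preload stands in for post_config. PreloadedTxtMap and read_txt_map() are hypothetical names, and read_txt_map() keeps the first value for a duplicated key, mirroring a linear search.

```python
import os
import threading


class RWLock:
    """Minimal reader-writer lock: many shared holders or one exclusive holder."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def read_txt_map(path):
    """Read a whole txt map into a dict; blank and '#' lines are skipped."""
    table = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and not parts[0].startswith("#"):
                table.setdefault(parts[0], parts[1])  # first match wins
    return table


class PreloadedTxtMap:
    def __init__(self, path):
        self.path = path
        self.lock = RWLock()                  # L: guards self.table
        self.refresh = threading.Lock()       # M: held by the one thread rebuilding
        self.mtime = os.stat(path).st_mtime   # preload, as post_config would
        self.table = read_txt_map(path)

    def lookup(self, key):
        mtime = os.stat(self.path).st_mtime
        # Outdated cache: try M without blocking; only the winner refreshes.
        if mtime != self.mtime and self.refresh.acquire(blocking=False):
            try:
                if mtime != self.mtime:       # another thread may have refreshed already
                    new_table = read_txt_map(self.path)  # slow part, done without L
                    self.lock.acquire_exclusive()
                    try:
                        self.table, self.mtime = new_table, mtime
                    finally:
                        self.lock.release_exclusive()
            finally:
                self.refresh.release()
        # Up to date, just refreshed, or M busy: read whatever cache is current.
        self.lock.acquire_shared()
        try:
            return self.table.get(key)
        finally:
            self.lock.release_shared()
```

The new file is parsed before L is taken exclusively, so readers are blocked only for the swap itself. This RWLock prefers readers, so under constant read traffic a refresh could wait a long time for L; how the real rwlock schedules writers would be worth checking.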
I have convinced myself that this scheme is thread-safe, and that where
possible it avoids serializing threads' access to the cache.  It should
scale well to servers with long-lived processes that use many large maps: 

Re: mod_rewrite performance proposal

2006-06-10 Thread Paul Querna

Aaron Crane wrote:

> I've noticed a performance issue on a large site that makes heavy use of txt
> RewriteMaps.  I'd like to propose an alternative implementation of txt maps
> to deal with the issue.



I think most of the proposal makes sense, but have you considered using 
DBM Files for the RewriteMaps?


-Paul


Re: mod_rewrite performance proposal

2006-06-10 Thread Nick Kew
On Saturday 10 June 2006 20:53, Aaron Crane wrote:
> I've noticed a performance issue on a large site that makes heavy use of
> txt RewriteMaps.

El Reg, by any chance[1]?

> I'd like to propose an alternative implementation of txt
> maps to deal with the issue.

What you propose makes sense if we accept current behaviour
is a problem.  Indeed, it's arguably OTT to look up anything
per-request if you're caching in-memory.

But it begs the question: if lookup is a performance issue,
wouldn't a switch to DBM RewriteMaps be the obvious fix?

> I'd welcome any feedback anyone could offer on this proposal.  I'm happy to
> do the work myself; is there a reasonable chance something along these
> lines would be accepted if I do?  If not, what changes would be needed to
> make this suitable for httpd?

Speaking from a position of ignorance on the workings of RewriteMap,
I think I like it.  And if you're offering to do the work, it sounds like
a good offer to me.  So that's a cautious +1.

[1] a guess based on your talk at last year's ApacheCon.

-- 
Nick Kew


Re: mod_rewrite performance proposal

2006-06-10 Thread Rich Bowen
Aaron Crane wrote:
> I've noticed a performance issue on a large site that makes heavy use of txt
> RewriteMaps.  I'd like to propose an alternative implementation of txt maps
> to deal with the issue.

It sounds like everybody wins. Yes, folks should switch to dbm, but many
won't. And if you are going to implement this, then I say go for it. +1.

--
Rich Bowen
[EMAIL PROTECTED]