Re: mod_python.util.StorageField.read_to_boundary has problems in 3.1 and 3.2

2005-11-07 Thread Mike Looijmans
What i don't like at all in this implementation is the large amount of 
memcpy operations.

1. line.strip()
2. line[:-x]
3. previous_delimiter + ...
The average pass will perform between two and three memcopy operations 
on the read block.


Suggestion: Loose the strip() call - it serves no purpose (the boundary 
will be the only thing on that line, otherwise it's not a boundary)


I've built one without any memcpy whatsoever, I've tried it with the 
test harnass and it appears equal to the Alexis version (pass all except 
the generate_split_boundary_file test):


def read_to_boundary(self, req, boundary, file, readBlockSize=65536):
prevline = ""
while 1:
line = req.readline(readBlockSize)
if not line or line.startswith(boundary):
if prevline.endswith('\r\n'):
file.write(prevline[:-2])
elif prevline.endswith('\n'):
file.write(prevline[:-1])
break
file.write(prevline)
prevline = line






Alexis Marrero wrote:
> New version of read_to_boundary(...)
>
> readBlockSize = 1 << 16
> def read_to_boundary(self, req, boundary, file):
> previous_delimiter = ''
> while 1:
> line = req.readline(readBlockSize)
> if line.strip().startswith(boundary):
> break
>
> if line.endswith('\r\n'):
> file.write(previous_delimiter + line[:-2])
> previous_delimiter = '\r\n'
>
> elif line.endswith('\r'):
> file.write(previous_delimiter + line[:-1])
> previous_delimiter = '\r'
>
> elif line.endswith('\n'):
> if len(line[:-1]) > 0:
> file.write(previous_delimiter + line[:-1])
> previous_delimiter = '\n'
>
> else:
> previous_delimiter += '\n'
>
> else:
> file.write(previous_delimiter + line)
> previous_delimiter = ''
>
> This new functions passes the test for Jim's filetest generator.


--
Mike Looijmans
Philips Natlab / Topic Automation



Re: mod_python.util.StorageField.read_to_boundary has problems in 3.1 and 3.2

2005-11-08 Thread Mike Looijmans
Here's one that passes all the tests, and is 2x as fast as the 'current' 
and 'new' implementations on random binary data. I haven't been able to 
generate data where the 'mike' version is slower:


def read_to_boundary(self, req, boundary, file, readBlockSize=65536):
prevline = ""
last_bound = boundary + '--'
carry = None
while 1:
line = req.readline(readBlockSize)
if not line or line.startswith(boundary):
if prevline.endswith('\r\n'):
if carry is not None:
file.write(carry)
file.write(prevline[:-2])
break
elif (carry == '\r') and (prevline[-1] == '\n'):
file.write(prevline[:-1])
break
# If we get here, it's not really a boundary!
if carry is not None:
file.write(carry)
carry = None
if prevline[-1:] == '\r':
file.write(prevline[:-1])
carry = '\r'
else:
file.write(prevline)
prevline = line

I've attached a modified upload_test_harness.py that includes the new 
and current, also the 'org' version (as in 3.1 release) and the 'mike' 
version.


In addition, I added some profiling calls to show the impact of the 
extra 'endswith' and slices.


--
Mike Looijmans
Philips Natlab / Topic Automation
#!/usr/bin/env python

import sys
from cStringIO import StringIO
import md5

##def generate_split_file(offset=-1,
##  readBlockSize=65368,
##  fname='testfile'):
##f = open(fname, 'wb')
##f.write('a'*50)
##f.write('\r\n')
##block_size =  readBlockSize + offset
##f.write('b'*block_size)
##f.close() 

def read_to_boundary_current(self, req, boundary, file, readBlockSize):
''' currrent version '''
#
# Although technically possible for the boundary to be split by the read, this will
# not happen because the readBlockSize is set quite high - far longer than any boundary line
# will ever contain.
#
# lastCharCarried is used to detect the situation where the \r\n is split across the end of
# a read block.
#
delim = ''
lastCharCarried = False
last_bound = boundary + '--'
roughBoundaryLength = len(last_bound) + 128
line = req.readline(readBlockSize)
lineLength = len(line)
if lineLength < roughBoundaryLength:
sline = line.strip()
else:
sline = ''
while lineLength > 0 and sline != boundary and sline != last_bound:
if not lastCharCarried:
file.write(delim)
delim = ''
else:
lastCharCarried = False
cutLength = 0

if lineLength == readBlockSize:
if line[-1:] == '\r':
delim = '\r'
cutLength = -1
lastCharCarried = True

if line[-2:] == '\r\n':
delim += '\r\n'
cutLength = -2
elif line[-1:] == '\n':
delim += '\n'
cutLength = -1
if cutLength != 0:
file.write(line[:cutLength])
else:
file.write(line)

line = req.readline(readBlockSize)
lineLength = len(line)
if lineLength < roughBoundaryLength:
sline = line.strip()
else:
sline = ''

def read_to_boundary_new(self, req, boundary, file, readBlockSize):
''' Alexis' version
read from the request object line by line with a maximum size,
until the new line starts with boundary
'''
previous_delimiter = ''
while 1:
line = req.readline(readBlockSize)
if line.startswith(boundary):
break

if line.endswith('\r\n'):
file.write(previous_delimiter + line[:-2])
previous_delimiter = '\r\n'

elif line.endswith('\r') or line.endswith('\n'):
file.write(previous_delimiter + line[:-1])
previous_delimiter = line[-1:]

else:
file.write(previous_delimiter + line)
previous_delimiter = ''

def read_to_boundary_org(self, req, boundary, file, readBlockSize):
delim = ""
line = req.readline(readBlockSize)
while line and not line.startswith(boundary):
odelim = delim
if line[-2:] == "\r\n":
delim = "\r\n"
line = line[:-2]
elif line[-1:] == "\n":
delim = "\n"
line = line[:-1]
else:
d

Re: mod_python.util.StorageField.read_to_boundary has problems in 3.1 and 3.2

2005-11-08 Thread Mike Looijmans

Alexis Marrero wrote:
The next test that I will run this against will be with an obscene  
amount of data for which this improvement helps a lot!


The dumb thing is the checking for boundaries.

I'm using http "chunked" encoding to access a raw TAPE device through 
HTTP with python (it GETs or POSTs the raw data as body, each chunk 
coresponds to a tape block). It blazes the data through at the max 
network speed with hardly any CPU usage. This HTTP upload code uses 100% 
CPU while running on my 3GHz box.


The looking-for-line-ends and mime boundaries method is very inefficient 
compared to that. They oughta have put a "content-length" into every 
chunk header, and we wouldn't have had this problem in the first place.


I think the only realistic way to improve performance is to read the 
client input in binary chunks, and then looking for '\r\n---boundary' 
strings in the chunk using standard string functions. Most of the CPU 
time is now spent in the readline() call.


This also means revising all the mime body parsing to cope with that... 
I doubt if that will be worth the effort for anyone.




More efficient FieldStorage, StringField and Field classes

2005-11-11 Thread Mike Looijmans
Inspired by our FieldStorage work with file uploads, I also made some 
implementation changes to Field and StringField in util.py. In essense, 
that will make things like using form['key'] multiple times much more 
efficient. Also, FieldStorage will return the same object if called twice.


I'd like some feedback from the group on these changes, and also some 
directions on what I'd need to do to get them integrated in the 
mod_python beta. I have no experience whatsoever in contributing to open 
source projects.


Major change is the Field and FieldStorage implementation:

class Field:

filename = None
headers = {}

def __init__(self, name):
self.name = name

def __repr__(self):
"""Return printable representation."""
return "Field(%s, %s)" % (`self.name`, `self.value`)

def __getattr__(self, name):
if name != 'value':
raise AttributeError, name
if self.file:
self.file.seek(0)
self.value = self.file.read()
self.file.seek(0)
else:
self.value = None
return self.value

def __del__(self):
self.file.close()

class StringField(str):
""" This class is basically a string with
added attributes for compatibility with std lib cgi.py. Basically, this
works the opposite of Field, as it stores its data in a string, but 
creates
a file on demand. Field creates a value on demand and stores data 
in a file.

"""
filename = None
headers = {}
ctype = "text/plain"
type_options = {}
disposition = None
disp_options = None

# I wanted __init__(name, value) but that does not work 
(apparently, you

# cannot subclass str with a constructor that takes >1 argument)
def __init__(self, value):
'''Create StringField instance. You'll have to set name 
yourself.'''

str.__init__(self, value)
self.value = value

def __getattr__(self, name):
if name != 'file':
raise AttributeError, name
self.file = cStringIO.StringIO(self.value)
return self.file

I've attached a working util.py (based on 3.1.4, so there are also the 
changes to allow large uploads in the file).


 #
 # Copyright 2004 Apache Software Foundation 
 # 
 # Licensed under the Apache License, Version 2.0 (the "License"); you
 # may not use this file except in compliance with the License.  You
 # may obtain a copy of the License at
 #
 #  http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
 # implied.  See the License for the specific language governing
 # permissions and limitations under the License.
 #
 # Originally developed by Gregory Trubetskoy.
 #
 # $Id: util.py 102649 2004-02-16 19:47:28Z grisha $

import _apache
from mod_python import apache
import cStringIO
import tempfile

from types import *
from exceptions import *

parse_qs = _apache.parse_qs
parse_qsl = _apache.parse_qsl

""" The classes below are a (almost) a drop-in replacement for the
standard cgi.py FieldStorage class. They should have pretty much the
same functionality.

These classes differ in that unlike cgi.FieldStorage, they are not
recursive. The class FieldStorage contains a list of instances of
Field class. Field class is incapable of storing anything in it.

These objects should be considerably faster than the ones in cgi.py
because they do not expect CGI environment, and are
optimized specifically for Apache and mod_python.
"""

class Field:

filename = None
headers = {}

def __init__(self, name):
self.name = name

def __repr__(self):
"""Return printable representation."""
return "Field(%s, %s)" % (`self.name`, `self.value`)

def __getattr__(self, name):
if name != 'value':
raise AttributeError, name
if self.file:
self.file.seek(0)
self.value = self.file.read()
self.file.seek(0)
else:
self.value = None
return self.value

def __del__(self):
self.file.close()

class StringField(str):
""" This class is basically a string with
added attributes for compatibility with std lib cgi.py. Basically, this
works the opposite of Field, as it stores its data in a string, but creates
a file on demand. Field creates a value on demand and stores data in a file.
"""
filename = None
headers = {}
ctype = "text/plain"
type_options = {}
disposition = None
disp_options = None

# I wanted __init__(name, value) but that does not work (apparently, you
# cannot subclass str with a constructor that takes >1 argument)
def __init__(self, value):
'''Create StringField instance. You'll have to set name yourself.'''
str

Re: Developer Guidelines (was Re: More efficient FieldStorage, StringField and Field classes)

2005-11-15 Thread Mike Looijmans

Ok, following Jim's advice, an having read his guide, I have so far:
- Fetched the trunk version with anonymous SVN (no problem)

I've already installed apache 2.0.55 and Python 2.4.1 on this Windows 
machine (the lazy way, with MSI packages).


I've also installed the latest Cygwin (all "default" packages).

However, I'm not getting anywhere in compiling mod_python. It requires 
the apxs tool which is apparently not part of any Apache windows 
distribution. So I grabbed the apache 2.0.55 source for windows, but the 
configure script is now stuck for over half an hour on:


Configuring Apache Portable Runtime Utility library...

checking for APR-util... reconfig
configuring package in srclib/apr-util now

Nothing appears to happen.

Anyone here who can help me out getting this up and running on Windows XP?

I can also use linux machines for compile and tests (I'm not root on 
any), but I do most development on this Windows system, so I need a 
windows executable anyway.


PS: Haven't found any speling errors in your document...

Jim Gallacher wrote:

Mike Looijmans wrote:

Inspired by our FieldStorage work with file uploads, I also made some 
implementation changes to Field and StringField in util.py. In 
essense, that will make things like using form['key'] multiple times 
much more efficient. Also, FieldStorage will return the same object if 
called twice.


I'd like some feedback from the group on these changes, and also some 
directions on what I'd need to do to get them integrated in the 
mod_python beta. I have no experience whatsoever in contributing to 
open source projects.



I think it's too late in the beta cycle to be rewriting code unless it's 
to fix a specific, critical bug. We really, really, *really* need to 
deliver 3.2 final, and I don't want to mess with working code at this 
point. The changes you are suggesting are likely more appropriate for 
3.3, or possibly a 3.2 bugfix release.


As far as contributing to mod_python, well, I think you already have the 
right idea: propose changes; discuss on python-dev; seek consensus; wait 
for your changes to be committed by myself or Nicolas. Bug fixes and 
code improvements are much more likely to be accepted than major new 
features. We want mod_python to be tightly focused on the core 
functionality of providing a python interface to apache. The fancy bells 
and whistles are best left to frameworks that utilize mod_python further 
up the application stack.


For your specific changes, I would strongly suggest that you make your
changes against the mod_python development branch, ie trunk. Also people
are much more likely to test your changes if you submit a patch rather
than offering a completely new file. This makes it easier to see what 
code is being changed and avoids problems with regressions.


I've put together a "Developer's Guide" draft that I'd like to 
eventually see included in either the documentation or on the website. 
This document doesn't represent any offical position of the mod_python 
developers, but rather just what I see as best practices and some 
resources for contributors. Everyone should feel free to offer 
suggestions to improve this guide. The original is in svn at:
http://svn.apache.org/repos/asf/httpd/mod_python/branches/jgallacher/docs/developers-guide.rst 



And no, I have not spell checked it yet. :)


--
Mike Looijmans
Philips Natlab / Topic Automation



Re: Developer Guidelines (was Re: More efficient FieldStorage, StringField and Field classes)

2005-11-15 Thread Mike Looijmans

David Fraser wrote:

> See my other mail - basically you do
> cd dist
> set APACHESRC=c:\path\to\apache  (where this is the Apache2 dir in a
> standard installation, now need for apache source)
> build_installer

Okay, first I downloaded the .NET SDK. (Our internet connection is great 
here)


Then I get:

error: Python was built with version 7.1 of Visual Studio, and 
extensions need to be built with the same version of the compiler, but 
it isn't installed.


I don't have a VS license - that would mean I have to build my own 
Python in order to get any further. Oh dear...


I was wondering - since i won't be hackin' the C code (yet), would it be 
sufficient to just install the binary, and use the .py files as obtained 
from the SVN repository?


The 3.2.5 beta seems to run just fine here. Cannot run the automated 
tests though, because of the issues mentioned above.


--
Mike Looijmans
Philips Natlab / Topic Automation



Re: More efficient StringField and Field classes

2005-11-24 Thread Mike Looijmans

It has been awhile, but I finally made the patch.
Advice and feedback is welcome.

What it does:
- Simplifies the creation of StringField objects. This was already
marked as a TODO in the 3.2.5b code.
- Does not create a file object (cStringIO) for each and every field.
- FieldStorage.get() will always return the same instance(s) for any
given name.
- FieldStorage.get() is very cheap now (does not create new objects,
which is expensive in Python)

It may break:
- Code that creates an instance of Field or StringField directly. This
can be fixed by extending the __init__() routine, but I don't think it's
needed.

It works:
- Perfect on all of my projects.

Attached:
- Patch created by SVN

--
Mike Looijmans
Philips Natlab / Topic Automation
Index: util.py
===
--- util.py (revision 348746)
+++ util.py (working copy)
@@ -48,19 +48,8 @@
 """
 
 class Field:
-   def __init__(self, name, file, ctype, type_options,
-disp, disp_options, headers = {}):
+   def __init__(self, name):
self.name = name
-   self.file = file
-   self.type = ctype
-   self.type_options = type_options
-   self.disposition = disp
-   self.disposition_options = disp_options
-   if disp_options.has_key("filename"):
-   self.filename = disp_options["filename"]
-   else:
-   self.filename = None
-   self.headers = headers
 
def __repr__(self):
"""Return printable representation."""
@@ -81,13 +70,30 @@
self.file.close()
 
 class StringField(str):
-   """ This class is basically a string with
-   a value attribute for compatibility with std lib cgi.py
-   """
+""" This class is basically a string with
+added attributes for compatibility with std lib cgi.py. Basically, this
+works the opposite of Field, as it stores its data in a string, but creates
+a file on demand. Field creates a value on demand and stores data in a 
file.
+"""
+filename = None
+headers = {}
+ctype = "text/plain"
+type_options = {}
+disposition = None
+disp_options = None
+
+# I wanted __init__(name, value) but that does not work (apparently, you
+# cannot subclass str with a constructor that takes >1 argument)
+def __init__(self, value):
+'''Create StringField instance. You'll have to set name yourself.'''
+str.__init__(self, value)
+self.value = value
 
-   def __init__(self, str=""):
-   str.__init__(self, str)
-   self.value = self.__str__()
+def __getattr__(self, name):
+if name != 'file':
+raise AttributeError, name
+self.file = cStringIO.StringIO(self.value)
+return self.file
 
 class FieldStorage:
 
@@ -103,8 +109,7 @@
if req.args:
pairs = parse_qsl(req.args, keep_blank_values)
for pair in pairs:
-   file = cStringIO.StringIO(pair[1])
-   self.list.append(Field(pair[0], file, "text/plain", {}, None, 
{}))
+   self.add_field(pair[0], pair[1])
 
if req.method != "POST":
return
@@ -123,9 +128,7 @@
if ctype.startswith("application/x-www-form-urlencoded"):
pairs = parse_qsl(req.read(clen), keep_blank_values)
for pair in pairs:
-   # TODO : isn't this a bit heavyweight just for form fields ?
-   file = cStringIO.StringIO(pair[1])
-   self.list.append(Field(pair[0], file, "text/plain", {}, None, 
{}))
+   self.add_field(pair[0], pair[1])
return
 
if not ctype.startswith("multipart/"):
@@ -205,33 +208,44 @@
else:
name = None
 
+   # create a file object
# is this a file?
if disp_options.has_key("filename"):
if file_callback and callable(file_callback):
file = file_callback(disp_options["filename"])
else:
-   file = self.make_file()
+   file = tempfile.TemporaryFile("w+b")
else:
if field_callback and callable(field_callback):
file = field_callback()
else:
-   file = self.make_field()
+   file = cStringIO.StringIO()
 
# read it in
-   end_of_stream = self.read_to_boundary(req, boundary, file)
+   self.read_to_boundary(req, boundary, file)
file.seek(0)
-
+ 
# make a Field
-   field = Field(name, file, ctype, type_options, disp, disp_options, 
headers)
-
+   if disp_options.has_key("filename"):
+  

Re: More efficient StringField and Field classes

2005-11-25 Thread Mike Looijmans
Is there a way that I can run the unit tests on a Windows systems? I 
have plenty of Linux boxes here, but I'd have to build and install 
apache in a "private" corner just to run the tests.


Anyway, I see where the problem is, in tests.py it does:

def util_fieldstorage(req):
from mod_python import util
req.write(`util.FieldStorage(req).list`)
return apache.OK

So I'll just add a __repr__ function to the StringField class and the 
test should pass then.


I have attached a patch (on the 3.2.5b version of util.py) to include 
this change. That should fix the unit test, and it did not break any of 
my own code.



Jim Gallacher wrote:

Hi Mike,

I don't have time to dig into this tonight, but your patch causes one of 
the unit tests to fail.


FAIL: test_util_fieldstorage (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 1117, in test_util_fieldstorage
self.fail(`rsp`)
  File "/usr/lib/python2.3/unittest.py", line 270, in fail
raise self.failureException, msg
AssertionError: "['1', '2', '3', '4']"


Jim




--
Mike Looijmans
Philips Natlab / Topic Automation
Index: util.py
===
--- util.py (revision 348746)
+++ util.py (working copy)
@@ -48,19 +48,8 @@
 """
 
 class Field:
-   def __init__(self, name, file, ctype, type_options,
-disp, disp_options, headers = {}):
+   def __init__(self, name):
self.name = name
-   self.file = file
-   self.type = ctype
-   self.type_options = type_options
-   self.disposition = disp
-   self.disposition_options = disp_options
-   if disp_options.has_key("filename"):
-   self.filename = disp_options["filename"]
-   else:
-   self.filename = None
-   self.headers = headers
 
def __repr__(self):
"""Return printable representation."""
@@ -81,13 +70,34 @@
self.file.close()
 
 class StringField(str):
-   """ This class is basically a string with
-   a value attribute for compatibility with std lib cgi.py
-   """
+""" This class is basically a string with
+added attributes for compatibility with std lib cgi.py. Basically, this
+works the opposite of Field, as it stores its data in a string, but creates
+a file on demand. Field creates a value on demand and stores data in a 
file.
+"""
+filename = None
+headers = {}
+ctype = "text/plain"
+type_options = {}
+disposition = None
+disp_options = None
+
+# I wanted __init__(name, value) but that does not work (apparently, you
+# cannot subclass str with a constructor that takes >1 argument)
+def __init__(self, value):
+'''Create StringField instance. You'll have to set name yourself.'''
+str.__init__(self, value)
+self.value = value
 
-   def __init__(self, str=""):
-   str.__init__(self, str)
-   self.value = self.__str__()
+def __getattr__(self, name):
+if name != 'file':
+raise AttributeError, name
+self.file = cStringIO.StringIO(self.value)
+return self.file
+
+def __repr__(self):
+   """Return printable representation (to pass unit tests)."""
+   return "Field(%s, %s)" % (`self.name`, `self.value`)
 
 class FieldStorage:
 
@@ -103,8 +113,7 @@
if req.args:
pairs = parse_qsl(req.args, keep_blank_values)
for pair in pairs:
-   file = cStringIO.StringIO(pair[1])
-   self.list.append(Field(pair[0], file, "text/plain", {}, None, 
{}))
+   self.add_field(pair[0], pair[1])
 
if req.method != "POST":
return
@@ -123,9 +132,7 @@
if ctype.startswith("application/x-www-form-urlencoded"):
pairs = parse_qsl(req.read(clen), keep_blank_values)
for pair in pairs:
-   # TODO : isn't this a bit heavyweight just for form fields ?
-   file = cStringIO.StringIO(pair[1])
-   self.list.append(Field(pair[0], file, "text/plain", {}, None, 
{}))
+   self.add_field(pair[0], pair[1])
return
 
if not ctype.startswith("multipart/"):
@@ -205,33 +212,44 @@
else:
name = None
 
+   # create a file object
# is this a file?
if disp_options.has_key("filename"):
if file_callback and callable(file_callback):
file = file_callback(disp_options["file

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-28 Thread Mike Looijmans

Nicolas Lehuen wrote:

Why is the ordering so important ? I do understand we need to support
multiple values per field name, but I don't see why ordering is
needed.


Because there are applications out there that will break if you change 
it. Besides that, think about a form with a few text entry boxes (all 
the same name, e.g. "spouse"). It would be very confusing for a user of 
that page to see the text re-ordered every time he clicks one of the 
buttons on the page. (I'm perfectly aware of at least 4 alternatives, 
but that is not my point here).


From the page code I've written and seen so far, the order of 
differently named fields is not important. I haven't seen a case where
a form expecting a=1&b=2 would fail if you pass it b=2&a=1. But I have 
seen cases where a=1&x=2&x=3 is not the same as a=1&x=3&x=2. The simple 
dictionary implementation as proposed would not break that code.



--
Mike Looijmans
Philips Natlab / Topic Automation



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-28 Thread Mike Looijmans

Nicolas Lehuen wrote:

One proposal was to use an OrderedHashtable. As you can see, the reply
was quite definite :

"No. There is nothing in the spec that says that these parameters should be
 in any sort of order. CGI scripts that expect them to be in order are coded
 incorrectly IMHO."


Standing on a soap box preaching like that may work in the Java world, 
but we Python programmers have a more realistic view of the world. If we 
don't want to be crushed by a stampede of angry web developers, I guess 
we should, at least, provide the means to:

- Iterate through the arguments "as listed".
- Get multi-fields into an array in the order as listed.
- Provide fast, dictionary-like access.

Hey! this is Python. We can do all that, without losing anything. Let 
them poor Java folks preach about hell and doom, while we just get on 
with our lives and build something that meets everybody's expectations.
How about we make the first call to get or __getitem__ create the 
dictionary? We could put code in __getattr__ to create it when it's 
referenced.


Patch is on its way...

--
Mike Looijmans
Philips Natlab / Topic Automation



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-29 Thread Mike Looijmans
How about we make the first call to get or __getitem__ create the 
dictionary? We could put code in __getattr__ to create it when it's 
referenced.


Here is the patch to util.py to use a dictionary-on-demand.
Lots of code removed for this one, and a few lines added.

There's also a brand new items() function, which, contrary to common 
belief, preserves argument ordering.


Note: This also includes my previous StringField patch, since that is 
crucial to make the implementation as simple as it is now.


Anxiously awaiting your comments and benchmark/profiler results!

['Lies', 'Damn lies', 'Statistics', 'Delivery dates', 'Benchmarks']
--
Mike Looijmans
Philips Natlab / Topic Automation
Index: util.py
===
--- util.py (revision 348746)
+++ util.py (working copy)
@@ -48,19 +48,8 @@
 """
 
 class Field:
-   def __init__(self, name, file, ctype, type_options,
-disp, disp_options, headers = {}):
+   def __init__(self, name):
self.name = name
-   self.file = file
-   self.type = ctype
-   self.type_options = type_options
-   self.disposition = disp
-   self.disposition_options = disp_options
-   if disp_options.has_key("filename"):
-   self.filename = disp_options["filename"]
-   else:
-   self.filename = None
-   self.headers = headers
 
def __repr__(self):
"""Return printable representation."""
@@ -81,13 +70,34 @@
self.file.close()
 
 class StringField(str):
-   """ This class is basically a string with
-   a value attribute for compatibility with std lib cgi.py
-   """
+""" This class is basically a string with
+added attributes for compatibility with std lib cgi.py. Basically, this
+works the opposite of Field, as it stores its data in a string, but creates
+a file on demand. Field creates a value on demand and stores data in a 
file.
+"""
+filename = None
+headers = {}
+ctype = "text/plain"
+type_options = {}
+disposition = None
+disp_options = None
+
+# I wanted __init__(name, value) but that does not work (apparently, you
+# cannot subclass str with a constructor that takes >1 argument)
+def __init__(self, value):
+'''Create StringField instance. You'll have to set name yourself.'''
+str.__init__(self, value)
+self.value = value
 
-   def __init__(self, str=""):
-   str.__init__(self, str)
-   self.value = self.__str__()
+def __getattr__(self, name):
+if name != 'file':
+raise AttributeError, name
+self.file = cStringIO.StringIO(self.value)
+return self.file
+
+def __repr__(self):
+   """Return printable representation (to pass unit tests)."""
+   return "Field(%s, %s)" % (`self.name`, `self.value`)
 
 class FieldStorage:
 
@@ -103,8 +113,7 @@
if req.args:
pairs = parse_qsl(req.args, keep_blank_values)
for pair in pairs:
-   file = cStringIO.StringIO(pair[1])
-   self.list.append(Field(pair[0], file, "text/plain", {}, None, 
{}))
+   self.add_field(pair[0], pair[1])
 
if req.method != "POST":
return
@@ -123,9 +132,7 @@
if ctype.startswith("application/x-www-form-urlencoded"):
pairs = parse_qsl(req.read(clen), keep_blank_values)
for pair in pairs:
-   # TODO : isn't this a bit heavyweight just for form fields ?
-   file = cStringIO.StringIO(pair[1])
-   self.list.append(Field(pair[0], file, "text/plain", {}, None, 
{}))
+   self.add_field(pair[0], pair[1])
return
 
if not ctype.startswith("multipart/"):
@@ -205,33 +212,44 @@
else:
name = None
 
+   # create a file object
# is this a file?
if disp_options.has_key("filename"):
if file_callback and callable(file_callback):
file = file_callback(disp_options["filename"])
else:
-   file = self.make_file()
+   file = tempfile.TemporaryFile("w+b")
else:
if field_callback and callable(field_callback):
file = field_callback()
else:
-   file = self.make_field()
+   file = cStringIO.StringIO()
 
# read it in
-   end_of_stream = self.read_to_boundary(req, boundary, file)
+   self.read_to_boundary(req, boundary, file)
f

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-29 Thread Mike Looijmans
Just one comment.  It seems like it would be better just to make 
add_method inline, since everything else in __init__ is, and it 
never gets called from anywhere else.


s/_method/_field/g

The thing I had in mind when I built the add_field method was that I 
could subclass FieldStorage and give the page handler more control over 
what is happening. This was convenient for the thing that got this all 
started: Posting large files. In fact, I intended add_field to replace 
the make_file function.


What I had in mind is that you override the add_field function to get 
context information (e.g. user name, upload directory and such) before 
the file upload comes in, so that you can put the upload where it belongs.


With the two callbacks, you can create the same effect, so it was no 
longer nessecary.


Also, the add_field is referred at least twice and I don't like 
duplicating code.


--
Mike Looijmans
Philips Natlab / Topic Automation



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-11-29 Thread Mike Looijmans
I am still not convinced we need an index. I'd like to see some 
concrete proof that we're not engaging in "overoptimization" here - is 
this really a bottleneck for anyone?


Well, I need to get some real work done today, but I will do some 
benchmarking at some point. It's not like there's a big rush to complete 
3.3.0 this week. I do like that Mike's code simplifies a number of the 
the FS methods. It just looks more elegant.


"More elegant" is the best reason I can think of for refactoring it. 
Heck, even if my implementation can be proved to be slower than the 
original, I still like this one better because of sheer code size and 
readability.


I don't think that you'll notice much of a performance difference, but I 
like the fact that it:

- You get the same object instance:
assert (form['key'] is form['key'])
- Does not create a file object for every field
- Getting form['key'] a second time is cheap.

--
Mike Looijmans
Philips Natlab / Topic Automation



Re: Various musings about the request URL / URI / whatever

2005-11-29 Thread Mike Looijmans

Daniel J. Popowich wrote:


By the Host header. I've been looking into this issue tonight and
think I have the answers (but it's really late, so I'll save the gory
details for tomorrow). In brief: typically, req.hostname is set from
the Host header and, in fact, when I telnet to apache and issue a GET
by hand, if I don't send the Host header, apache barfs with a 400, Bad
Request, response. (apache 2.0.54, debian testing)


It will only do that if you claim to be a HTTP/1.1 client. If you send 
GET / HTTP/1.0

it will not complain about the host header. Sending:
GET / HTTP/1.1
will get you a 400 response, because you MUST supply it (says RFC 2068, 
and whatever superseded that one). There is more you must do to be able 
to call yourself HTTP/1.1 by the way, such as keep-alive connections and 
chunked encoding.


--
Mike Looijmans
Philips Natlab / Topic Automation



mod_python 3.2.6 testing on debian/colinux

2006-01-18 Thread Mike Looijmans
Since I have a CoLinux instance on my machine here, I wanted to give it 
a go with mod_python as well (need a linux test environment for 
mod_python anyway).


It's running a Debian distro, I have gcc-3.3 on it, as well as 
apache2-dev and python-dev (version 2.3).


Can't get the tests to run though, even though Apache itself appears to 
run okay (I can get the manual pages from it).


Here's an example session (running as root, also tried normal user):



colinux:/home/mike/mod_python/test# python test.py

* Testing LoadModule
  Creating config
listen port: 32843
  Starting Apache
 /usr/sbin/apache2  -k start -f 
/home/mike/mod_python/test/conf/test.conf

F  Stopping Apache...
 /usr/sbin/apache2 -k stop -f /home/mike/mod_python/test/conf/test.conf
httpd (pid 4940?) not running
Traceback (most recent call last):
  File "test.py", line 2033, in ?
tr.run(suite())
  File "/usr/lib/python2.3/unittest.py", line 658, in run
test(result)
  File "/usr/lib/python2.3/unittest.py", line 389, in __call__
test(result)
  File "/usr/lib/python2.3/unittest.py", line 239, in __call__
self.tearDown()
  File "test.py", line 1812, in tearDown
self.stopHttpd()
  File "test.py", line 331, in stopHttpd
    time.sleep(1)
KeyboardInterrupt
colinux:/home/mike/mod_python/test#

Mike Looijmans
Philips Natlab / Topic Automation




Nicolas Lehuen wrote:

Mike, I forward your +1 to python-dev since you only sent it to me.

Regards,
Nicolas

-- Forwarded message --
From: Mike Looijmans <[EMAIL PROTECTED]>
Date: 18 janv. 2006 08:21
Subject: Re: mod_python 3.2.6 (Final!) available for testing
To: [EMAIL PROTECTED]


+1

Windows XP Pro SP2
Apache 2.0.54
Python 2.4.2

--
Mike Looijmans
Philips Natlab / Topic Automation


Nicolas Lehuen wrote:


You can fetch the Win32 version for Python 2.3 and Python 2.4 here :

http://nicolas.lehuen.com/download/mod_python/







Re: please set up a mod_python core group

2006-01-18 Thread Mike Looijmans

+1

:o)

Seriously, I think Grisha's way is right - the three musketeers should 
decide based on the feedback they get. There's no substitute for running 
on other people's machines...


--
Mike Looijmans
Philips Natlab / Topic Automation


Nicolas Lehuen wrote:

Hi,

It's OK for me !

Regards,
Nicolas

2006/1/19, Gregory (Grisha) Trubetskoy <[EMAIL PROTECTED]>:


Thanks Roy. Very timely, since 3.2.6 is (so far) going to be a
final/stable release.

I propose that for starters those people are:

me (I'm also in the Apache HTTP Server PMC)
Jim Gallacher
Nicolas Lehuen
Graham Dumpleton




Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Mike Looijmans

Two comments:
1. (bug): The acquire() call should be *outside* the try ... finally 
block. You do not want to release a lock that you did not aquire.


2. (optimization): If you are not planning to change the path, you do 
not have to aquire the lock. Aquiring a lock is quite expensive, so the 
double-check will have much less impact on performance.



if config.has_key("PythonPath"):
pathstring = config["PythonPath"]
if not _path_cache.has_key(pathstring):
# acquire must be done outside the try block
_path_cache_lock.acquire()
try:
# double-check, because two threads might come
# to the same conclusion up until here.
if not _path_cache.has_key(pathstring):
newpath = eval(pathstring)
_path_cache[pathstring] = None
sys.path[:] = newpath
finally:
_path_cache_lock.release()



Mike Looijmans
Philips Natlab / Topic Automation


Graham Dumpleton (JIRA) wrote:

Problems with PythonPath directive.
---

 Key: MODPYTHON-114
 URL: http://issues.apache.org/jira/browse/MODPYTHON-114
 Project: mod_python
Type: Bug
  Components: core  
Versions: 3.2, 3.1.4
Reporter: Graham Dumpleton



The "PythonPath" setting can be used to define the value of the Python 
"sys.path" variable. It is this variable which defines the list of directories that 
Python will search in when looking for a module to be imported.

Although the actual reassignment of "sys.path" by mod_python does not in itself 
present a problem due to assignment in Python being thread safe by definition, the 
context in which the assignment occurs is not thread safe and a race condition exists.

This exists as the top level mod_python dispatcher will consult the existing value of "sys.path" and the last 
value for the "PythonPath" setting encountered before then making a decision to modify "sys.path". 
If multiple requests are being serviced in distinct threads within the context of the same interpreter instance, and 
each at the same time decide they want to modify the value of "sys.path", only one might ultimately succeed 
in setting it to the value it wants and any modification required by the other may be lost.

In the worst case scenario, this can result in the importation of any subsequent modules within 
that request failing due to a required directory not being present in "sys.path". It is 
possible that this situation may resolve itself and go away on a subsequent request, but due to how 
mod_python caches the last value of "PythonPath" in a global variable this will be 
dependent on what other requests arrive.

At the least, for mod_python to resolve the problem itself would require a request to arrive in the interim which targeted the URL which 
was not the last to cache its raw setting for "PythonPath". This only works though due to a further issue whereby alternate 
requests against URLs with different "PythonPath" settings will cause "sys.path" to be extended everytime if the 
"PythonPath" setting references "sys.path". This results in "sys.path" continually growing over time due to 
directories being added multiple times.

The problematic code in apache.py is:

if config.has_key("PythonPath"):
# we want to do as little evaling as possible,
# so we remember the path in un-evaled form and
# compare it
global _path
pathstring = config["PythonPath"]
if pathstring != _path:
_path = pathstring
newpath = eval(pathstring)
if sys.path != newpath:
sys.path[:] = newpath

To fix the problems, the processing of PythonPath directive should be protected 
by a thread mutex lock and a dictionary should be used to store all seen values 
of PythonPath rather than a single variable. If a value for PythonPath has ever 
been seen, and not just from the last change, then no update would be necessary.

if config.has_key("PythonPath"):

try:
_path_cache_lock.acquire()

pathstring = config["PythonPath"]

if not _path_cache.has_key(pathstring):
newpath = eval(pathstring)
_path_cache[pathstring] = None
sys.path[:] = newpath

finally:
_path_cache_lock.release()

There shouldn't really be a need to check whether the new path is different to 
the current value of sys.path as that scenario shouldn't occur at that point.

The two parts of this problem were previously catalogued as ISSUE 15 and ISSUE 
16 in my list of problems with the module importing system. The list can be 
found at:

  http://www.dscpl.com.au/articles/modpython-003.html







Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Mike Looijmans

Wow - I wonder how many programs are leaking memory today because of
this double-check issue. I've been using it myself for years (yes, also 
in Java)...


Having read the article, I've come to the following conclusions:
- Python indeed won't suffer from this problem. But I agree to keep on
the safe side, the optimization is not worth the problems it might cause
later.
- I'll never use it again. (Unless I'm hacking away in assembly and have
to write my own locking and threading code anyway...)


--
Mike Looijmans
Philips Natlab / Topic Automation


Nicolas Lehuen wrote:


Your optimisation is called "double-checked locking", and it is now
known to be broken, especially on multiple-CPU systems. The problem
exists in numerous languages. Here is an explanation of the problem :

http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

Now, maybe Python is not touched by this problem due to its relatively
poor support for multithreading. The presence of the GIL guarantees
that only one thread of Python bytecode interpreter runs at a time,
but still, in a multiple CPU environment, you can get bitten by local
caching issues.

As this thing is a bit hairy, I think we should first strive for
correctness, then care about performance later. And no, I won't bother
you with one more quote about premature this and the root of that ;).

Regards,
Nicolas




--
Mike Looijmans
Philips Natlab / Topic Automation



[Fwd: Re: Version 3.3 and beyond .......]

2006-02-02 Thread Mike Looijmans



We probably want to defer until after 3.2.7 (final) is released to have
any serious discussion about what should constitute version 3.3, but
am still curious to know at this point where your interests in 3.3 lie.
Is it simply to help finish up eliminating all these known issues/bugs
or do you have other ideas in mind as to the direction of mod_python?


Well, since you asked...

I'd like to see more HTTP level helper functions. For one thing, I keep
writing the same stuff for handling if-modified-since headers, accept,
and similar construct. Having one library to handle these with would be
very helpful.

For example, a function to format a datetime tuple into a HTTP style
date string (this one should *definitely* go into mod_python), as well
as its counterpart. Note that this implementation is not correct as it
does not handle timezones...

##

weekdayname = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')

monthname = (None,
 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def httpdate(dt):
"""Return the current date and time formatted for a
message header."""
now = time.time()
year, month, day, hh, mm, ss, wd, y, z = dt
s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
weekdayname[wd],
day, monthname[month], year,
hh, mm, ss)
return s

##

Another thing I use very often (when writing DB apps) is the
if-modified-since mechanism. This helps reducing the load on the server
since most clients will be visiting the same page a number of times in a
session.
Calling checkLastModified in my code quickly aborts rendering the page
if it hasn't changed since the last visit.
(This one is incomplete as well - should add support for etag and length
extensions. The etag is very useful when things like username can change
between pages.)

##

def checkLastModified(req, lastmod):
"""If lastmod is non-zero (should be datetime object), outputs
a Last-Modified header. If the browser supplied an identical
If-Modified-Since header, this method raises
apache.SERVER_RETURN with apache.HTTP_NOT_MODIFIED
which informs the browser to use its cached version.
"""
if lastmod:
ifmodsince = req.headers_in.get('if-modified-since', None)
strlastmod = httpdate(lastmod.timetuple())
if ifmodsince:
# TODO: We might want to do <= i.s.o. ==
if strlastmod == ifmodsince:
raise apache.SERVER_RETURN, apache.HTTP_NOT_MODIFIED
req.headers_out.add('Last-Modified', strlastmod)

(note to self: Learn to hit "Reply All" instead of just "Reply"
--
Mike Looijmans
Philips Natlab / Topic Automation



Re: Last modified times

2006-02-09 Thread Mike Looijmans
I've read through the documentation. I like the interface - just call 
ap_update_mtime for every object your response depends on (and another 
great feature: It starts with the mtime of the file, so that updates to 
the script also invalidate the cached response).


Exposing those three functions would make a very good addition. A little 
Pythonic wrapping around ap_update_mtime to accept reasonable things 
like datetime objects would be helpful.


ap_discard_request_body() might help resolving the issue with HEAD 
request writing out a body, maybe that change was related to 
http://issues.apache.org/jira/browse/MODPYTHON-71 but I'm not sure.


Mike Looijmans
Philips Natlab / Topic Automation


Graham Dumpleton wrote:

Mike, I have a feeling that Apache has ways of generating those date/time
strings for last modified for you. Thus, the routine shouldn't be required.
The real problem is that mod_python doesn't expose the methods which
may actually be useful to you.

Can you look through the Apache functions described in:

  http://162.105.203.19/apache-doc/138.htm

and indicate whether if functions like:

  ap_set_etag()
  ap_update_mtime()
  ap_set_last_modified()

you could more easily do the same thing.

The request object already provides equivalent to ap_meets_conditions(),
but not these others.

I also find the function:

  ap_discard_request_body()

to be interesting. That might be interesting to have access to as well.

Graham

Mike Looijmans wrote ..


We probably want to defer until after 3.2.7 (final) is released to have
any serious discussion about what should constitute version 3.3, but
am still curious to know at this point where your interests in 3.3 lie.
Is it simply to help finish up eliminating all these known issues/bugs
or do you have other ideas in mind as to the direction of mod_python?


Well, since you asked...

I'd like to see more HTTP level helper functions. For one thing, I keep
writing the same stuff for handling if-modified-since headers, accept,
and similar construct. Having one library to handle these with would be
very helpful.

For example, a function to format a datetime tuple into a HTTP style
date string (this one should *definitely* go into mod_python), as well
as its counterpart. Note that this implementation is not correct as it
does not handle timezones...

##

weekdayname = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')

monthname = (None,
 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def httpdate(dt):
"""Return the current date and time formatted for a
message header."""
now = time.time()
year, month, day, hh, mm, ss, wd, y, z = dt
s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
weekdayname[wd],
day, monthname[month], year,
hh, mm, ss)
return s

##

Another thing I use very often (when writing DB apps) is the
if-modified-since mechanism. This helps reducing the load on the server
since most clients will be visiting the same page a number of times in
a
session.
Calling checkLastModified in my code quickly aborts rendering the page
if it hasn't changed since the last visit.
(This one is incomplete as well - should add support for etag and length
extensions. The etag is very useful when things like username can change
between pages.)

##

def checkLastModified(req, lastmod):
"""If lastmod is non-zero (should be datetime object), outputs
a Last-Modified header. If the browser supplied an identical
If-Modified-Since header, this method raises
apache.SERVER_RETURN with apache.HTTP_NOT_MODIFIED
which informs the browser to use its cached version.
"""
if lastmod:
ifmodsince = req.headers_in.get('if-modified-since', None)
strlastmod = httpdate(lastmod.timetuple())
if ifmodsince:
    # TODO: We might want to do <= i.s.o. ==
if strlastmod == ifmodsince:
raise apache.SERVER_RETURN, apache.HTTP_NOT_MODIFIED
req.headers_out.add('Last-Modified', strlastmod)

(note to self: Learn to hit "Reply All" instead of just "Reply"
--
Mike Looijmans
Philips Natlab / Topic Automation




Re: Refactoring of the test suite

2006-02-12 Thread Mike Looijmans

Oh and if we are refactoring the tests, I want a "make tests" rule. I'm
 tired of doing: ./configure; make; sudo make install; make tests; DOH!
cd test; python test.py. :)


Make that "make check" (like autotools), to not confuse old-skool 
autoconfers like myself.




FieldStorage callback only works with FileType objects in 3.2.7

2006-02-19 Thread Mike Looijmans

There is still trouble with the FieldStorage class. Looks like this one is not 
really fixed:
http://issues.apache.org/jira/browse/MODPYTHON-40

See for a counter example:
http://modpython.org/pipermail/mod_python/2006-February/020248.html
(remove the "import fmt")
It fails with the following traceback:

Mod_python error: "PythonHandler mod_python.psp"

Traceback (most recent call last):

  File "C:\Python24\Lib\site-packages\mod_python\apache.py", line 299, in 
HandlerDispatch
result = object(req)

  File "C:\Python24\Lib\site-packages\mod_python\psp.py", line 302, in handler
p.run()

  File "C:\Python24\Lib\site-packages\mod_python\psp.py", line 213, in run
exec code in global_scope

  File "C:/source/archaic_web/upload.psp", line 47, in ?
for afile in frm.getlist('archivefile'):

  File "C:\Python24\Lib\site-packages\mod_python\util.py", line 354, in getlist
found.append(StringField(item.value))

  File "C:\Python24\Lib\site-packages\mod_python\util.py", line 74, in 
__getattr__
value = self.file.read()

AttributeError: FileCounter instance has no attribute 'read'


The file is posted as file, and correctly writting into the FileCounter. However, the FieldStorage 
class forgets about this, and in util.py it tries to determine whether this is a file or not by 
looking whether it derives from FileType. Since any "home made" file object is not likely to derive 
from that class, util.py concludes that this was a simple field, and does a file.read() to fetch it 
into memory (which is what we were preventing in issue 40)


A quick way to solve it is to use this code:
http://issues.apache.org/jira/browse/MODPYTHON-93
this will remove the bad checks for field types alltogether.

--
Mike Looijmans
Philips Natlab / Topic Automation



Re: mod_python 3.2.8 available for testing

2006-02-21 Thread Mike Looijmans

I get failing tests on WinXP-sp2, apache 2.0.54, python 2.4.2.

This may be because the test suite is looking for things not there yet.

Put the 3.2.8 binary distribution on my system and did a svn update to 
get the latest test suite. When running the tests, I got the following 
failures (can we do something about the readability of the output):


==
FAIL: test_req_auth_type (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 661, in test_req_auth_type
self.fail(`rsp`)
AssertionError: '\n\nMod_python error: "PythonAuthenHandler 
tests::req_auth_type"\n\nTraceback (most recent call last):\n\n  File 
"C:\\Python24\\Lib\\site-
packages\\mod_python\\apache.py", line 299, in HandlerDispatch\n 
result = object(req)\n\n  File 
"C:/source/mod_python/test/htdocs/tests.py", line 609, in req
_auth_type\nif req.auth_type() != "dummy":\n\nAttributeError: 
\'mp_request\' object has no attribute \'auth_type\'\n\n\n'


==
FAIL: test_req_construct_url (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 753, in test_req_construct_url
self.fail("construct_url")
AssertionError: construct_url

==
FAIL: test_req_handler (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 1009, in test_req_handler
self.fail(`rsp`)
AssertionError: '\n\nMod_python error: "PythonFixupHandler 
tests::req_handler"\n\nTraceback (most recent call last):\n\n  File 
"C:\\Python24\\Lib\\site-pac
kages\\mod_python\\apache.py", line 299, in HandlerDispatch\nresult 
= object(req)\n\n  File "C:/source/mod_python/test/htdocs/tests.py", 
line 770, in req_ha
ndler\nreq.handler = "mod_python"\n\nTypeError: attribute 
\'handler\' of \'mp_request\' objects is not writable\n\n\n'


==
FAIL: test_trans (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 1402, in test_trans
self.fail(`rsp`)
AssertionError: ' #\r\n # \r\n # Licensed under the Apache License, 
Version 2.0 (the "License"); you\r\n # may not use this file except in 
compliance with the L
icense.  You\r\n # may obtain a copy of the License at\r\n #\r\n # 
http://www.apache.org/licenses/LICENSE-2.0\r\n #\r\n # Unless required 
by applicable law
 or agreed to in writing, software\r\n # distributed under the License 
is distributed on an "AS IS" BASIS,\r\n # WITHOUT WARRANTIES OR 
CONDITIONS OF ANY KIND, e
ither express or\r\n # implied.  See the License for the specific 
language governing\r\n # permissions and limitations under the 
License.\r\n #\r\n #\r\n # $Id:
 tests.py 379638 2006-02-22 00:41:51Z jgallacher $\r\n #\r\n\r\n# 
mod_python tests\r\n\r\nfrom __future__ import generators\r\nfrom 
mod_python.python22 import *
\r\n\r\nfrom mod_python import apache\r\nimport unittest\r\nimport 
re\r\nimport time\r\nimport os\r\nimport cStringIO\r\n\r\n# This is used 
for mod_python.publi
sher security tests\r\n_SECRET_PASSWORD = \'root\'\r\n__ANSWER = 
42\r\n\r\nclass SimpleTestCase(unittest.TestCase):\r\n\r\ndef 
__init__(self, methodName, re
q):\r\nunittest.TestCase.__init__(self, methodName)\r\n 
self.req = req\r\n\r\ndef test_apache_log_error(self):\r\n\r\n 
  s = self.req.se
rver\r\napache.log_error("Testing apache.log_error():", 
apache.APLOG_INFO, s)\r\napache.log_error("xEMERGx", 
apache.APLOG_EMERG, s)\r\na
pache.log_error("xALERTx", apache.APLOG_ALERT, s)\r\n 
apache.log_error("xCRITx", apache.APLOG_CRIT, s)\r\n 
apache.log_error("xERRx", apache.APLOG_
ERR, s)\r\napache.log_error("xWARNINGx", apache.APLOG_WARNING, 
s)\r\napache.log_error("xNOTICEx", apache.APLOG_NOTICE, s)\r\n 
  apache.log_
error("xINFOx", apache.APLOG_INFO, s)\r\n 
apache.log_error("xDEBUGx", apache.APLOG_DEBUG, s)\r\n\r\n# see 
what\'s in the log now\r\nf = o
pen("%s/logs/error_log" % apache.server_root())\r\n# for some 
reason re doesn\'t like \\n, why?\r\nimport string\r\n 
log = "".join(map(st
ring.strip, f.readlines()))\r\nf.close()\r\n\r\nif not 
re.search("xEMERGx.*xALERTx.*xCRITx.*xERRx.*xWARNINGx.*xNOTICEx.*xINFOx.*xDEBUGx", 
log):\
r\nself.fail("Could not find test messages in 
error_log")\r\n\r\n\r\ndef 
test_apache_table(self):\r\n\r\nlog = self.req.log_
error\r\n\r\nlog("Testing table object.")\r\n\r\n# tests 
borrowed from Python test suite for d

Re: mod_python 3.2.8 available for testing

2006-02-22 Thread Mike Looijmans

Yup. Much better with that one:

##
Ran 6 tests in 91.047s

OK
##

You have my +1 (even though you weren't waiting for it)

And my merged util.py also passes the test suite, which is what I really 
wanted to know.


--
Mike Looijmans
Philips Natlab / Topic Automation


Nicolas Lehuen wrote:

Yup, the SVN trunk currently contains tests that fail on 3.2.8 (Graham
added some features and their unit test in the trunk), but you can
test the 3.2.8 release against the 3.2.8 test suite (available from
https://svn.apache.org/repos/asf/httpd/mod_python/tags/release-3-2-8 )
and you won't see any problems.

Regards,
Nicolas

2006/2/22, Mike Looijmans <[EMAIL PROTECTED]>:


I get failing tests on WinXP-sp2, apache 2.0.54, python 2.4.2.

This may be because the test suite is looking for things not there yet.

Put the 3.2.8 binary distribution on my system and did a svn update to
get the latest test suite. When running the tests, I got the following
failures (can we do something about the readability of the output):

==
FAIL: test_req_auth_type (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 661, in test_req_auth_type
self.fail(`rsp`)
AssertionError: '\n\nMod_python error: "PythonAuthenHandler
tests::req_auth_type"\n\nTraceback (most recent call last):\n\n  File
"C:\\Python24\\Lib\\site-
packages\\mod_python\\apache.py", line 299, in HandlerDispatch\n
result = object(req)\n\n  File
"C:/source/mod_python/test/htdocs/tests.py", line 609, in req
_auth_type\nif req.auth_type() != "dummy":\n\nAttributeError:
\'mp_request\' object has no attribute \'auth_type\'\n\n\n'

==
FAIL: test_req_construct_url (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 753, in test_req_construct_url
self.fail("construct_url")
AssertionError: construct_url

==
FAIL: test_req_handler (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 1009, in test_req_handler
self.fail(`rsp`)
AssertionError: '\n\nMod_python error: "PythonFixupHandler
tests::req_handler"\n\nTraceback (most recent call last):\n\n  File
"C:\\Python24\\Lib\\site-pac
kages\\mod_python\\apache.py", line 299, in HandlerDispatch\nresult
= object(req)\n\n  File "C:/source/mod_python/test/htdocs/tests.py",
line 770, in req_ha
ndler\nreq.handler = "mod_python"\n\nTypeError: attribute
\'handler\' of \'mp_request\' objects is not writable\n\n\n'

==
FAIL: test_trans (__main__.PerRequestTestCase)
--
Traceback (most recent call last):
  File "test.py", line 1402, in test_trans
self.fail(`rsp`)
AssertionError: ' #\r\n # \r\n # Licensed under the Apache License,
Version 2.0 (the "License"); you\r\n # may not use this file except in
compliance with the L
icense.  You\r\n # may obtain a copy of the License at\r\n #\r\n #
http://www.apache.org/licenses/LICENSE-2.0\r\n #\r\n # Unless required
by applicable law
 or agreed to in writing, software\r\n # distributed under the License
is distributed on an "AS IS" BASIS,\r\n # WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, e
ither express or\r\n # implied.  See the License for the specific
language governing\r\n # permissions and limitations under the
License.\r\n #\r\n #\r\n # $Id:
 tests.py 379638 2006-02-22 00:41:51Z jgallacher $\r\n #\r\n\r\n#
mod_python tests\r\n\r\nfrom __future__ import generators\r\nfrom
mod_python.python22 import *
\r\n\r\nfrom mod_python import apache\r\nimport unittest\r\nimport
re\r\nimport time\r\nimport os\r\nimport cStringIO\r\n\r\n# This is used
for mod_python.publi
sher security tests\r\n_SECRET_PASSWORD = \'root\'\r\n__ANSWER =
42\r\n\r\nclass SimpleTestCase(unittest.TestCase):\r\n\r\ndef
__init__(self, methodName, re
q):\r\nunittest.TestCase.__init__(self, methodName)\r\n
self.req = req\r\n\r\ndef test_apache_log_error(self):\r\n\r\n
  s = self.req.se
rver\r\napache.log_error("Testing apache.log_error():",
apache.APLOG_INFO, s)\r\napache.log_error("xEMERGx",
apache.APLOG_EMERG, s)\r\na
pache.log_error("xALERTx", apache.APLOG_ALERT, s)\r\n
apache.log_error("xCRITx", apache.APLOG_CRIT, s)\r\n
apache.log_error("xERRx&quo

FieldStorage improvements as in MODPYTHON-93

2006-03-13 Thread Mike Looijmans
I'm currently using a 'private' patch, but I would really like to see 
this one incorporated in some release:


http://issues.apache.org/jira/browse/MODPYTHON-93

The patch is there, all unit tests pass, and I've been torturing it 
myself for a while now.


What else can (must?) I do to get it integrated?

(I really need the file upload to work decently, and others may like the 
improved performance)


--
Mike Looijmans
Philips Natlab / Topic Automation



Re: FieldStorage improvements as in MODPYTHON-93

2006-03-13 Thread Mike Looijmans


Mike Looijmans
Philips Natlab / Topic Automation


Jim Gallacher wrote:

Mike Looijmans wrote:
I'm currently using a 'private' patch, but I would really like to see 
this one incorporated in some release:

http://issues.apache.org/jira/browse/MODPYTHON-93
The patch is there, all unit tests pass, and I've been torturing it 
myself for a while now.

What else can (must?) I do to get it integrated?


Nothing. It's on my todo list, and if there are no objections I'll 
commit it to trunk.


I'm now very busy doing 'nothing' to get that to happen :-)

On the other hand, if you want it backported to 3.2.x for a bugfix 
release, you should do a little agitation on this list. ;)


The reason I want it committed is so that I can use file-like objects 
(e.g. TarFile) as targets for a file POST. With the current 
implementation, anything you return from the make_file callback must be 
a real file of type(FileType) otherwise FieldStorage will call 'read' on 
it and attempt to fetch the whole file into a string...


(I really need the file upload to work decently, and others may like 
the improved performance)

Just curious - how much of a performance boost are you seeing?


I never did any measurements - but if I'm bored I'll do some 
benchmarks... (as in: lies, damn lies, statistics, delivery dates, and 
benchmarks). I don't know what the impact of stuffing things into a 
cStringIO object and reading them back as string is...


Mike



Re: mod_python roadmap

2006-03-14 Thread Mike Looijmans

...

Any other issues for a 3.2.9 release?
(This is where Mike L. should advocate for MODPYTHON-93 - Improved 
FieldStorage. ;) )


Eh, yeah, I have a suggestion:

MODPYTHON-93 - Improved FieldStorage
- Stop putting simple key/value pairs into StringIO objects, put them simple 
str objects
- Use a dictionary for fields
- Allow any file-like object ('write' method) to be used in a make_file callback
(all the coding is already done)

Mike Looijmans
Philips Natlab / Topic Automation



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2006-03-20 Thread Mike Looijmans

I was checking the documentation with respect to this patch and I noticed a 
couple of things.

The FieldStorage constructor has 2 undocumented parameters, file_callback=None, 
field_callback=None. These were added in r160259 as part of  Barry Pearce's fix 
for MODPYTHON-40. In the same patch, the make_file and make_field methods were 
added. These allow FieldStorage to be subclassed and provide an alternative to 
using the callbacks.

In Mike's modpython325_util_py_dict.patch, the make_field and make_file methods 
have be removed, with the reasoning that the callbacks should be sufficient for 
controlling the upload.

We never had a discussion of the merits of the 2 approaches - subclassing vs 
passing the callbacks in the FieldStorage constructor. Should we perhaps allow 
both, and reinstate make_field and make_file, or are the callbacks sufficient?


The funny thing is that subclassing was my first way of implementing it. 
 Someone beat me to it, so I adopted the callbacks. The structure of 
FieldStorage requires either or both methods. The __init__ constructor 
is doing all the parsing already, so something Pythonic like injecting 
the methods later will not work:

form = util.FieldStorage()
form.make_file = myMakeFileFunction
form.parse(req)

Because of backward compatibility, my vote would be on the callbacks. 
Overriding looks more organized, more structured programming, but when 
you start using the FieldStorage in multiple handlers, providing 
callback functions is much handier for everyday use (otherwise, you'll 
end up with a dozen different FieldStorage classes or mixins for which 
you have to invent names).



Further infomation on the callbacks, including some nice examples, can be found 
 in the python-dev archives. See 
http://article.gmane.org/gmane.comp.apache.mod-python.devel/760

I'll update the docs as part of the fix for this issue.


Good idea - and a good page too, thanks Barry.



Re: mod_python 3.3.0-dev-20060321 available for testing

2006-03-22 Thread Mike Looijmans

Nicolas, could you make a Win32 binary for us poor people?

--
Mike Looijmans
Philips Natlab / Topic Automation



Re: mod_python 3.3.0-dev-20060321 available for testing

2006-03-22 Thread Mike Looijmans
And in addition to running the unit tests, try running your actual web 
site with this release. That usually manages to bring up things the 
developers never thought of - so that we can extend the unit tests with 
those cases as well. I can imagine that some code may depend on some 
specific behaviour of FieldStorage for example.


--
Mike Looijmans
Philips Natlab / Topic Automation


Jim Gallacher wrote:

mod_python-3.3.0-dev-20060321 is available for testing.

We are asking the mod_python development community for assistance in 
testing the current development branch. Hopefully this will allow us to 
catch new bugs or regressions early, and when we are ready for the next 
release the beta cycle will be much shorter.


This snapshot addresses 33 issues since 3.2.7 was released, including 
apache 2.2 support and the introduction of a new module importer.


The files are (temporarily) available here:

http://people.apache.org/~jgallacher/mod_python/dist/

Please download it, then do the usual

$ ./configure --with-apxs=/wherever/it/is
$ make
$ (su)
# make install

Then (as non-root user!)

$ make check

or if you prefer to run the tests the old way:

$ cd test
$ python test.py

Make a note of any failing tests.

If all the tests pass, give the new module importer a workout by 
uncommenting line 328 in test/test.py:


 #PythonOption('mod_python.future.importer *'),

and then re-run the tests.

$ make check

And see if any tests fail. If they pass, send a +1 to the list, if they
fail, send the details (the versions of OS, Python and Apache, the test
output, and suggestions, if any).

Thank you for your assistance,
Jim Gallacher





Re: mod_python 3.3.0-dev-20060321 available for testing

2006-03-22 Thread Mike Looijmans

Jim Gallacher wrote:

Mike Looijmans wrote:
And in addition to running the unit tests, try running your actual web 
site with this release. That usually manages to bring up things the 
developers never thought of - so that we can extend the unit tests 
with those cases as well. I can imagine that some code may depend on 
some specific behaviour of FieldStorage for example.


As nice as it would be to get some real-world feedback, let's not forget 
this is a *development* snapshot, and should not be considered stable 
for production use. Test your application on your website development 
machine if you can - the more testing the better. I'm sure that is what 
Mike meant, but I just want to make it clear that we are not suggesting 
this is a release candidate for production use.


From my post that intention wasn't clear, so thanks Jim for pointing 
that out. Want I wanted to write is to run your site's code on it, but 
not (necessarily) your production site. Most of us will have some 
development system where they check things out and/or do development and 
testing before putting them into the field.




Grouping tests (was: Latest tests)

2006-03-29 Thread Mike Looijmans
What I have been doing in a totally unrelated Python project is to create test groups simply by 
putting them into separate modules. The main test module test.py looks like this:


## (test.py)
import unittest
from test_something import *
from test_someother import *
from test_yetmore import *
if __name__ == '__main__':
unittest.main(module=__import__(__name__))
##

This works because unittest takes all 'things' that start with 'test' in the provided module and 
runs them. So anything we bring into our namespace gets run. This also makes it possible to import 
tests from other projects, and share these tests between projects.

The other test_ modules look much alike:

## (test_something.py)
import unittest
import test_peer
class test04MultiPeerSystem(test_peer.BaseTestRealThing):
def test03DiscoveryC08B20(self):
"Multiple clients w/ discovery:  8 peers, 20 blocks each"
self.runDiscoveryTest(nclients = 8, nblocks=20)
if __name__ == '__main__':
unittest.main(module=__import__(__name__))
##

This makes it very easy to handle test subsets, and run single test suites. 
Just run

$python test.py

to run ALL the tests. To run just a single set, run

$python test_something.py

And to run a single test, either of these will do:

$python test_something.py test04MultiPeerSystem
$python test.py test04MultiPeerSystem

The real power shows when you want to run 4 or 5 test sets, and/or only parts of some test sets. 
Just create a new "main" test unit that imports the desired ones, and you're set:


## (test_few.py)
import unittest
from test_something import *
from test_someother import TestOnlyThis
if __name__ == '__main__':
unittest.main(module=__import__(__name__))
##

Because some tests take very long to run (in my vocabulary, "long" means more than a second), this 
saves me a lot of time when working on a part of a big project, where I don't need to run all tests 
all the time.


--
Mike Looijmans
Philips Natlab / Topic Automation


Jim Gallacher wrote:
...
I've been playing with some ideas for a new test framework, using a 
subclass of unittest.TestLoader to find and configure tests. I want to 
play around with it for another day or so before sharing but at this 
point I'm pretty confident it'll work.


Creating a new set of tests could be as simple as:

testsuites/core/simpletest.py


from testlib import VirtualHostTest

class MyTests(VirtualHostTest):
def test_hello_world(self):
rsp = self.send_request()
self.failUnless(rsp == 'test ok')

def test_goodbye_world(self):
rsp = self.send_request()
self.failUnless(rsp == 'test ok')


htdocs/testhandlers/core/simpletest.py
--

from mod_python import

def test_hello_world(req):
req.write('test ok')
return apache.OK

def test_goodbye_world(req):
req.write('test ok')
return apache.OK

$ python testrunner.py

Things like virtual host names and handler directives required for 
configuration or send_request() are automatically derived from the test 
class and test method names. It will still be possible to provide custom 
apache configuration directives in a manner similar to that which we 
currently use, but for most tests this will not be required.


Usage would look something like this:

Run all the tests
$ python testrunner.py

Run one test
$ python testrunner.py -t core.simpletest.MyTests.test_hello_world

Run a group of tests (this would load the TestCase subclasses in 
testsuites/sessions/filesession.py):


$ python testrunner.py -t sessions.filesession


Jim




Re: GET request content and mod_python.publisher/psp.

2006-04-02 Thread Mike Looijmans

My question is, should mod_python.publisher and mod_python.psp be
enhanced and call req.discard_request_body() for a GET request to avoid
the posibilities of any problems arising due to a client sending content
for a GET request?


-1 on that particular way of implementing it. If the GET request has a 
body, that body probably serves some purpose.


The right thing to do for any handler that does not know how to handle 
the request is to return a 'bad request' error to the client. Just 
throwing away what is not understood is not very nice to developers and 
users - you'll get unexpected behaviour because the server is only 
handling a part of the request.


The trouble here is of course that publisher or PSP cannot tell forehand 
that the handler will read the body data. So the only way to determine 
this is to have the handler handle the request, and after that, check if 
it did read all of the request. If not, you're too late to report this 
to the client, because the headers have already been sent out. Putting 
some message in an error log that no-one will ever read (in particular 
not the one who caused that problem) does not make sense either. To fix 
this, the handler should somehow advertise its capability to read the body.


I guess you can't really solve the problem. Which is the lesser evil?

Mike.



Re: FieldStorage and multiline headers in multipart/form.

2006-04-11 Thread Mike Looijmans
Short answer: Yes, multiline headers are allowed there, and yes, 
mod_python will fail to read them properly.


Longer answer:
I checked throught the util.py code, and nope, the fix is not in there.

I browsed through the appropriate RFC documents, in particular RFC1521
http://www.faqs.org/rfcs/rfc1521.html
This clearly states that RFC822 headers are to be used, which means that 
 those multiline headers are indeed allowed. The 1521 document even 
contains samples with multiline headers.


Just a thought: Can we (re)use the rfc822 mime parser that's already 
built-in in Python to do the work for us?


--
Mike Looijmans
Philips Natlab / Topic Automation


Graham Dumpleton wrote:

With FieldStorage being discussed on main user mailing list, came across
this old post of the mailing list:

  http://www.modpython.org/pipermail/mod_python/2001-November/012256.html

What it is saying is that some HTTP clients use multi line headers in sections
of multipart/form postings and that mod_python doesn't handle this.

Looking at FieldStorage code, which I don't grok right at this minute
because of an intensive coding frenzy today on the back of not enough
sleep last night, I can't see that it has ever been modified to accomodate
such multiline headers if indeed it needs to.

Anyone who is more intimate with FieldStorage code want to have a
better look at validity of original mailing list post and whether multiline
headers are legitimate and whether there indeed could be a problem in
mod_python.

Graham




Re: wepy

2006-04-20 Thread Mike Looijmans

import sys
sys.stdout.write(open('penguin.png', 'rb').read())



NO. NO. NO.

sys.stdout is a global variable. If you want sys.stdout.write() to end 
up in the user's terminal, your web server will be able to either serve 
only one request at a time, or must be forced into running separate 
processes for each concurrent request (which will not happen on windows 
and many other supported platforms).


We made that mistake with cgipython, and we won't make it again. Learn 
from the mistakes of others, as you do not have the time to make them 
all yourself.


Mike



Re: wepy

2006-04-20 Thread Mike Looijmans
Your code is perfectly thread safe, yes. But the assignment to 
sys.stdout IS NOT. There is only one sys.stdout, and all threads use the 
same one.


Example (two threads):

##Thread 1:
- initializes request from client 1
- sets sys.stdout to StdOut(request ...) object
- writes "hello1 " to sys.stdout
##Thread 2
- initializes request from client 2
- sets sys.stdout to StdOut(request ...) object
- writes "hello2 " to sys.stdout
##Thread 1
- writes "world1" to sys.stdout
- terminates, and sends sys.stdout.buffer to client
##Thread 2
- writes "world2" to sys.stdout
- terminates, and sends sys.stdout.buffer to client

Now look at what will happen with your code and a threading apache:
Client 1 will receive "hello2 world1"
Client 2 will receive "hello2 world1world2"

Explanation:

##Thread 1:
- initializes request from client 1
- sets sys.stdout to StdOut(request ...) object
- writes "hello1 " to sys.stdout
(sys.stdout.buffer == "hello1 ")
##Thread 2
- initializes request from client 2
- sets sys.stdout to StdOut(request ...) object
(and thus replaces the buffer that thread 1 put there, its contents are 
lost)

- writes "hello2 " to sys.stdout
(sys.stdout.buffer == "hello2 ")
##Thread 1
- writes "world1" to sys.stdout
(sys.stdout.buffer == "hello2 world1")
- terminates, and sends sys.stdout.buffer to client
##Thread 2
- writes "world2" to sys.stdout
(sys.stdout.buffer == "hello2 world1world2")
- terminates, and sends sys.stdout.buffer to client


Mike Looijmans
Philips Natlab / Topic Automation

Fırat KÜÇÜK wrote:

Mike Looijmans yazmış:


Yes,

the problem is not whatever class or method you use to send the output 
to the client, but the problem is the use of the global sys.stdout 
variable. The is only one such object in your system. If there are two 
threads each handling a request, there is no telling to which one the 
sys.stdout.write or print statement will send its output. To prevent 
this, you'd have to mutex access to it, and leads to being able to 
handle only one request at a time.


On many unix flavors, apache defaults to 'forking', which runs each 
request in a separate process. In these circumstances, you will not 
detect this problem. On other platforms, or if you modify the 
httpd.conf file, apache uses threading or even a mix of threading and 
forking.


Important note: Since print sends to sys.stdout, using print is just 
as bad a method to send output to the client as sys.stdout.write.


The only solution is to use the context in your handler, i.e.
WRITE("hello world\n")

or the more obvious (like in publisher and PSP):
req.write("Hello world\n")

Anything that makes use of globals to send back data to the client 
will not work, so "wepi.write" is also out of the question (if wepi is 
a module).


--
Mike Looijmans
Philips Natlab / Topic Automation


Fırat KÜÇÜK wrote:





import sys
sys.stdout.write(open('penguin.png', 'rb').read())







NO. NO. NO.

sys.stdout is a global variable. If you want sys.stdout.write() to 
end up in the user's terminal, your web server will be able to 
either serve only one request at a time, or must be forced into 
running separate processes for each concurrent request (which will 
not happen on windows and many other supported platforms).


We made that mistake with cgipython, and we won't make it again. 
Learn from the mistakes of others, as you do not have the time to 
make them all yourself.


Mike



Hi,
but i replaced sys.stdout with a StdOut class. is it problem too?

Fırat KÜÇÜK






Hi,


###[StdOut 
class]###


class StdOut:
   "replaces sys.stdout"

   buffer = ""

   # --[overidden 
methods]-


   ### [__init__ method]---
   def __init__(self, req, headers_object):
   self.req = req
   self.headers = headers_object

   # -[public 
methods]-


   ### [write method]---
   def write(self, text): self.buffer += str(text)



this is my sys.stdout replacement class and write method use buffer.

and finally after execution of .py file

# send buffer
req.write(sys.stdout.buffer)

I think this is thread safe method for sending data.
print or sys.stdout.write indirectly use req.write method.


# send buffer
req.write(sys.stdout.buffer)






Re: mod_python 3.2.9-rc2 available for testing

2006-06-27 Thread Mike Looijmans

Having written most of the issue "93" code, here's my opinion:


* How much non-compatibility is acceptable in a patch release?


None.
Though it hurts my personal feelings that my patch did manage to break 
something (who imagined anyone trying to hack data into the FS object?), 
we cannot break other people's software with a mere patch.


I don't have a clue what the "trac" software is, but I imagine that even 
for this particular piece of software, I may be able to come up with 
some form of emulation (e.g. using __setattr__ hooks) that still keeps 
it working with the patch. The question is of course, where does it stop?


Note that the "old" code still contains a bug that the patch solves. The 
problem is that the old code does not work correctly if the objects 
created in a make_file callback are not real files created with file().



* How are applications supposed to perform write operations on a
FieldStorage, in 3.3 and the future?


Since we claim that FieldStorage behaves like a dictionary, the obvious 
syntax would be:


form['mykey'] = 'value'

This would require a __setitem__ method which should look something like 
this (with some extra code to handle lists):


def __setitem__(self, key, value):
if self.dictionary is None:
# Create self.dictionary as in __getitem__
self.dictionary[key] = StringField(value)




Adding to FieldStorage

2006-06-28 Thread Mike Looijmans

* How are applications supposed to perform write operations on a
FieldStorage, in 3.3 and the future?


Since we claim that FieldStorage behaves like a dictionary, the obvious
syntax would be:

form['mykey'] = 'value'

This would require a __setitem__ method which should look something like
this (with some extra code to handle lists):

   def __setitem__(self, key, value):
   if self.dictionary is None:
   # Create self.dictionary as in __getitem__
   self.dictionary[key] = StringField(value)




Trac also appends to the FieldStorage list attribute, which complicates
things a little further.

The '93' code also adds the add_field() method. Although this is not
documented, there is nothing to indicate that it is a private method
either. Calling add_field on a FieldStorage instance will not likely
give the results a user expects, so we need to give some attention to
that as well.


I initially intended add_field as a callback routine, which was later on replaced with the callback. 
The method is not really neccesary, but quite convenient when subclassing. All the processing takes 
place in __init__ which makes it impossible to just replace the method at runtime, which I had in 
mind at first.


I think we should support the dictionary syntax, e.g.

form['key']='value'
assert form['key'] == 'value'

The dict syntax will replace all 'key' fields with the new value. The add_field method can be 
reworked to act as one would expect, e.g.:


assert form['key'] == 'value'
form.add_field('key', 'value2')
assert form['key'] == ['value','value2']
form.add_field('key', 'value1')
assert form['key'] == ['value','value2','value1']

Maybe we can come up with a better name than add_field (append?) as well.



Re: mod_python 3.2.9-rc2 available for testing

2006-06-28 Thread Mike Looijmans

So do you think we can release 3.2.9 with the old 3.2.8 code, or should
this block the release until we have a correct fix? I'm hoping we can do
a 3.3 release in October or November, FYI.


I don't think it is worth trying to work on a fix that makes new
FieldStorage code work with Trac. Just go with 3.2.8 FieldStorage code
in 3.2.9. In 3.3 then use new code with understanding that it will break
Trac.

I see this as acceptable as version of Trac in subversion trunk does
away with all the FieldStorage fiddles and instead they base it on a
WSGI wrapper which I presume therefore avoids use of FieldStorage.

We just need to ensure that this newer version of Trac is released
before we release mod_python 3.3 and that we notify them that older
versions of Trac will not work with mod_python 3.3 and people will need
to use their most latest version of Trac instead.


You have my +1 on that. Let's keep the 3.2.8 implementation for now, and do the rework in 3.3 in 
cooperation with the Trac people.


Mike.



Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2006-09-11 Thread Mike Looijmans

As I recall, dropping the current Trac version for 3.3 is what we wanted to do.

In addition, there was talk about implementing a __setitem__ method as well to support manual 
insertion of fields, so that FieldStorage becomes a real dictionary-like object. The add_field 
method was to be reworked so that it could be publicly available.


(side thought: Allowing population from code rather than only from a Request object might allow 
FieldStorage to be used in test code)


--
Mike Looijmans
Philips Natlab / Topic Automation


Graham Dumpleton (JIRA) wrote:
[ http://issues.apache.org/jira/browse/MODPYTHON-93?page=comments#action_12433818 ] 

Graham Dumpleton commented on MODPYTHON-93:

---

Have we decided that we will not try and be compatible with Trac and expect 
people using mod_python 3.3 to use the next unreleased version of Trac? This 
next version of Trac works via a WSGI gateway and thus doesn't use FieldStorage 
and thus doesn't have the problems seen here. Can we mark the original issue as 
resolved for 3.3?



Improve util.FieldStorage efficiency


   Key: MODPYTHON-93
   URL: http://issues.apache.org/jira/browse/MODPYTHON-93
   Project: mod_python
Issue Type: Improvement
Components: core
  Affects Versions: 3.2.7
  Reporter: Jim Gallacher
   Assigned To: Jim Gallacher
  Priority: Minor
   Fix For: 3.3

   Attachments: modpython325_util_py_dict.patch


Form fields are saved as a list in a FieldStorage class instance. The class 
implements a __getitem__ method to provide dict-like behaviour.  This method 
iterates over the complete list for every call to __getitem__. Applications 
that need to access all the fields when processing the form will show O(n^2) 
behaviour where n == the number of form fields. This overhead could be avoided 
by creating a dict (to use as an index) when the FieldStorage instance is 
created.
Mike Looijmans has been investigating StringField and Field as well. It is 
probably reasonable to include information on his work in this issue as well, 
so that we can consider all of these efficiency issues in toto.







Re: Status of mod_python 3.3.

2006-10-19 Thread Mike Looijmans

Graham Dumpleton wrote:
...

MODPYTHON-93 Improve util.FieldStorage efficiency.

This was actually marked as resolved but reopened because it was discovered
that changes meant that Trac <=0.9.6 would no longer work. The changes were
also backed out of mod_python 3.2.X branch and not released in 3.2.10.

At this point I believe we have agreed that code in 3.3 would be left as is and
people would need to use Trac >=0.10, which has now been release, with
mod_python 3.3 or later.

There was comments as to whether util.FieldStorage needs to have more
dictionary like access, but at this point I believe we should mark this issue
as resolved and if people want dictionary like access, they can open a
separate JIRA issue for that and we deal with it in a future release.


I give it a +1 to mark it resolved.

There's more than enough good stuff in the 93 fix (such as being able to upload large amounts of 
data to file-like objects) to justify breaking just one hack (which is easy to fix on the Trac side 
anyway). It would be logical to upgrade both the mod_python and trac releases on the machine 
simultaneously, so I doubt there will be any real hits in the field.


Mike.



Re: [VOTE] does mod_python want to be a TLP

2007-05-10 Thread Mike Looijmans
Apache includes a feather in its logo, and Python is associated with a 
snake. "Quetzalcoatl" means "feathered snake" and does not appear to be 
used by any other software project.


Which I can fully understand, because "Quetzalcoatl" is harder to pronounce than the 16 character 
password for the mainframe that was generated using /dev/random.


I liked "Asphyxia" though. Makes a nice password too.

Mike.




Re: [jira] Commented: (MODPYTHON-222) Support for chunked transfer encoding on request content.

2007-07-02 Thread Mike Looijmans
For my tape project I used "None" as valid len argument to indicate reading the next available tape 
block, regardless of size (underlying layer figured out the tape block size). This worked well 
together with existing file type APIs.


So that would make the API:

req.read(2048) - reads up to 2048 bytes. If it reads less, it's at the end of the stream. Blocks 
until the requested amount has been read.

req.read() - reads all data from the stream (bad idea), blocks until EOS.
req.read(None) - reads the next chunk of data, blocks if no data available. 
Returns 0 size if at EOS.


Mike Looijmans
Philips Natlab / Topic Automation


M Willson (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MODPYTHON-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507699 ] 


M Willson commented on MODPYTHON-222:
-

If not possible in full generality then I'd be happy with a special separate 
read method for getting what is in the input buffer so far, with 'read' just 
blocking until the chunked upload is complete. Would this be possible?

Or if we could specify a handler in python which is called back with each new 
chunk of data received, that would give the ultimate flexibility...


Support for chunked transfer encoding on request content.
-

Key: MODPYTHON-222
URL: https://issues.apache.org/jira/browse/MODPYTHON-222
Project: mod_python
 Issue Type: New Feature
 Components: core
   Affects Versions: 3.3.1
   Reporter: Graham Dumpleton

It is currently not possible to use chunked transfer encoding on request 
content delivered to a mod_python request handler.
The use of chunked transfer encoding is explicitly blocked in C code by:
rc = ap_setup_client_block(self->request_rec, REQUEST_CHUNKED_ERROR);
To allow chunked transfer encoding instead of REQUEST_CHUNKED_ERROR it would be 
necessary to supply REQUEST_CHUNKED_DECHUNK.
Problem is that it isn't that simple.
First off, the problems associated with MODPYTHON-212 have to be fixed with 
code being able to cope with there being no content length.
The next issue is that req.read() method is currently documented as behaving as:
  If the len argument is negative or omitted, reads all data given by the 
client.
This means that can't have req.read() with no arguments mean give me everything 
that is currently available in input buffers as everyone currently expects it 
to return everything sent by client. Thus, to be able to process streaming data 
one would have to supply an amount of data that one wants to read. The code for 
that though will always try to ensure that that exact amount of data is read 
and will block if not enough and not end of input. A handler though may not 
want it to block and be happy with just getting what is read and only expect it 
to block if nothing currently available.
In other words, the current specification for how req.read() behaves is 
incompatible with what would be required to support chunked transfer encoding 
on request content.
Not sure how this conflict can be resolved.







[jira] Updated: (MODPYTHON-93) Improve util.FieldStorage efficiency

2005-12-01 Thread Mike Looijmans (JIRA)
 [ http://issues.apache.org/jira/browse/MODPYTHON-93?page=all ]

Mike Looijmans updated MODPYTHON-93:


Attachment: modpython325_util_py_dict.patch

What it does:
- Simplifies the creation of StringField objects. This was already
marked as a TODO in the 3.2.5b code.
- Does not create a file object (cStringIO) for each and every field.
- FieldStorage.get() will always return the same instance(s) for any
given name.
- FieldStorage.get() is very cheap now (does not create new objects,
which is expensive in Python)
- use a dictionary-on-demand.
- Lots of code removed for this one, and a few lines added.
- items() function, which, contrary to common belief, preserves argument 
ordering.

> Improve util.FieldStorage efficiency
> 
>
>  Key: MODPYTHON-93
>  URL: http://issues.apache.org/jira/browse/MODPYTHON-93
>  Project: mod_python
> Type: Improvement
>   Components: core
> Versions: 3.3
> Reporter: Jim Gallacher
> Assignee: Jim Gallacher
> Priority: Minor
>  Attachments: modpython325_util_py_dict.patch
>
> Form fields are saved as a list in a FieldStorage class instance. The class 
> implements a __getitem__ method to provide dict-like behaviour.  This method 
> iterates over the complete list for every call to __getitem__. Applications 
> that need to access all the fields when processing the form will show O(n^2) 
> behaviour where n == the number of form fields. This overhead could be 
> avoided by creating a dict (to use as an index) when the FieldStorage 
> instance is created.
> Mike Looijmans has been investigating StringField and Field as well. It is 
> probably reasonable to include information on his work in this issue as well, 
> so that we can consider all of these efficiency issues in toto.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



[jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

2006-10-26 Thread Mike Looijmans (JIRA)
[ 
http://issues.apache.org/jira/browse/MODPYTHON-93?page=comments#action_12444850 
] 

Mike Looijmans commented on MODPYTHON-93:
-

A very simple approach would be to delete the dictionary attribute when 
any change is made.

The add_field method should be made private, and one could add a 
__setitem__ method like this:

 def __setitem__(self, key, value):
 if self.__dict__.has_key('dictionary'):
 del self.dictionary
 self._add_field(key, value)

A similar approach can be taken to add the __delitem__ functionality.


Mike Looijmans
Philips Natlab / Topic Automation




> Improve util.FieldStorage efficiency
> 
>
> Key: MODPYTHON-93
> URL: http://issues.apache.org/jira/browse/MODPYTHON-93
> Project: mod_python
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.2.7
>Reporter: Jim Gallacher
> Assigned To: Graham Dumpleton
>Priority: Minor
> Fix For: 3.3
>
> Attachments: modpython325_util_py_dict.patch
>
>
> Form fields are saved as a list in a FieldStorage class instance. The class 
> implements a __getitem__ method to provide dict-like behaviour.  This method 
> iterates over the complete list for every call to __getitem__. Applications 
> that need to access all the fields when processing the form will show O(n^2) 
> behaviour where n == the number of form fields. This overhead could be 
> avoided by creating a dict (to use as an index) when the FieldStorage 
> instance is created.
> Mike Looijmans has been investigating StringField and Field as well. It is 
> probably reasonable to include information on his work in this issue as well, 
> so that we can consider all of these efficiency issues in toto.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (MODPYTHON-222) Support for chunked transfer encoding on request content.

2007-07-02 Thread Mike Looijmans (JIRA)

[ 
https://issues.apache.org/jira/browse/MODPYTHON-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509749
 ] 

Mike Looijmans commented on MODPYTHON-222:
--

For my tape project I used "None" as valid len argument to indicate reading the 
next available tape 
block, regardless of size (underlying layer figured out the tape block size). 
This worked well 
together with existing file type APIs.

So that would make the API:

req.read(2048) - reads up to 2048 bytes. If it reads less, it's at the end of 
the stream. Blocks 
until the requested amount has been read.
req.read() - reads all data from the stream (bad idea), blocks until EOS.
req.read(None) - reads the next chunk of data, blocks if no data available. 
Returns 0 size if at EOS.


Mike Looijmans
Philips Natlab / Topic Automation






> Support for chunked transfer encoding on request content.
> -
>
> Key: MODPYTHON-222
> URL: https://issues.apache.org/jira/browse/MODPYTHON-222
> Project: mod_python
>  Issue Type: New Feature
>  Components: core
>Affects Versions: 3.3.1
>Reporter: Graham Dumpleton
>
> It is currently not possible to use chunked transfer encoding on request 
> content delivered to a mod_python request handler.
> The use of chunked transfer encoding is explicitly blocked in C code by:
> rc = ap_setup_client_block(self->request_rec, REQUEST_CHUNKED_ERROR);
> To allow chunked transfer encoding instead of REQUEST_CHUNKED_ERROR it would 
> be necessary to supply REQUEST_CHUNKED_DECHUNK.
> Problem is that it isn't that simple.
> First off, the problems associated with MODPYTHON-212 have to be fixed with 
> code being able to cope with there being no content length.
> The next issue is that req.read() method is currently documented as behaving 
> as:
>   If the len argument is negative or omitted, reads all data given by the 
> client.
> This means that can't have req.read() with no arguments mean give me 
> everything that is currently available in input buffers as everyone currently 
> expects it to return everything sent by client. Thus, to be able to process 
> streaming data one would have to supply an amount of data that one wants to 
> read. The code for that though will always try to ensure that that exact 
> amount of data is read and will block if not enough and not end of input. A 
> handler though may not want it to block and be happy with just getting what 
> is read and only expect it to block if nothing currently available.
> In other words, the current specification for how req.read() behaves is 
> incompatible with what would be required to support chunked transfer encoding 
> on request content.
> Not sure how this conflict can be resolved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MODPYTHON-238) req.connection.keepalive should be writable

2007-07-08 Thread Mike Looijmans (JIRA)

[ 
https://issues.apache.org/jira/browse/MODPYTHON-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511031
 ] 

Mike Looijmans commented on MODPYTHON-238:
--

If there is a separate req.chunked, I'd advise to keep them separated. It's 
perfectly valid to have 
a chunked stream with a connection: close setting, and someone is bound to use 
that. For one thing, 
it allows the client to reliably detect whether all the data has really been 
sent.

Having a side-effect like setting one disables the other is bound to surprise 
someone.

Mike Looijmans
Philips Natlab / Topic Automation






> req.connection.keepalive should be writable
> ---
>
> Key: MODPYTHON-238
> URL: https://issues.apache.org/jira/browse/MODPYTHON-238
> Project: mod_python
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.1
>Reporter: Graham Dumpleton
>Priority: Minor
>
> The attribute req.connection.keepalive should be writable. This would allow 
> handlers to use:
>   req.connection.keepalive = apache.AP_CONN_CLOSE
> Doing this before any data is written has the effect of disabling keepalive 
> and also turning off chunked encoding for response content when no content 
> length has been supplied.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.