Re: mod_python.util.StorageField.read_to_boundary has problems in 3.1 and 3.2
What i don't like at all in this implementation is the large amount of memcpy operations. 1. line.strip() 2. line[:-x] 3. previous_delimiter + ... The average pass will perform between two and three memcopy operations on the read block. Suggestion: Loose the strip() call - it serves no purpose (the boundary will be the only thing on that line, otherwise it's not a boundary) I've built one without any memcpy whatsoever, I've tried it with the test harnass and it appears equal to the Alexis version (pass all except the generate_split_boundary_file test): def read_to_boundary(self, req, boundary, file, readBlockSize=65536): prevline = "" while 1: line = req.readline(readBlockSize) if not line or line.startswith(boundary): if prevline.endswith('\r\n'): file.write(prevline[:-2]) elif prevline.endswith('\n'): file.write(prevline[:-1]) break file.write(prevline) prevline = line Alexis Marrero wrote: > New version of read_to_boundary(...) > > readBlockSize = 1 << 16 > def read_to_boundary(self, req, boundary, file): > previous_delimiter = '' > while 1: > line = req.readline(readBlockSize) > if line.strip().startswith(boundary): > break > > if line.endswith('\r\n'): > file.write(previous_delimiter + line[:-2]) > previous_delimiter = '\r\n' > > elif line.endswith('\r'): > file.write(previous_delimiter + line[:-1]) > previous_delimiter = '\r' > > elif line.endswith('\n'): > if len(line[:-1]) > 0: > file.write(previous_delimiter + line[:-1]) > previous_delimiter = '\n' > > else: > previous_delimiter += '\n' > > else: > file.write(previous_delimiter + line) > previous_delimiter = '' > > This new functions passes the test for Jim's filetest generator. -- Mike Looijmans Philips Natlab / Topic Automation
Re: mod_python.util.StorageField.read_to_boundary has problems in 3.1 and 3.2
Here's one that passes all the tests, and is 2x as fast as the 'current' and 'new' implementations on random binary data. I haven't been able to generate data where the 'mike' version is slower: def read_to_boundary(self, req, boundary, file, readBlockSize=65536): prevline = "" last_bound = boundary + '--' carry = None while 1: line = req.readline(readBlockSize) if not line or line.startswith(boundary): if prevline.endswith('\r\n'): if carry is not None: file.write(carry) file.write(prevline[:-2]) break elif (carry == '\r') and (prevline[-1] == '\n'): file.write(prevline[:-1]) break # If we get here, it's not really a boundary! if carry is not None: file.write(carry) carry = None if prevline[-1:] == '\r': file.write(prevline[:-1]) carry = '\r' else: file.write(prevline) prevline = line I've attached a modified upload_test_harness.py that includes the new and current, also the 'org' version (as in 3.1 release) and the 'mike' version. In addition, I added some profiling calls to show the impact of the extra 'endswith' and slices. -- Mike Looijmans Philips Natlab / Topic Automation #!/usr/bin/env python import sys from cStringIO import StringIO import md5 ##def generate_split_file(offset=-1, ## readBlockSize=65368, ## fname='testfile'): ##f = open(fname, 'wb') ##f.write('a'*50) ##f.write('\r\n') ##block_size = readBlockSize + offset ##f.write('b'*block_size) ##f.close() def read_to_boundary_current(self, req, boundary, file, readBlockSize): ''' currrent version ''' # # Although technically possible for the boundary to be split by the read, this will # not happen because the readBlockSize is set quite high - far longer than any boundary line # will ever contain. # # lastCharCarried is used to detect the situation where the \r\n is split across the end of # a read block. # delim = '' lastCharCarried = False last_bound = boundary + '--' roughBoundaryLength = len(last_bound) + 128 line = req.readline(readBlockSize) lineLength = len(line) if lineLength < roughBoundaryLength: sline = line.strip() else: sline = '' while lineLength > 0 and sline != boundary and sline != last_bound: if not lastCharCarried: file.write(delim) delim = '' else: lastCharCarried = False cutLength = 0 if lineLength == readBlockSize: if line[-1:] == '\r': delim = '\r' cutLength = -1 lastCharCarried = True if line[-2:] == '\r\n': delim += '\r\n' cutLength = -2 elif line[-1:] == '\n': delim += '\n' cutLength = -1 if cutLength != 0: file.write(line[:cutLength]) else: file.write(line) line = req.readline(readBlockSize) lineLength = len(line) if lineLength < roughBoundaryLength: sline = line.strip() else: sline = '' def read_to_boundary_new(self, req, boundary, file, readBlockSize): ''' Alexis' version read from the request object line by line with a maximum size, until the new line starts with boundary ''' previous_delimiter = '' while 1: line = req.readline(readBlockSize) if line.startswith(boundary): break if line.endswith('\r\n'): file.write(previous_delimiter + line[:-2]) previous_delimiter = '\r\n' elif line.endswith('\r') or line.endswith('\n'): file.write(previous_delimiter + line[:-1]) previous_delimiter = line[-1:] else: file.write(previous_delimiter + line) previous_delimiter = '' def read_to_boundary_org(self, req, boundary, file, readBlockSize): delim = "" line = req.readline(readBlockSize) while line and not line.startswith(boundary): odelim = delim if line[-2:] == "\r\n": delim = "\r\n" line = line[:-2] elif line[-1:] == "\n": delim = "\n" line = line[:-1] else: d
Re: mod_python.util.StorageField.read_to_boundary has problems in 3.1 and 3.2
Alexis Marrero wrote: The next test that I will run this against will be with an obscene amount of data for which this improvement helps a lot! The dumb thing is the checking for boundaries. I'm using http "chunked" encoding to access a raw TAPE device through HTTP with python (it GETs or POSTs the raw data as body, each chunk coresponds to a tape block). It blazes the data through at the max network speed with hardly any CPU usage. This HTTP upload code uses 100% CPU while running on my 3GHz box. The looking-for-line-ends and mime boundaries method is very inefficient compared to that. They oughta have put a "content-length" into every chunk header, and we wouldn't have had this problem in the first place. I think the only realistic way to improve performance is to read the client input in binary chunks, and then looking for '\r\n---boundary' strings in the chunk using standard string functions. Most of the CPU time is now spent in the readline() call. This also means revising all the mime body parsing to cope with that... I doubt if that will be worth the effort for anyone.
More efficient FieldStorage, StringField and Field classes
Inspired by our FieldStorage work with file uploads, I also made some implementation changes to Field and StringField in util.py. In essense, that will make things like using form['key'] multiple times much more efficient. Also, FieldStorage will return the same object if called twice. I'd like some feedback from the group on these changes, and also some directions on what I'd need to do to get them integrated in the mod_python beta. I have no experience whatsoever in contributing to open source projects. Major change is the Field and FieldStorage implementation: class Field: filename = None headers = {} def __init__(self, name): self.name = name def __repr__(self): """Return printable representation.""" return "Field(%s, %s)" % (`self.name`, `self.value`) def __getattr__(self, name): if name != 'value': raise AttributeError, name if self.file: self.file.seek(0) self.value = self.file.read() self.file.seek(0) else: self.value = None return self.value def __del__(self): self.file.close() class StringField(str): """ This class is basically a string with added attributes for compatibility with std lib cgi.py. Basically, this works the opposite of Field, as it stores its data in a string, but creates a file on demand. Field creates a value on demand and stores data in a file. """ filename = None headers = {} ctype = "text/plain" type_options = {} disposition = None disp_options = None # I wanted __init__(name, value) but that does not work (apparently, you # cannot subclass str with a constructor that takes >1 argument) def __init__(self, value): '''Create StringField instance. You'll have to set name yourself.''' str.__init__(self, value) self.value = value def __getattr__(self, name): if name != 'file': raise AttributeError, name self.file = cStringIO.StringIO(self.value) return self.file I've attached a working util.py (based on 3.1.4, so there are also the changes to allow large uploads in the file). # # Copyright 2004 Apache Software Foundation # # Licensed under the Apache License, Version 2.0 (the "License"); you # may not use this file except in compliance with the License. You # may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. See the License for the specific language governing # permissions and limitations under the License. # # Originally developed by Gregory Trubetskoy. # # $Id: util.py 102649 2004-02-16 19:47:28Z grisha $ import _apache from mod_python import apache import cStringIO import tempfile from types import * from exceptions import * parse_qs = _apache.parse_qs parse_qsl = _apache.parse_qsl """ The classes below are a (almost) a drop-in replacement for the standard cgi.py FieldStorage class. They should have pretty much the same functionality. These classes differ in that unlike cgi.FieldStorage, they are not recursive. The class FieldStorage contains a list of instances of Field class. Field class is incapable of storing anything in it. These objects should be considerably faster than the ones in cgi.py because they do not expect CGI environment, and are optimized specifically for Apache and mod_python. """ class Field: filename = None headers = {} def __init__(self, name): self.name = name def __repr__(self): """Return printable representation.""" return "Field(%s, %s)" % (`self.name`, `self.value`) def __getattr__(self, name): if name != 'value': raise AttributeError, name if self.file: self.file.seek(0) self.value = self.file.read() self.file.seek(0) else: self.value = None return self.value def __del__(self): self.file.close() class StringField(str): """ This class is basically a string with added attributes for compatibility with std lib cgi.py. Basically, this works the opposite of Field, as it stores its data in a string, but creates a file on demand. Field creates a value on demand and stores data in a file. """ filename = None headers = {} ctype = "text/plain" type_options = {} disposition = None disp_options = None # I wanted __init__(name, value) but that does not work (apparently, you # cannot subclass str with a constructor that takes >1 argument) def __init__(self, value): '''Create StringField instance. You'll have to set name yourself.''' str
Re: Developer Guidelines (was Re: More efficient FieldStorage, StringField and Field classes)
Ok, following Jim's advice, an having read his guide, I have so far: - Fetched the trunk version with anonymous SVN (no problem) I've already installed apache 2.0.55 and Python 2.4.1 on this Windows machine (the lazy way, with MSI packages). I've also installed the latest Cygwin (all "default" packages). However, I'm not getting anywhere in compiling mod_python. It requires the apxs tool which is apparently not part of any Apache windows distribution. So I grabbed the apache 2.0.55 source for windows, but the configure script is now stuck for over half an hour on: Configuring Apache Portable Runtime Utility library... checking for APR-util... reconfig configuring package in srclib/apr-util now Nothing appears to happen. Anyone here who can help me out getting this up and running on Windows XP? I can also use linux machines for compile and tests (I'm not root on any), but I do most development on this Windows system, so I need a windows executable anyway. PS: Haven't found any speling errors in your document... Jim Gallacher wrote: Mike Looijmans wrote: Inspired by our FieldStorage work with file uploads, I also made some implementation changes to Field and StringField in util.py. In essense, that will make things like using form['key'] multiple times much more efficient. Also, FieldStorage will return the same object if called twice. I'd like some feedback from the group on these changes, and also some directions on what I'd need to do to get them integrated in the mod_python beta. I have no experience whatsoever in contributing to open source projects. I think it's too late in the beta cycle to be rewriting code unless it's to fix a specific, critical bug. We really, really, *really* need to deliver 3.2 final, and I don't want to mess with working code at this point. The changes you are suggesting are likely more appropriate for 3.3, or possibly a 3.2 bugfix release. As far as contributing to mod_python, well, I think you already have the right idea: propose changes; discuss on python-dev; seek consensus; wait for your changes to be committed by myself or Nicolas. Bug fixes and code improvements are much more likely to be accepted than major new features. We want mod_python to be tightly focused on the core functionality of providing a python interface to apache. The fancy bells and whistles are best left to frameworks that utilize mod_python further up the application stack. For your specific changes, I would strongly suggest that you make your changes against the mod_python development branch, ie trunk. Also people are much more likely to test your changes if you submit a patch rather than offering a completely new file. This makes it easier to see what code is being changed and avoids problems with regressions. I've put together a "Developer's Guide" draft that I'd like to eventually see included in either the documentation or on the website. This document doesn't represent any offical position of the mod_python developers, but rather just what I see as best practices and some resources for contributors. Everyone should feel free to offer suggestions to improve this guide. The original is in svn at: http://svn.apache.org/repos/asf/httpd/mod_python/branches/jgallacher/docs/developers-guide.rst And no, I have not spell checked it yet. :) -- Mike Looijmans Philips Natlab / Topic Automation
Re: Developer Guidelines (was Re: More efficient FieldStorage, StringField and Field classes)
David Fraser wrote: > See my other mail - basically you do > cd dist > set APACHESRC=c:\path\to\apache (where this is the Apache2 dir in a > standard installation, now need for apache source) > build_installer Okay, first I downloaded the .NET SDK. (Our internet connection is great here) Then I get: error: Python was built with version 7.1 of Visual Studio, and extensions need to be built with the same version of the compiler, but it isn't installed. I don't have a VS license - that would mean I have to build my own Python in order to get any further. Oh dear... I was wondering - since i won't be hackin' the C code (yet), would it be sufficient to just install the binary, and use the .py files as obtained from the SVN repository? The 3.2.5 beta seems to run just fine here. Cannot run the automated tests though, because of the issues mentioned above. -- Mike Looijmans Philips Natlab / Topic Automation
Re: More efficient StringField and Field classes
It has been awhile, but I finally made the patch. Advice and feedback is welcome. What it does: - Simplifies the creation of StringField objects. This was already marked as a TODO in the 3.2.5b code. - Does not create a file object (cStringIO) for each and every field. - FieldStorage.get() will always return the same instance(s) for any given name. - FieldStorage.get() is very cheap now (does not create new objects, which is expensive in Python) It may break: - Code that creates an instance of Field or StringField directly. This can be fixed by extending the __init__() routine, but I don't think it's needed. It works: - Perfect on all of my projects. Attached: - Patch created by SVN -- Mike Looijmans Philips Natlab / Topic Automation Index: util.py === --- util.py (revision 348746) +++ util.py (working copy) @@ -48,19 +48,8 @@ """ class Field: - def __init__(self, name, file, ctype, type_options, -disp, disp_options, headers = {}): + def __init__(self, name): self.name = name - self.file = file - self.type = ctype - self.type_options = type_options - self.disposition = disp - self.disposition_options = disp_options - if disp_options.has_key("filename"): - self.filename = disp_options["filename"] - else: - self.filename = None - self.headers = headers def __repr__(self): """Return printable representation.""" @@ -81,13 +70,30 @@ self.file.close() class StringField(str): - """ This class is basically a string with - a value attribute for compatibility with std lib cgi.py - """ +""" This class is basically a string with +added attributes for compatibility with std lib cgi.py. Basically, this +works the opposite of Field, as it stores its data in a string, but creates +a file on demand. Field creates a value on demand and stores data in a file. +""" +filename = None +headers = {} +ctype = "text/plain" +type_options = {} +disposition = None +disp_options = None + +# I wanted __init__(name, value) but that does not work (apparently, you +# cannot subclass str with a constructor that takes >1 argument) +def __init__(self, value): +'''Create StringField instance. You'll have to set name yourself.''' +str.__init__(self, value) +self.value = value - def __init__(self, str=""): - str.__init__(self, str) - self.value = self.__str__() +def __getattr__(self, name): +if name != 'file': +raise AttributeError, name +self.file = cStringIO.StringIO(self.value) +return self.file class FieldStorage: @@ -103,8 +109,7 @@ if req.args: pairs = parse_qsl(req.args, keep_blank_values) for pair in pairs: - file = cStringIO.StringIO(pair[1]) - self.list.append(Field(pair[0], file, "text/plain", {}, None, {})) + self.add_field(pair[0], pair[1]) if req.method != "POST": return @@ -123,9 +128,7 @@ if ctype.startswith("application/x-www-form-urlencoded"): pairs = parse_qsl(req.read(clen), keep_blank_values) for pair in pairs: - # TODO : isn't this a bit heavyweight just for form fields ? - file = cStringIO.StringIO(pair[1]) - self.list.append(Field(pair[0], file, "text/plain", {}, None, {})) + self.add_field(pair[0], pair[1]) return if not ctype.startswith("multipart/"): @@ -205,33 +208,44 @@ else: name = None + # create a file object # is this a file? if disp_options.has_key("filename"): if file_callback and callable(file_callback): file = file_callback(disp_options["filename"]) else: - file = self.make_file() + file = tempfile.TemporaryFile("w+b") else: if field_callback and callable(field_callback): file = field_callback() else: - file = self.make_field() + file = cStringIO.StringIO() # read it in - end_of_stream = self.read_to_boundary(req, boundary, file) + self.read_to_boundary(req, boundary, file) file.seek(0) - + # make a Field - field = Field(name, file, ctype, type_options, disp, disp_options, headers) - + if disp_options.has_key("filename"): +
Re: More efficient StringField and Field classes
Is there a way that I can run the unit tests on a Windows systems? I have plenty of Linux boxes here, but I'd have to build and install apache in a "private" corner just to run the tests. Anyway, I see where the problem is, in tests.py it does: def util_fieldstorage(req): from mod_python import util req.write(`util.FieldStorage(req).list`) return apache.OK So I'll just add a __repr__ function to the StringField class and the test should pass then. I have attached a patch (on the 3.2.5b version of util.py) to include this change. That should fix the unit test, and it did not break any of my own code. Jim Gallacher wrote: Hi Mike, I don't have time to dig into this tonight, but your patch causes one of the unit tests to fail. FAIL: test_util_fieldstorage (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 1117, in test_util_fieldstorage self.fail(`rsp`) File "/usr/lib/python2.3/unittest.py", line 270, in fail raise self.failureException, msg AssertionError: "['1', '2', '3', '4']" Jim -- Mike Looijmans Philips Natlab / Topic Automation Index: util.py === --- util.py (revision 348746) +++ util.py (working copy) @@ -48,19 +48,8 @@ """ class Field: - def __init__(self, name, file, ctype, type_options, -disp, disp_options, headers = {}): + def __init__(self, name): self.name = name - self.file = file - self.type = ctype - self.type_options = type_options - self.disposition = disp - self.disposition_options = disp_options - if disp_options.has_key("filename"): - self.filename = disp_options["filename"] - else: - self.filename = None - self.headers = headers def __repr__(self): """Return printable representation.""" @@ -81,13 +70,34 @@ self.file.close() class StringField(str): - """ This class is basically a string with - a value attribute for compatibility with std lib cgi.py - """ +""" This class is basically a string with +added attributes for compatibility with std lib cgi.py. Basically, this +works the opposite of Field, as it stores its data in a string, but creates +a file on demand. Field creates a value on demand and stores data in a file. +""" +filename = None +headers = {} +ctype = "text/plain" +type_options = {} +disposition = None +disp_options = None + +# I wanted __init__(name, value) but that does not work (apparently, you +# cannot subclass str with a constructor that takes >1 argument) +def __init__(self, value): +'''Create StringField instance. You'll have to set name yourself.''' +str.__init__(self, value) +self.value = value - def __init__(self, str=""): - str.__init__(self, str) - self.value = self.__str__() +def __getattr__(self, name): +if name != 'file': +raise AttributeError, name +self.file = cStringIO.StringIO(self.value) +return self.file + +def __repr__(self): + """Return printable representation (to pass unit tests).""" + return "Field(%s, %s)" % (`self.name`, `self.value`) class FieldStorage: @@ -103,8 +113,7 @@ if req.args: pairs = parse_qsl(req.args, keep_blank_values) for pair in pairs: - file = cStringIO.StringIO(pair[1]) - self.list.append(Field(pair[0], file, "text/plain", {}, None, {})) + self.add_field(pair[0], pair[1]) if req.method != "POST": return @@ -123,9 +132,7 @@ if ctype.startswith("application/x-www-form-urlencoded"): pairs = parse_qsl(req.read(clen), keep_blank_values) for pair in pairs: - # TODO : isn't this a bit heavyweight just for form fields ? - file = cStringIO.StringIO(pair[1]) - self.list.append(Field(pair[0], file, "text/plain", {}, None, {})) + self.add_field(pair[0], pair[1]) return if not ctype.startswith("multipart/"): @@ -205,33 +212,44 @@ else: name = None + # create a file object # is this a file? if disp_options.has_key("filename"): if file_callback and callable(file_callback): file = file_callback(disp_options["file
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
Nicolas Lehuen wrote: Why is the ordering so important ? I do understand we need to support multiple values per field name, but I don't see why ordering is needed. Because there are applications out there that will break if you change it. Besides that, think about a form with a few text entry boxes (all the same name, e.g. "spouse"). It would be very confusing for a user of that page to see the text re-ordered every time he clicks one of the buttons on the page. (I'm perfectly aware of at least 4 alternatives, but that is not my point here). From the page code I've written and seen so far, the order of differently named fields is not important. I haven't seen a case where a form expecting a=1&b=2 would fail if you pass it b=2&a=1. But I have seen cases where a=1&x=2&x=3 is not the same as a=1&x=3&x=2. The simple dictionary implementation as proposed would not break that code. -- Mike Looijmans Philips Natlab / Topic Automation
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
Nicolas Lehuen wrote: One proposal was to use an OrderedHashtable. As you can see, the reply was quite definite : "No. There is nothing in the spec that says that these parameters should be in any sort of order. CGI scripts that expect them to be in order are coded incorrectly IMHO." Standing on a soap box preaching like that may work in the Java world, but we Python programmers have a more realistic view of the world. If we don't want to be crushed by a stampede of angry web developers, I guess we should, at least, provide the means to: - Iterate through the arguments "as listed". - Get multi-fields into an array in the order as listed. - Provide fast, dictionary-like access. Hey! this is Python. We can do all that, without losing anything. Let them poor Java folks preach about hell and doom, while we just get on with our lives and build something that meets everybody's expectations. How about we make the first call to get or __getitem__ create the dictionary? We could put code in __getattr__ to create it when it's referenced. Patch is on its way... -- Mike Looijmans Philips Natlab / Topic Automation
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
How about we make the first call to get or __getitem__ create the dictionary? We could put code in __getattr__ to create it when it's referenced. Here is the patch to util.py to use a dictionary-on-demand. Lots of code removed for this one, and a few lines added. There's also a brand new items() function, which, contrary to common belief, preserves argument ordering. Note: This also includes my previous StringField patch, since that is crucial to make the implementation as simple as it is now. Anxiously awaiting your comments and benchmark/profiler results! ['Lies', 'Damn lies', 'Statistics', 'Delivery dates', 'Benchmarks'] -- Mike Looijmans Philips Natlab / Topic Automation Index: util.py === --- util.py (revision 348746) +++ util.py (working copy) @@ -48,19 +48,8 @@ """ class Field: - def __init__(self, name, file, ctype, type_options, -disp, disp_options, headers = {}): + def __init__(self, name): self.name = name - self.file = file - self.type = ctype - self.type_options = type_options - self.disposition = disp - self.disposition_options = disp_options - if disp_options.has_key("filename"): - self.filename = disp_options["filename"] - else: - self.filename = None - self.headers = headers def __repr__(self): """Return printable representation.""" @@ -81,13 +70,34 @@ self.file.close() class StringField(str): - """ This class is basically a string with - a value attribute for compatibility with std lib cgi.py - """ +""" This class is basically a string with +added attributes for compatibility with std lib cgi.py. Basically, this +works the opposite of Field, as it stores its data in a string, but creates +a file on demand. Field creates a value on demand and stores data in a file. +""" +filename = None +headers = {} +ctype = "text/plain" +type_options = {} +disposition = None +disp_options = None + +# I wanted __init__(name, value) but that does not work (apparently, you +# cannot subclass str with a constructor that takes >1 argument) +def __init__(self, value): +'''Create StringField instance. You'll have to set name yourself.''' +str.__init__(self, value) +self.value = value - def __init__(self, str=""): - str.__init__(self, str) - self.value = self.__str__() +def __getattr__(self, name): +if name != 'file': +raise AttributeError, name +self.file = cStringIO.StringIO(self.value) +return self.file + +def __repr__(self): + """Return printable representation (to pass unit tests).""" + return "Field(%s, %s)" % (`self.name`, `self.value`) class FieldStorage: @@ -103,8 +113,7 @@ if req.args: pairs = parse_qsl(req.args, keep_blank_values) for pair in pairs: - file = cStringIO.StringIO(pair[1]) - self.list.append(Field(pair[0], file, "text/plain", {}, None, {})) + self.add_field(pair[0], pair[1]) if req.method != "POST": return @@ -123,9 +132,7 @@ if ctype.startswith("application/x-www-form-urlencoded"): pairs = parse_qsl(req.read(clen), keep_blank_values) for pair in pairs: - # TODO : isn't this a bit heavyweight just for form fields ? - file = cStringIO.StringIO(pair[1]) - self.list.append(Field(pair[0], file, "text/plain", {}, None, {})) + self.add_field(pair[0], pair[1]) return if not ctype.startswith("multipart/"): @@ -205,33 +212,44 @@ else: name = None + # create a file object # is this a file? if disp_options.has_key("filename"): if file_callback and callable(file_callback): file = file_callback(disp_options["filename"]) else: - file = self.make_file() + file = tempfile.TemporaryFile("w+b") else: if field_callback and callable(field_callback): file = field_callback() else: - file = self.make_field() + file = cStringIO.StringIO() # read it in - end_of_stream = self.read_to_boundary(req, boundary, file) + self.read_to_boundary(req, boundary, file) f
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
Just one comment. It seems like it would be better just to make add_method inline, since everything else in __init__ is, and it never gets called from anywhere else. s/_method/_field/g The thing I had in mind when I built the add_field method was that I could subclass FieldStorage and give the page handler more control over what is happening. This was convenient for the thing that got this all started: Posting large files. In fact, I intended add_field to replace the make_file function. What I had in mind is that you override the add_field function to get context information (e.g. user name, upload directory and such) before the file upload comes in, so that you can put the upload where it belongs. With the two callbacks, you can create the same effect, so it was no longer nessecary. Also, the add_field is referred at least twice and I don't like duplicating code. -- Mike Looijmans Philips Natlab / Topic Automation
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
I am still not convinced we need an index. I'd like to see some concrete proof that we're not engaging in "overoptimization" here - is this really a bottleneck for anyone? Well, I need to get some real work done today, but I will do some benchmarking at some point. It's not like there's a big rush to complete 3.3.0 this week. I do like that Mike's code simplifies a number of the the FS methods. It just looks more elegant. "More elegant" is the best reason I can think of for refactoring it. Heck, even if my implementation can be proved to be slower than the original, I still like this one better because of sheer code size and readability. I don't think that you'll notice much of a performance difference, but I like the fact that it: - You get the same object instance: assert (form['key'] is form['key']) - Does not create a file object for every field - Getting form['key'] a second time is cheap. -- Mike Looijmans Philips Natlab / Topic Automation
Re: Various musings about the request URL / URI / whatever
Daniel J. Popowich wrote: By the Host header. I've been looking into this issue tonight and think I have the answers (but it's really late, so I'll save the gory details for tomorrow). In brief: typically, req.hostname is set from the Host header and, in fact, when I telnet to apache and issue a GET by hand, if I don't send the Host header, apache barfs with a 400, Bad Request, response. (apache 2.0.54, debian testing) It will only do that if you claim to be a HTTP/1.1 client. If you send GET / HTTP/1.0 it will not complain about the host header. Sending: GET / HTTP/1.1 will get you a 400 response, because you MUST supply it (says RFC 2068, and whatever superseded that one). There is more you must do to be able to call yourself HTTP/1.1 by the way, such as keep-alive connections and chunked encoding. -- Mike Looijmans Philips Natlab / Topic Automation
mod_python 3.2.6 testing on debian/colinux
Since I have a CoLinux instance on my machine here, I wanted to give it a go with mod_python as well (need a linux test environment for mod_python anyway). It's running a Debian distro, I have gcc-3.3 on it, as well as apache2-dev and python-dev (version 2.3). Can't get the tests to run though, even though Apache itself appears to run okay (I can get the manual pages from it). Here's an example session (running as root, also tried normal user): colinux:/home/mike/mod_python/test# python test.py * Testing LoadModule Creating config listen port: 32843 Starting Apache /usr/sbin/apache2 -k start -f /home/mike/mod_python/test/conf/test.conf F Stopping Apache... /usr/sbin/apache2 -k stop -f /home/mike/mod_python/test/conf/test.conf httpd (pid 4940?) not running Traceback (most recent call last): File "test.py", line 2033, in ? tr.run(suite()) File "/usr/lib/python2.3/unittest.py", line 658, in run test(result) File "/usr/lib/python2.3/unittest.py", line 389, in __call__ test(result) File "/usr/lib/python2.3/unittest.py", line 239, in __call__ self.tearDown() File "test.py", line 1812, in tearDown self.stopHttpd() File "test.py", line 331, in stopHttpd time.sleep(1) KeyboardInterrupt colinux:/home/mike/mod_python/test# Mike Looijmans Philips Natlab / Topic Automation Nicolas Lehuen wrote: Mike, I forward your +1 to python-dev since you only sent it to me. Regards, Nicolas -- Forwarded message -- From: Mike Looijmans <[EMAIL PROTECTED]> Date: 18 janv. 2006 08:21 Subject: Re: mod_python 3.2.6 (Final!) available for testing To: [EMAIL PROTECTED] +1 Windows XP Pro SP2 Apache 2.0.54 Python 2.4.2 -- Mike Looijmans Philips Natlab / Topic Automation Nicolas Lehuen wrote: You can fetch the Win32 version for Python 2.3 and Python 2.4 here : http://nicolas.lehuen.com/download/mod_python/
Re: please set up a mod_python core group
+1 :o) Seriously, I think Grisha's way is right - the three musketeers should decide based on the feedback they get. There's no substitute for running on other people's machines... -- Mike Looijmans Philips Natlab / Topic Automation Nicolas Lehuen wrote: Hi, It's OK for me ! Regards, Nicolas 2006/1/19, Gregory (Grisha) Trubetskoy <[EMAIL PROTECTED]>: Thanks Roy. Very timely, since 3.2.6 is (so far) going to be a final/stable release. I propose that for starters those people are: me (I'm also in the Apache HTTP Server PMC) Jim Gallacher Nicolas Lehuen Graham Dumpleton
Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
Two comments: 1. (bug): The acquire() call should be *outside* the try ... finally block. You do not want to release a lock that you did not aquire. 2. (optimization): If you are not planning to change the path, you do not have to aquire the lock. Aquiring a lock is quite expensive, so the double-check will have much less impact on performance. if config.has_key("PythonPath"): pathstring = config["PythonPath"] if not _path_cache.has_key(pathstring): # acquire must be done outside the try block _path_cache_lock.acquire() try: # double-check, because two threads might come # to the same conclusion up until here. if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() Mike Looijmans Philips Natlab / Topic Automation Graham Dumpleton (JIRA) wrote: Problems with PythonPath directive. --- Key: MODPYTHON-114 URL: http://issues.apache.org/jira/browse/MODPYTHON-114 Project: mod_python Type: Bug Components: core Versions: 3.2, 3.1.4 Reporter: Graham Dumpleton The "PythonPath" setting can be used to define the value of the Python "sys.path" variable. It is this variable which defines the list of directories that Python will search in when looking for a module to be imported. Although the actual reassignment of "sys.path" by mod_python does not in itself present a problem due to assignment in Python being thread safe by definition, the context in which the assignment occurs is not thread safe and a race condition exists. This exists as the top level mod_python dispatcher will consult the existing value of "sys.path" and the last value for the "PythonPath" setting encountered before then making a decision to modify "sys.path". If multiple requests are being serviced in distinct threads within the context of the same interpreter instance, and each at the same time decide they want to modify the value of "sys.path", only one might ultimately succeed in setting it to the value it wants and any modification required by the other may be lost. In the worst case scenario, this can result in the importation of any subsequent modules within that request failing due to a required directory not being present in "sys.path". It is possible that this situation may resolve itself and go away on a subsequent request, but due to how mod_python caches the last value of "PythonPath" in a global variable this will be dependent on what other requests arrive. At the least, for mod_python to resolve the problem itself would require a request to arrive in the interim which targeted the URL which was not the last to cache its raw setting for "PythonPath". This only works though due to a further issue whereby alternate requests against URLs with different "PythonPath" settings will cause "sys.path" to be extended everytime if the "PythonPath" setting references "sys.path". This results in "sys.path" continually growing over time due to directories being added multiple times. The problematic code in apache.py is: if config.has_key("PythonPath"): # we want to do as little evaling as possible, # so we remember the path in un-evaled form and # compare it global _path pathstring = config["PythonPath"] if pathstring != _path: _path = pathstring newpath = eval(pathstring) if sys.path != newpath: sys.path[:] = newpath To fix the problems, the processing of PythonPath directive should be protected by a thread mutex lock and a dictionary should be used to store all seen values of PythonPath rather than a single variable. If a value for PythonPath has ever been seen, and not just from the last change, then no update would be necessary. if config.has_key("PythonPath"): try: _path_cache_lock.acquire() pathstring = config["PythonPath"] if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() There shouldn't really be a need to check whether the new path is different to the current value of sys.path as that scenario shouldn't occur at that point. The two parts of this problem were previously catalogued as ISSUE 15 and ISSUE 16 in my list of problems with the module importing system. The list can be found at: http://www.dscpl.com.au/articles/modpython-003.html
Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
Wow - I wonder how many programs are leaking memory today because of this double-check issue. I've been using it myself for years (yes, also in Java)... Having read the article, I've come to the following conclusions: - Python indeed won't suffer from this problem. But I agree to keep on the safe side, the optimization is not worth the problems it might cause later. - I'll never use it again. (Unless I'm hacking away in assembly and have to write my own locking and threading code anyway...) -- Mike Looijmans Philips Natlab / Topic Automation Nicolas Lehuen wrote: Your optimisation is called "double-checked locking", and it is now known to be broken, especially on multiple-CPU systems. The problem exists in numerous languages. Here is an explanation of the problem : http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html Now, maybe Python is not touched by this problem due to its relatively poor support for multithreading. The presence of the GIL guarantees that only one thread of Python bytecode interpreter runs at a time, but still, in a multiple CPU environment, you can get bitten by local caching issues. As this thing is a bit hairy, I think we should first strive for correctness, then care about performance later. And no, I won't bother you with one more quote about premature this and the root of that ;). Regards, Nicolas -- Mike Looijmans Philips Natlab / Topic Automation
[Fwd: Re: Version 3.3 and beyond .......]
We probably want to defer until after 3.2.7 (final) is released to have any serious discussion about what should constitute version 3.3, but am still curious to know at this point where your interests in 3.3 lie. Is it simply to help finish up eliminating all these known issues/bugs or do you have other ideas in mind as to the direction of mod_python? Well, since you asked... I'd like to see more HTTP level helper functions. For one thing, I keep writing the same stuff for handling if-modified-since headers, accept, and similar construct. Having one library to handle these with would be very helpful. For example, a function to format a datetime tuple into a HTTP style date string (this one should *definitely* go into mod_python), as well as its counterpart. Note that this implementation is not correct as it does not handle timezones... ## weekdayname = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun') monthname = (None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec') def httpdate(dt): """Return the current date and time formatted for a message header.""" now = time.time() year, month, day, hh, mm, ss, wd, y, z = dt s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % ( weekdayname[wd], day, monthname[month], year, hh, mm, ss) return s ## Another thing I use very often (when writing DB apps) is the if-modified-since mechanism. This helps reducing the load on the server since most clients will be visiting the same page a number of times in a session. Calling checkLastModified in my code quickly aborts rendering the page if it hasn't changed since the last visit. (This one is incomplete as well - should add support for etag and length extensions. The etag is very useful when things like username can change between pages.) ## def checkLastModified(req, lastmod): """If lastmod is non-zero (should be datetime object), outputs a Last-Modified header. If the browser supplied an identical If-Modified-Since header, this method raises apache.SERVER_RETURN with apache.HTTP_NOT_MODIFIED which informs the browser to use its cached version. """ if lastmod: ifmodsince = req.headers_in.get('if-modified-since', None) strlastmod = httpdate(lastmod.timetuple()) if ifmodsince: # TODO: We might want to do <= i.s.o. == if strlastmod == ifmodsince: raise apache.SERVER_RETURN, apache.HTTP_NOT_MODIFIED req.headers_out.add('Last-Modified', strlastmod) (note to self: Learn to hit "Reply All" instead of just "Reply" -- Mike Looijmans Philips Natlab / Topic Automation
Re: Last modified times
I've read through the documentation. I like the interface - just call ap_update_mtime for every object your response depends on (and another great feature: It starts with the mtime of the file, so that updates to the script also invalidate the cached response). Exposing those three functions would make a very good addition. A little Pythonic wrapping around ap_update_mtime to accept reasonable things like datetime objects would be helpful. ap_discard_request_body() might help resolving the issue with HEAD request writing out a body, maybe that change was related to http://issues.apache.org/jira/browse/MODPYTHON-71 but I'm not sure. Mike Looijmans Philips Natlab / Topic Automation Graham Dumpleton wrote: Mike, I have a feeling that Apache has ways of generating those date/time strings for last modified for you. Thus, the routine shouldn't be required. The real problem is that mod_python doesn't expose the methods which may actually be useful to you. Can you look through the Apache functions described in: http://162.105.203.19/apache-doc/138.htm and indicate whether if functions like: ap_set_etag() ap_update_mtime() ap_set_last_modified() you could more easily do the same thing. The request object already provides equivalent to ap_meets_conditions(), but not these others. I also find the function: ap_discard_request_body() to be interesting. That might be interesting to have access to as well. Graham Mike Looijmans wrote .. We probably want to defer until after 3.2.7 (final) is released to have any serious discussion about what should constitute version 3.3, but am still curious to know at this point where your interests in 3.3 lie. Is it simply to help finish up eliminating all these known issues/bugs or do you have other ideas in mind as to the direction of mod_python? Well, since you asked... I'd like to see more HTTP level helper functions. For one thing, I keep writing the same stuff for handling if-modified-since headers, accept, and similar construct. Having one library to handle these with would be very helpful. For example, a function to format a datetime tuple into a HTTP style date string (this one should *definitely* go into mod_python), as well as its counterpart. Note that this implementation is not correct as it does not handle timezones... ## weekdayname = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun') monthname = (None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec') def httpdate(dt): """Return the current date and time formatted for a message header.""" now = time.time() year, month, day, hh, mm, ss, wd, y, z = dt s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % ( weekdayname[wd], day, monthname[month], year, hh, mm, ss) return s ## Another thing I use very often (when writing DB apps) is the if-modified-since mechanism. This helps reducing the load on the server since most clients will be visiting the same page a number of times in a session. Calling checkLastModified in my code quickly aborts rendering the page if it hasn't changed since the last visit. (This one is incomplete as well - should add support for etag and length extensions. The etag is very useful when things like username can change between pages.) ## def checkLastModified(req, lastmod): """If lastmod is non-zero (should be datetime object), outputs a Last-Modified header. If the browser supplied an identical If-Modified-Since header, this method raises apache.SERVER_RETURN with apache.HTTP_NOT_MODIFIED which informs the browser to use its cached version. """ if lastmod: ifmodsince = req.headers_in.get('if-modified-since', None) strlastmod = httpdate(lastmod.timetuple()) if ifmodsince: # TODO: We might want to do <= i.s.o. == if strlastmod == ifmodsince: raise apache.SERVER_RETURN, apache.HTTP_NOT_MODIFIED req.headers_out.add('Last-Modified', strlastmod) (note to self: Learn to hit "Reply All" instead of just "Reply" -- Mike Looijmans Philips Natlab / Topic Automation
Re: Refactoring of the test suite
Oh and if we are refactoring the tests, I want a "make tests" rule. I'm tired of doing: ./configure; make; sudo make install; make tests; DOH! cd test; python test.py. :) Make that "make check" (like autotools), to not confuse old-skool autoconfers like myself.
FieldStorage callback only works with FileType objects in 3.2.7
There is still trouble with the FieldStorage class. Looks like this one is not really fixed: http://issues.apache.org/jira/browse/MODPYTHON-40 See for a counter example: http://modpython.org/pipermail/mod_python/2006-February/020248.html (remove the "import fmt") It fails with the following traceback: Mod_python error: "PythonHandler mod_python.psp" Traceback (most recent call last): File "C:\Python24\Lib\site-packages\mod_python\apache.py", line 299, in HandlerDispatch result = object(req) File "C:\Python24\Lib\site-packages\mod_python\psp.py", line 302, in handler p.run() File "C:\Python24\Lib\site-packages\mod_python\psp.py", line 213, in run exec code in global_scope File "C:/source/archaic_web/upload.psp", line 47, in ? for afile in frm.getlist('archivefile'): File "C:\Python24\Lib\site-packages\mod_python\util.py", line 354, in getlist found.append(StringField(item.value)) File "C:\Python24\Lib\site-packages\mod_python\util.py", line 74, in __getattr__ value = self.file.read() AttributeError: FileCounter instance has no attribute 'read' The file is posted as file, and correctly writting into the FileCounter. However, the FieldStorage class forgets about this, and in util.py it tries to determine whether this is a file or not by looking whether it derives from FileType. Since any "home made" file object is not likely to derive from that class, util.py concludes that this was a simple field, and does a file.read() to fetch it into memory (which is what we were preventing in issue 40) A quick way to solve it is to use this code: http://issues.apache.org/jira/browse/MODPYTHON-93 this will remove the bad checks for field types alltogether. -- Mike Looijmans Philips Natlab / Topic Automation
Re: mod_python 3.2.8 available for testing
I get failing tests on WinXP-sp2, apache 2.0.54, python 2.4.2. This may be because the test suite is looking for things not there yet. Put the 3.2.8 binary distribution on my system and did a svn update to get the latest test suite. When running the tests, I got the following failures (can we do something about the readability of the output): == FAIL: test_req_auth_type (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 661, in test_req_auth_type self.fail(`rsp`) AssertionError: '\n\nMod_python error: "PythonAuthenHandler tests::req_auth_type"\n\nTraceback (most recent call last):\n\n File "C:\\Python24\\Lib\\site- packages\\mod_python\\apache.py", line 299, in HandlerDispatch\n result = object(req)\n\n File "C:/source/mod_python/test/htdocs/tests.py", line 609, in req _auth_type\nif req.auth_type() != "dummy":\n\nAttributeError: \'mp_request\' object has no attribute \'auth_type\'\n\n\n' == FAIL: test_req_construct_url (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 753, in test_req_construct_url self.fail("construct_url") AssertionError: construct_url == FAIL: test_req_handler (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 1009, in test_req_handler self.fail(`rsp`) AssertionError: '\n\nMod_python error: "PythonFixupHandler tests::req_handler"\n\nTraceback (most recent call last):\n\n File "C:\\Python24\\Lib\\site-pac kages\\mod_python\\apache.py", line 299, in HandlerDispatch\nresult = object(req)\n\n File "C:/source/mod_python/test/htdocs/tests.py", line 770, in req_ha ndler\nreq.handler = "mod_python"\n\nTypeError: attribute \'handler\' of \'mp_request\' objects is not writable\n\n\n' == FAIL: test_trans (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 1402, in test_trans self.fail(`rsp`) AssertionError: ' #\r\n # \r\n # Licensed under the Apache License, Version 2.0 (the "License"); you\r\n # may not use this file except in compliance with the L icense. You\r\n # may obtain a copy of the License at\r\n #\r\n # http://www.apache.org/licenses/LICENSE-2.0\r\n #\r\n # Unless required by applicable law or agreed to in writing, software\r\n # distributed under the License is distributed on an "AS IS" BASIS,\r\n # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, e ither express or\r\n # implied. See the License for the specific language governing\r\n # permissions and limitations under the License.\r\n #\r\n #\r\n # $Id: tests.py 379638 2006-02-22 00:41:51Z jgallacher $\r\n #\r\n\r\n# mod_python tests\r\n\r\nfrom __future__ import generators\r\nfrom mod_python.python22 import * \r\n\r\nfrom mod_python import apache\r\nimport unittest\r\nimport re\r\nimport time\r\nimport os\r\nimport cStringIO\r\n\r\n# This is used for mod_python.publi sher security tests\r\n_SECRET_PASSWORD = \'root\'\r\n__ANSWER = 42\r\n\r\nclass SimpleTestCase(unittest.TestCase):\r\n\r\ndef __init__(self, methodName, re q):\r\nunittest.TestCase.__init__(self, methodName)\r\n self.req = req\r\n\r\ndef test_apache_log_error(self):\r\n\r\n s = self.req.se rver\r\napache.log_error("Testing apache.log_error():", apache.APLOG_INFO, s)\r\napache.log_error("xEMERGx", apache.APLOG_EMERG, s)\r\na pache.log_error("xALERTx", apache.APLOG_ALERT, s)\r\n apache.log_error("xCRITx", apache.APLOG_CRIT, s)\r\n apache.log_error("xERRx", apache.APLOG_ ERR, s)\r\napache.log_error("xWARNINGx", apache.APLOG_WARNING, s)\r\napache.log_error("xNOTICEx", apache.APLOG_NOTICE, s)\r\n apache.log_ error("xINFOx", apache.APLOG_INFO, s)\r\n apache.log_error("xDEBUGx", apache.APLOG_DEBUG, s)\r\n\r\n# see what\'s in the log now\r\nf = o pen("%s/logs/error_log" % apache.server_root())\r\n# for some reason re doesn\'t like \\n, why?\r\nimport string\r\n log = "".join(map(st ring.strip, f.readlines()))\r\nf.close()\r\n\r\nif not re.search("xEMERGx.*xALERTx.*xCRITx.*xERRx.*xWARNINGx.*xNOTICEx.*xINFOx.*xDEBUGx", log):\ r\nself.fail("Could not find test messages in error_log")\r\n\r\n\r\ndef test_apache_table(self):\r\n\r\nlog = self.req.log_ error\r\n\r\nlog("Testing table object.")\r\n\r\n# tests borrowed from Python test suite for d
Re: mod_python 3.2.8 available for testing
Yup. Much better with that one: ## Ran 6 tests in 91.047s OK ## You have my +1 (even though you weren't waiting for it) And my merged util.py also passes the test suite, which is what I really wanted to know. -- Mike Looijmans Philips Natlab / Topic Automation Nicolas Lehuen wrote: Yup, the SVN trunk currently contains tests that fail on 3.2.8 (Graham added some features and their unit test in the trunk), but you can test the 3.2.8 release against the 3.2.8 test suite (available from https://svn.apache.org/repos/asf/httpd/mod_python/tags/release-3-2-8 ) and you won't see any problems. Regards, Nicolas 2006/2/22, Mike Looijmans <[EMAIL PROTECTED]>: I get failing tests on WinXP-sp2, apache 2.0.54, python 2.4.2. This may be because the test suite is looking for things not there yet. Put the 3.2.8 binary distribution on my system and did a svn update to get the latest test suite. When running the tests, I got the following failures (can we do something about the readability of the output): == FAIL: test_req_auth_type (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 661, in test_req_auth_type self.fail(`rsp`) AssertionError: '\n\nMod_python error: "PythonAuthenHandler tests::req_auth_type"\n\nTraceback (most recent call last):\n\n File "C:\\Python24\\Lib\\site- packages\\mod_python\\apache.py", line 299, in HandlerDispatch\n result = object(req)\n\n File "C:/source/mod_python/test/htdocs/tests.py", line 609, in req _auth_type\nif req.auth_type() != "dummy":\n\nAttributeError: \'mp_request\' object has no attribute \'auth_type\'\n\n\n' == FAIL: test_req_construct_url (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 753, in test_req_construct_url self.fail("construct_url") AssertionError: construct_url == FAIL: test_req_handler (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 1009, in test_req_handler self.fail(`rsp`) AssertionError: '\n\nMod_python error: "PythonFixupHandler tests::req_handler"\n\nTraceback (most recent call last):\n\n File "C:\\Python24\\Lib\\site-pac kages\\mod_python\\apache.py", line 299, in HandlerDispatch\nresult = object(req)\n\n File "C:/source/mod_python/test/htdocs/tests.py", line 770, in req_ha ndler\nreq.handler = "mod_python"\n\nTypeError: attribute \'handler\' of \'mp_request\' objects is not writable\n\n\n' == FAIL: test_trans (__main__.PerRequestTestCase) -- Traceback (most recent call last): File "test.py", line 1402, in test_trans self.fail(`rsp`) AssertionError: ' #\r\n # \r\n # Licensed under the Apache License, Version 2.0 (the "License"); you\r\n # may not use this file except in compliance with the L icense. You\r\n # may obtain a copy of the License at\r\n #\r\n # http://www.apache.org/licenses/LICENSE-2.0\r\n #\r\n # Unless required by applicable law or agreed to in writing, software\r\n # distributed under the License is distributed on an "AS IS" BASIS,\r\n # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, e ither express or\r\n # implied. See the License for the specific language governing\r\n # permissions and limitations under the License.\r\n #\r\n #\r\n # $Id: tests.py 379638 2006-02-22 00:41:51Z jgallacher $\r\n #\r\n\r\n# mod_python tests\r\n\r\nfrom __future__ import generators\r\nfrom mod_python.python22 import * \r\n\r\nfrom mod_python import apache\r\nimport unittest\r\nimport re\r\nimport time\r\nimport os\r\nimport cStringIO\r\n\r\n# This is used for mod_python.publi sher security tests\r\n_SECRET_PASSWORD = \'root\'\r\n__ANSWER = 42\r\n\r\nclass SimpleTestCase(unittest.TestCase):\r\n\r\ndef __init__(self, methodName, re q):\r\nunittest.TestCase.__init__(self, methodName)\r\n self.req = req\r\n\r\ndef test_apache_log_error(self):\r\n\r\n s = self.req.se rver\r\napache.log_error("Testing apache.log_error():", apache.APLOG_INFO, s)\r\napache.log_error("xEMERGx", apache.APLOG_EMERG, s)\r\na pache.log_error("xALERTx", apache.APLOG_ALERT, s)\r\n apache.log_error("xCRITx", apache.APLOG_CRIT, s)\r\n apache.log_error("xERRx&quo
FieldStorage improvements as in MODPYTHON-93
I'm currently using a 'private' patch, but I would really like to see this one incorporated in some release: http://issues.apache.org/jira/browse/MODPYTHON-93 The patch is there, all unit tests pass, and I've been torturing it myself for a while now. What else can (must?) I do to get it integrated? (I really need the file upload to work decently, and others may like the improved performance) -- Mike Looijmans Philips Natlab / Topic Automation
Re: FieldStorage improvements as in MODPYTHON-93
Mike Looijmans Philips Natlab / Topic Automation Jim Gallacher wrote: Mike Looijmans wrote: I'm currently using a 'private' patch, but I would really like to see this one incorporated in some release: http://issues.apache.org/jira/browse/MODPYTHON-93 The patch is there, all unit tests pass, and I've been torturing it myself for a while now. What else can (must?) I do to get it integrated? Nothing. It's on my todo list, and if there are no objections I'll commit it to trunk. I'm now very busy doing 'nothing' to get that to happen :-) On the other hand, if you want it backported to 3.2.x for a bugfix release, you should do a little agitation on this list. ;) The reason I want it committed is so that I can use file-like objects (e.g. TarFile) as targets for a file POST. With the current implementation, anything you return from the make_file callback must be a real file of type(FileType) otherwise FieldStorage will call 'read' on it and attempt to fetch the whole file into a string... (I really need the file upload to work decently, and others may like the improved performance) Just curious - how much of a performance boost are you seeing? I never did any measurements - but if I'm bored I'll do some benchmarks... (as in: lies, damn lies, statistics, delivery dates, and benchmarks). I don't know what the impact of stuffing things into a cStringIO object and reading them back as string is... Mike
Re: mod_python roadmap
... Any other issues for a 3.2.9 release? (This is where Mike L. should advocate for MODPYTHON-93 - Improved FieldStorage. ;) ) Eh, yeah, I have a suggestion: MODPYTHON-93 - Improved FieldStorage - Stop putting simple key/value pairs into StringIO objects, put them simple str objects - Use a dictionary for fields - Allow any file-like object ('write' method) to be used in a make_file callback (all the coding is already done) Mike Looijmans Philips Natlab / Topic Automation
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
I was checking the documentation with respect to this patch and I noticed a couple of things. The FieldStorage constructor has 2 undocumented parameters, file_callback=None, field_callback=None. These were added in r160259 as part of Barry Pearce's fix for MODPYTHON-40. In the same patch, the make_file and make_field methods were added. These allow FieldStorage to be subclassed and provide an alternative to using the callbacks. In Mike's modpython325_util_py_dict.patch, the make_field and make_file methods have be removed, with the reasoning that the callbacks should be sufficient for controlling the upload. We never had a discussion of the merits of the 2 approaches - subclassing vs passing the callbacks in the FieldStorage constructor. Should we perhaps allow both, and reinstate make_field and make_file, or are the callbacks sufficient? The funny thing is that subclassing was my first way of implementing it. Someone beat me to it, so I adopted the callbacks. The structure of FieldStorage requires either or both methods. The __init__ constructor is doing all the parsing already, so something Pythonic like injecting the methods later will not work: form = util.FieldStorage() form.make_file = myMakeFileFunction form.parse(req) Because of backward compatibility, my vote would be on the callbacks. Overriding looks more organized, more structured programming, but when you start using the FieldStorage in multiple handlers, providing callback functions is much handier for everyday use (otherwise, you'll end up with a dozen different FieldStorage classes or mixins for which you have to invent names). Further infomation on the callbacks, including some nice examples, can be found in the python-dev archives. See http://article.gmane.org/gmane.comp.apache.mod-python.devel/760 I'll update the docs as part of the fix for this issue. Good idea - and a good page too, thanks Barry.
Re: mod_python 3.3.0-dev-20060321 available for testing
Nicolas, could you make a Win32 binary for us poor people? -- Mike Looijmans Philips Natlab / Topic Automation
Re: mod_python 3.3.0-dev-20060321 available for testing
And in addition to running the unit tests, try running your actual web site with this release. That usually manages to bring up things the developers never thought of - so that we can extend the unit tests with those cases as well. I can imagine that some code may depend on some specific behaviour of FieldStorage for example. -- Mike Looijmans Philips Natlab / Topic Automation Jim Gallacher wrote: mod_python-3.3.0-dev-20060321 is available for testing. We are asking the mod_python development community for assistance in testing the current development branch. Hopefully this will allow us to catch new bugs or regressions early, and when we are ready for the next release the beta cycle will be much shorter. This snapshot addresses 33 issues since 3.2.7 was released, including apache 2.2 support and the introduction of a new module importer. The files are (temporarily) available here: http://people.apache.org/~jgallacher/mod_python/dist/ Please download it, then do the usual $ ./configure --with-apxs=/wherever/it/is $ make $ (su) # make install Then (as non-root user!) $ make check or if you prefer to run the tests the old way: $ cd test $ python test.py Make a note of any failing tests. If all the tests pass, give the new module importer a workout by uncommenting line 328 in test/test.py: #PythonOption('mod_python.future.importer *'), and then re-run the tests. $ make check And see if any tests fail. If they pass, send a +1 to the list, if they fail, send the details (the versions of OS, Python and Apache, the test output, and suggestions, if any). Thank you for your assistance, Jim Gallacher
Re: mod_python 3.3.0-dev-20060321 available for testing
Jim Gallacher wrote: Mike Looijmans wrote: And in addition to running the unit tests, try running your actual web site with this release. That usually manages to bring up things the developers never thought of - so that we can extend the unit tests with those cases as well. I can imagine that some code may depend on some specific behaviour of FieldStorage for example. As nice as it would be to get some real-world feedback, let's not forget this is a *development* snapshot, and should not be considered stable for production use. Test your application on your website development machine if you can - the more testing the better. I'm sure that is what Mike meant, but I just want to make it clear that we are not suggesting this is a release candidate for production use. From my post that intention wasn't clear, so thanks Jim for pointing that out. Want I wanted to write is to run your site's code on it, but not (necessarily) your production site. Most of us will have some development system where they check things out and/or do development and testing before putting them into the field.
Grouping tests (was: Latest tests)
What I have been doing in a totally unrelated Python project is to create test groups simply by putting them into separate modules. The main test module test.py looks like this: ## (test.py) import unittest from test_something import * from test_someother import * from test_yetmore import * if __name__ == '__main__': unittest.main(module=__import__(__name__)) ## This works because unittest takes all 'things' that start with 'test' in the provided module and runs them. So anything we bring into our namespace gets run. This also makes it possible to import tests from other projects, and share these tests between projects. The other test_ modules look much alike: ## (test_something.py) import unittest import test_peer class test04MultiPeerSystem(test_peer.BaseTestRealThing): def test03DiscoveryC08B20(self): "Multiple clients w/ discovery: 8 peers, 20 blocks each" self.runDiscoveryTest(nclients = 8, nblocks=20) if __name__ == '__main__': unittest.main(module=__import__(__name__)) ## This makes it very easy to handle test subsets, and run single test suites. Just run $python test.py to run ALL the tests. To run just a single set, run $python test_something.py And to run a single test, either of these will do: $python test_something.py test04MultiPeerSystem $python test.py test04MultiPeerSystem The real power shows when you want to run 4 or 5 test sets, and/or only parts of some test sets. Just create a new "main" test unit that imports the desired ones, and you're set: ## (test_few.py) import unittest from test_something import * from test_someother import TestOnlyThis if __name__ == '__main__': unittest.main(module=__import__(__name__)) ## Because some tests take very long to run (in my vocabulary, "long" means more than a second), this saves me a lot of time when working on a part of a big project, where I don't need to run all tests all the time. -- Mike Looijmans Philips Natlab / Topic Automation Jim Gallacher wrote: ... I've been playing with some ideas for a new test framework, using a subclass of unittest.TestLoader to find and configure tests. I want to play around with it for another day or so before sharing but at this point I'm pretty confident it'll work. Creating a new set of tests could be as simple as: testsuites/core/simpletest.py from testlib import VirtualHostTest class MyTests(VirtualHostTest): def test_hello_world(self): rsp = self.send_request() self.failUnless(rsp == 'test ok') def test_goodbye_world(self): rsp = self.send_request() self.failUnless(rsp == 'test ok') htdocs/testhandlers/core/simpletest.py -- from mod_python import def test_hello_world(req): req.write('test ok') return apache.OK def test_goodbye_world(req): req.write('test ok') return apache.OK $ python testrunner.py Things like virtual host names and handler directives required for configuration or send_request() are automatically derived from the test class and test method names. It will still be possible to provide custom apache configuration directives in a manner similar to that which we currently use, but for most tests this will not be required. Usage would look something like this: Run all the tests $ python testrunner.py Run one test $ python testrunner.py -t core.simpletest.MyTests.test_hello_world Run a group of tests (this would load the TestCase subclasses in testsuites/sessions/filesession.py): $ python testrunner.py -t sessions.filesession Jim
Re: GET request content and mod_python.publisher/psp.
My question is, should mod_python.publisher and mod_python.psp be enhanced and call req.discard_request_body() for a GET request to avoid the posibilities of any problems arising due to a client sending content for a GET request? -1 on that particular way of implementing it. If the GET request has a body, that body probably serves some purpose. The right thing to do for any handler that does not know how to handle the request is to return a 'bad request' error to the client. Just throwing away what is not understood is not very nice to developers and users - you'll get unexpected behaviour because the server is only handling a part of the request. The trouble here is of course that publisher or PSP cannot tell forehand that the handler will read the body data. So the only way to determine this is to have the handler handle the request, and after that, check if it did read all of the request. If not, you're too late to report this to the client, because the headers have already been sent out. Putting some message in an error log that no-one will ever read (in particular not the one who caused that problem) does not make sense either. To fix this, the handler should somehow advertise its capability to read the body. I guess you can't really solve the problem. Which is the lesser evil? Mike.
Re: FieldStorage and multiline headers in multipart/form.
Short answer: Yes, multiline headers are allowed there, and yes, mod_python will fail to read them properly. Longer answer: I checked throught the util.py code, and nope, the fix is not in there. I browsed through the appropriate RFC documents, in particular RFC1521 http://www.faqs.org/rfcs/rfc1521.html This clearly states that RFC822 headers are to be used, which means that those multiline headers are indeed allowed. The 1521 document even contains samples with multiline headers. Just a thought: Can we (re)use the rfc822 mime parser that's already built-in in Python to do the work for us? -- Mike Looijmans Philips Natlab / Topic Automation Graham Dumpleton wrote: With FieldStorage being discussed on main user mailing list, came across this old post of the mailing list: http://www.modpython.org/pipermail/mod_python/2001-November/012256.html What it is saying is that some HTTP clients use multi line headers in sections of multipart/form postings and that mod_python doesn't handle this. Looking at FieldStorage code, which I don't grok right at this minute because of an intensive coding frenzy today on the back of not enough sleep last night, I can't see that it has ever been modified to accomodate such multiline headers if indeed it needs to. Anyone who is more intimate with FieldStorage code want to have a better look at validity of original mailing list post and whether multiline headers are legitimate and whether there indeed could be a problem in mod_python. Graham
Re: wepy
import sys sys.stdout.write(open('penguin.png', 'rb').read()) NO. NO. NO. sys.stdout is a global variable. If you want sys.stdout.write() to end up in the user's terminal, your web server will be able to either serve only one request at a time, or must be forced into running separate processes for each concurrent request (which will not happen on windows and many other supported platforms). We made that mistake with cgipython, and we won't make it again. Learn from the mistakes of others, as you do not have the time to make them all yourself. Mike
Re: wepy
Your code is perfectly thread safe, yes. But the assignment to sys.stdout IS NOT. There is only one sys.stdout, and all threads use the same one. Example (two threads): ##Thread 1: - initializes request from client 1 - sets sys.stdout to StdOut(request ...) object - writes "hello1 " to sys.stdout ##Thread 2 - initializes request from client 2 - sets sys.stdout to StdOut(request ...) object - writes "hello2 " to sys.stdout ##Thread 1 - writes "world1" to sys.stdout - terminates, and sends sys.stdout.buffer to client ##Thread 2 - writes "world2" to sys.stdout - terminates, and sends sys.stdout.buffer to client Now look at what will happen with your code and a threading apache: Client 1 will receive "hello2 world1" Client 2 will receive "hello2 world1world2" Explanation: ##Thread 1: - initializes request from client 1 - sets sys.stdout to StdOut(request ...) object - writes "hello1 " to sys.stdout (sys.stdout.buffer == "hello1 ") ##Thread 2 - initializes request from client 2 - sets sys.stdout to StdOut(request ...) object (and thus replaces the buffer that thread 1 put there, its contents are lost) - writes "hello2 " to sys.stdout (sys.stdout.buffer == "hello2 ") ##Thread 1 - writes "world1" to sys.stdout (sys.stdout.buffer == "hello2 world1") - terminates, and sends sys.stdout.buffer to client ##Thread 2 - writes "world2" to sys.stdout (sys.stdout.buffer == "hello2 world1world2") - terminates, and sends sys.stdout.buffer to client Mike Looijmans Philips Natlab / Topic Automation Fırat KÜÇÜK wrote: Mike Looijmans yazmış: Yes, the problem is not whatever class or method you use to send the output to the client, but the problem is the use of the global sys.stdout variable. The is only one such object in your system. If there are two threads each handling a request, there is no telling to which one the sys.stdout.write or print statement will send its output. To prevent this, you'd have to mutex access to it, and leads to being able to handle only one request at a time. On many unix flavors, apache defaults to 'forking', which runs each request in a separate process. In these circumstances, you will not detect this problem. On other platforms, or if you modify the httpd.conf file, apache uses threading or even a mix of threading and forking. Important note: Since print sends to sys.stdout, using print is just as bad a method to send output to the client as sys.stdout.write. The only solution is to use the context in your handler, i.e. WRITE("hello world\n") or the more obvious (like in publisher and PSP): req.write("Hello world\n") Anything that makes use of globals to send back data to the client will not work, so "wepi.write" is also out of the question (if wepi is a module). -- Mike Looijmans Philips Natlab / Topic Automation Fırat KÜÇÜK wrote: import sys sys.stdout.write(open('penguin.png', 'rb').read()) NO. NO. NO. sys.stdout is a global variable. If you want sys.stdout.write() to end up in the user's terminal, your web server will be able to either serve only one request at a time, or must be forced into running separate processes for each concurrent request (which will not happen on windows and many other supported platforms). We made that mistake with cgipython, and we won't make it again. Learn from the mistakes of others, as you do not have the time to make them all yourself. Mike Hi, but i replaced sys.stdout with a StdOut class. is it problem too? Fırat KÜÇÜK Hi, ###[StdOut class]### class StdOut: "replaces sys.stdout" buffer = "" # --[overidden methods]- ### [__init__ method]--- def __init__(self, req, headers_object): self.req = req self.headers = headers_object # -[public methods]- ### [write method]--- def write(self, text): self.buffer += str(text) this is my sys.stdout replacement class and write method use buffer. and finally after execution of .py file # send buffer req.write(sys.stdout.buffer) I think this is thread safe method for sending data. print or sys.stdout.write indirectly use req.write method. # send buffer req.write(sys.stdout.buffer)
Re: mod_python 3.2.9-rc2 available for testing
Having written most of the issue "93" code, here's my opinion: * How much non-compatibility is acceptable in a patch release? None. Though it hurts my personal feelings that my patch did manage to break something (who imagined anyone trying to hack data into the FS object?), we cannot break other people's software with a mere patch. I don't have a clue what the "trac" software is, but I imagine that even for this particular piece of software, I may be able to come up with some form of emulation (e.g. using __setattr__ hooks) that still keeps it working with the patch. The question is of course, where does it stop? Note that the "old" code still contains a bug that the patch solves. The problem is that the old code does not work correctly if the objects created in a make_file callback are not real files created with file(). * How are applications supposed to perform write operations on a FieldStorage, in 3.3 and the future? Since we claim that FieldStorage behaves like a dictionary, the obvious syntax would be: form['mykey'] = 'value' This would require a __setitem__ method which should look something like this (with some extra code to handle lists): def __setitem__(self, key, value): if self.dictionary is None: # Create self.dictionary as in __getitem__ self.dictionary[key] = StringField(value)
Adding to FieldStorage
* How are applications supposed to perform write operations on a FieldStorage, in 3.3 and the future? Since we claim that FieldStorage behaves like a dictionary, the obvious syntax would be: form['mykey'] = 'value' This would require a __setitem__ method which should look something like this (with some extra code to handle lists): def __setitem__(self, key, value): if self.dictionary is None: # Create self.dictionary as in __getitem__ self.dictionary[key] = StringField(value) Trac also appends to the FieldStorage list attribute, which complicates things a little further. The '93' code also adds the add_field() method. Although this is not documented, there is nothing to indicate that it is a private method either. Calling add_field on a FieldStorage instance will not likely give the results a user expects, so we need to give some attention to that as well. I initially intended add_field as a callback routine, which was later on replaced with the callback. The method is not really neccesary, but quite convenient when subclassing. All the processing takes place in __init__ which makes it impossible to just replace the method at runtime, which I had in mind at first. I think we should support the dictionary syntax, e.g. form['key']='value' assert form['key'] == 'value' The dict syntax will replace all 'key' fields with the new value. The add_field method can be reworked to act as one would expect, e.g.: assert form['key'] == 'value' form.add_field('key', 'value2') assert form['key'] == ['value','value2'] form.add_field('key', 'value1') assert form['key'] == ['value','value2','value1'] Maybe we can come up with a better name than add_field (append?) as well.
Re: mod_python 3.2.9-rc2 available for testing
So do you think we can release 3.2.9 with the old 3.2.8 code, or should this block the release until we have a correct fix? I'm hoping we can do a 3.3 release in October or November, FYI. I don't think it is worth trying to work on a fix that makes new FieldStorage code work with Trac. Just go with 3.2.8 FieldStorage code in 3.2.9. In 3.3 then use new code with understanding that it will break Trac. I see this as acceptable as version of Trac in subversion trunk does away with all the FieldStorage fiddles and instead they base it on a WSGI wrapper which I presume therefore avoids use of FieldStorage. We just need to ensure that this newer version of Trac is released before we release mod_python 3.3 and that we notify them that older versions of Trac will not work with mod_python 3.3 and people will need to use their most latest version of Trac instead. You have my +1 on that. Let's keep the 3.2.8 implementation for now, and do the rework in 3.3 in cooperation with the Trac people. Mike.
Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
As I recall, dropping the current Trac version for 3.3 is what we wanted to do. In addition, there was talk about implementing a __setitem__ method as well to support manual insertion of fields, so that FieldStorage becomes a real dictionary-like object. The add_field method was to be reworked so that it could be publicly available. (side thought: Allowing population from code rather than only from a Request object might allow FieldStorage to be used in test code) -- Mike Looijmans Philips Natlab / Topic Automation Graham Dumpleton (JIRA) wrote: [ http://issues.apache.org/jira/browse/MODPYTHON-93?page=comments#action_12433818 ] Graham Dumpleton commented on MODPYTHON-93: --- Have we decided that we will not try and be compatible with Trac and expect people using mod_python 3.3 to use the next unreleased version of Trac? This next version of Trac works via a WSGI gateway and thus doesn't use FieldStorage and thus doesn't have the problems seen here. Can we mark the original issue as resolved for 3.3? Improve util.FieldStorage efficiency Key: MODPYTHON-93 URL: http://issues.apache.org/jira/browse/MODPYTHON-93 Project: mod_python Issue Type: Improvement Components: core Affects Versions: 3.2.7 Reporter: Jim Gallacher Assigned To: Jim Gallacher Priority: Minor Fix For: 3.3 Attachments: modpython325_util_py_dict.patch Form fields are saved as a list in a FieldStorage class instance. The class implements a __getitem__ method to provide dict-like behaviour. This method iterates over the complete list for every call to __getitem__. Applications that need to access all the fields when processing the form will show O(n^2) behaviour where n == the number of form fields. This overhead could be avoided by creating a dict (to use as an index) when the FieldStorage instance is created. Mike Looijmans has been investigating StringField and Field as well. It is probably reasonable to include information on his work in this issue as well, so that we can consider all of these efficiency issues in toto.
Re: Status of mod_python 3.3.
Graham Dumpleton wrote: ... MODPYTHON-93 Improve util.FieldStorage efficiency. This was actually marked as resolved but reopened because it was discovered that changes meant that Trac <=0.9.6 would no longer work. The changes were also backed out of mod_python 3.2.X branch and not released in 3.2.10. At this point I believe we have agreed that code in 3.3 would be left as is and people would need to use Trac >=0.10, which has now been release, with mod_python 3.3 or later. There was comments as to whether util.FieldStorage needs to have more dictionary like access, but at this point I believe we should mark this issue as resolved and if people want dictionary like access, they can open a separate JIRA issue for that and we deal with it in a future release. I give it a +1 to mark it resolved. There's more than enough good stuff in the 93 fix (such as being able to upload large amounts of data to file-like objects) to justify breaking just one hack (which is easy to fix on the Trac side anyway). It would be logical to upgrade both the mod_python and trac releases on the machine simultaneously, so I doubt there will be any real hits in the field. Mike.
Re: [VOTE] does mod_python want to be a TLP
Apache includes a feather in its logo, and Python is associated with a snake. "Quetzalcoatl" means "feathered snake" and does not appear to be used by any other software project. Which I can fully understand, because "Quetzalcoatl" is harder to pronounce than the 16 character password for the mainframe that was generated using /dev/random. I liked "Asphyxia" though. Makes a nice password too. Mike.
Re: [jira] Commented: (MODPYTHON-222) Support for chunked transfer encoding on request content.
For my tape project I used "None" as valid len argument to indicate reading the next available tape block, regardless of size (underlying layer figured out the tape block size). This worked well together with existing file type APIs. So that would make the API: req.read(2048) - reads up to 2048 bytes. If it reads less, it's at the end of the stream. Blocks until the requested amount has been read. req.read() - reads all data from the stream (bad idea), blocks until EOS. req.read(None) - reads the next chunk of data, blocks if no data available. Returns 0 size if at EOS. Mike Looijmans Philips Natlab / Topic Automation M Willson (JIRA) wrote: [ https://issues.apache.org/jira/browse/MODPYTHON-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507699 ] M Willson commented on MODPYTHON-222: - If not possible in full generality then I'd be happy with a special separate read method for getting what is in the input buffer so far, with 'read' just blocking until the chunked upload is complete. Would this be possible? Or if we could specify a handler in python which is called back with each new chunk of data received, that would give the ultimate flexibility... Support for chunked transfer encoding on request content. - Key: MODPYTHON-222 URL: https://issues.apache.org/jira/browse/MODPYTHON-222 Project: mod_python Issue Type: New Feature Components: core Affects Versions: 3.3.1 Reporter: Graham Dumpleton It is currently not possible to use chunked transfer encoding on request content delivered to a mod_python request handler. The use of chunked transfer encoding is explicitly blocked in C code by: rc = ap_setup_client_block(self->request_rec, REQUEST_CHUNKED_ERROR); To allow chunked transfer encoding instead of REQUEST_CHUNKED_ERROR it would be necessary to supply REQUEST_CHUNKED_DECHUNK. Problem is that it isn't that simple. First off, the problems associated with MODPYTHON-212 have to be fixed with code being able to cope with there being no content length. The next issue is that req.read() method is currently documented as behaving as: If the len argument is negative or omitted, reads all data given by the client. This means that can't have req.read() with no arguments mean give me everything that is currently available in input buffers as everyone currently expects it to return everything sent by client. Thus, to be able to process streaming data one would have to supply an amount of data that one wants to read. The code for that though will always try to ensure that that exact amount of data is read and will block if not enough and not end of input. A handler though may not want it to block and be happy with just getting what is read and only expect it to block if nothing currently available. In other words, the current specification for how req.read() behaves is incompatible with what would be required to support chunked transfer encoding on request content. Not sure how this conflict can be resolved.
[jira] Updated: (MODPYTHON-93) Improve util.FieldStorage efficiency
[ http://issues.apache.org/jira/browse/MODPYTHON-93?page=all ] Mike Looijmans updated MODPYTHON-93: Attachment: modpython325_util_py_dict.patch What it does: - Simplifies the creation of StringField objects. This was already marked as a TODO in the 3.2.5b code. - Does not create a file object (cStringIO) for each and every field. - FieldStorage.get() will always return the same instance(s) for any given name. - FieldStorage.get() is very cheap now (does not create new objects, which is expensive in Python) - use a dictionary-on-demand. - Lots of code removed for this one, and a few lines added. - items() function, which, contrary to common belief, preserves argument ordering. > Improve util.FieldStorage efficiency > > > Key: MODPYTHON-93 > URL: http://issues.apache.org/jira/browse/MODPYTHON-93 > Project: mod_python > Type: Improvement > Components: core > Versions: 3.3 > Reporter: Jim Gallacher > Assignee: Jim Gallacher > Priority: Minor > Attachments: modpython325_util_py_dict.patch > > Form fields are saved as a list in a FieldStorage class instance. The class > implements a __getitem__ method to provide dict-like behaviour. This method > iterates over the complete list for every call to __getitem__. Applications > that need to access all the fields when processing the form will show O(n^2) > behaviour where n == the number of form fields. This overhead could be > avoided by creating a dict (to use as an index) when the FieldStorage > instance is created. > Mike Looijmans has been investigating StringField and Field as well. It is > probably reasonable to include information on his work in this issue as well, > so that we can consider all of these efficiency issues in toto. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency
[ http://issues.apache.org/jira/browse/MODPYTHON-93?page=comments#action_12444850 ] Mike Looijmans commented on MODPYTHON-93: - A very simple approach would be to delete the dictionary attribute when any change is made. The add_field method should be made private, and one could add a __setitem__ method like this: def __setitem__(self, key, value): if self.__dict__.has_key('dictionary'): del self.dictionary self._add_field(key, value) A similar approach can be taken to add the __delitem__ functionality. Mike Looijmans Philips Natlab / Topic Automation > Improve util.FieldStorage efficiency > > > Key: MODPYTHON-93 > URL: http://issues.apache.org/jira/browse/MODPYTHON-93 > Project: mod_python > Issue Type: Improvement > Components: core >Affects Versions: 3.2.7 >Reporter: Jim Gallacher > Assigned To: Graham Dumpleton >Priority: Minor > Fix For: 3.3 > > Attachments: modpython325_util_py_dict.patch > > > Form fields are saved as a list in a FieldStorage class instance. The class > implements a __getitem__ method to provide dict-like behaviour. This method > iterates over the complete list for every call to __getitem__. Applications > that need to access all the fields when processing the form will show O(n^2) > behaviour where n == the number of form fields. This overhead could be > avoided by creating a dict (to use as an index) when the FieldStorage > instance is created. > Mike Looijmans has been investigating StringField and Field as well. It is > probably reasonable to include information on his work in this issue as well, > so that we can consider all of these efficiency issues in toto. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MODPYTHON-222) Support for chunked transfer encoding on request content.
[ https://issues.apache.org/jira/browse/MODPYTHON-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509749 ] Mike Looijmans commented on MODPYTHON-222: -- For my tape project I used "None" as valid len argument to indicate reading the next available tape block, regardless of size (underlying layer figured out the tape block size). This worked well together with existing file type APIs. So that would make the API: req.read(2048) - reads up to 2048 bytes. If it reads less, it's at the end of the stream. Blocks until the requested amount has been read. req.read() - reads all data from the stream (bad idea), blocks until EOS. req.read(None) - reads the next chunk of data, blocks if no data available. Returns 0 size if at EOS. Mike Looijmans Philips Natlab / Topic Automation > Support for chunked transfer encoding on request content. > - > > Key: MODPYTHON-222 > URL: https://issues.apache.org/jira/browse/MODPYTHON-222 > Project: mod_python > Issue Type: New Feature > Components: core >Affects Versions: 3.3.1 >Reporter: Graham Dumpleton > > It is currently not possible to use chunked transfer encoding on request > content delivered to a mod_python request handler. > The use of chunked transfer encoding is explicitly blocked in C code by: > rc = ap_setup_client_block(self->request_rec, REQUEST_CHUNKED_ERROR); > To allow chunked transfer encoding instead of REQUEST_CHUNKED_ERROR it would > be necessary to supply REQUEST_CHUNKED_DECHUNK. > Problem is that it isn't that simple. > First off, the problems associated with MODPYTHON-212 have to be fixed with > code being able to cope with there being no content length. > The next issue is that req.read() method is currently documented as behaving > as: > If the len argument is negative or omitted, reads all data given by the > client. > This means that can't have req.read() with no arguments mean give me > everything that is currently available in input buffers as everyone currently > expects it to return everything sent by client. Thus, to be able to process > streaming data one would have to supply an amount of data that one wants to > read. The code for that though will always try to ensure that that exact > amount of data is read and will block if not enough and not end of input. A > handler though may not want it to block and be happy with just getting what > is read and only expect it to block if nothing currently available. > In other words, the current specification for how req.read() behaves is > incompatible with what would be required to support chunked transfer encoding > on request content. > Not sure how this conflict can be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MODPYTHON-238) req.connection.keepalive should be writable
[ https://issues.apache.org/jira/browse/MODPYTHON-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511031 ] Mike Looijmans commented on MODPYTHON-238: -- If there is a separate req.chunked, I'd advise to keep them separated. It's perfectly valid to have a chunked stream with a connection: close setting, and someone is bound to use that. For one thing, it allows the client to reliably detect whether all the data has really been sent. Having a side-effect like setting one disables the other is bound to surprise someone. Mike Looijmans Philips Natlab / Topic Automation > req.connection.keepalive should be writable > --- > > Key: MODPYTHON-238 > URL: https://issues.apache.org/jira/browse/MODPYTHON-238 > Project: mod_python > Issue Type: Improvement > Components: core >Affects Versions: 3.3.1 >Reporter: Graham Dumpleton >Priority: Minor > > The attribute req.connection.keepalive should be writable. This would allow > handlers to use: > req.connection.keepalive = apache.AP_CONN_CLOSE > Doing this before any data is written has the effect of disabling keepalive > and also turning off chunked encoding for response content when no content > length has been supplied. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.