New submission from Glenn Linderman <v+pyt...@g.nevcal.com>:

In attempting to review issue 4953, I discovered a conundrum in the handling of 
multipart/form-data.

cgi.py has claimed for some time (at least since 2.4) that it "handles" file 
storage for uploading large files.  I looked at the code in 2.6 that does 
this: it uses rfc822.Message, which parses headers from any object supporting 
readline().  In particular, rfc822.Message doesn't attempt to read message 
bodies; cgi.py has its own code to do that.

There is still code in 3.2 cgi.py to read message bodies, but... rfc822 has 
gone away, and been replaced with the email package.  Theoretically this is 
good, but the cgi FieldStorage read_multi method now parses the whole CGI input 
and then iteration parcels out items to FieldStorage instances.  There is a 
significant difference here: email reads everything into memory (if I 
understand it correctly).  That will never work to upload large or many files 
when combined with a Web server that launches CGI programs with memory limits.

I see several possible actions that could be taken:
1) Documentation.  While it is doubtful that anyone is using 3.x CGI, and this 
makes it more doubtful, the present code does not match the documentation: 
the documentation claims to handle file uploads as files, rather than as 
in-memory blobs, but the current code does not do that.

2) If there is a method in the email package that corresponds to 
rfc822.Message, parsing only headers, I couldn't find it.  Perhaps it is 
possible to feed just headers to BytesFeedParser, and stop, and get the same 
sort of effect.  However, this is not the way cgi.py is presently coded.  
And if there is a better API for parsing only headers that is or could be 
exposed by email, that might be handy.

3) The 2.6 cgi.py does not claim to support nested multipart/ stuff, only one 
level.  I'm not sure if any present or planned web browsers use nested 
multipart/ stuff... I guess it would require a nested <form> tag? which is 
illegal HTML last I checked.  So perhaps the general logic flow of 2.6 cgi.py 
could be reinstated, with a technique to feed only headers to BytesFeedParser, 
together with reinstating the MIME body parsing in cgi.py, and this could make 
a solution that works.
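
As a rough illustration of options 2 and 3, here is a minimal sketch, not the 
actual cgi.py code.  It feeds only the header lines of each part to 
BytesFeedParser and then copies the body to a temporary file line by line, 
so the body never passes through the email package.  The function names 
(read_part_headers, copy_body_to_file, parse_multipart) are hypothetical, only 
one level of multipart is handled, and readline() is called without a size 
limit for simplicity, so a single very long line would still be buffered whole.

```python
import io
import tempfile
from email.parser import BytesFeedParser


def read_part_headers(fp):
    """Feed only the header lines of one MIME part to BytesFeedParser."""
    parser = BytesFeedParser()
    while True:
        line = fp.readline()
        if not line:
            break  # EOF inside the headers
        parser.feed(line)
        # A blank line ends the header block; stop before the body.
        if line in (b"\r\n", b"\n"):
            break
    return parser.close()  # Message object with headers and an empty payload


def copy_body_to_file(fp, boundary, out):
    """Copy body lines to `out` until a boundary line is seen.

    Returns True if the closing boundary (or EOF) was reached, False if
    another part follows.
    """
    next_part = b"--" + boundary
    last_part = next_part + b"--"
    pending = b""  # held-back line ending; the final one belongs to the boundary
    while True:
        line = fp.readline()  # simplification: no size limit (see lead-in)
        if not line:
            return True  # premature EOF, treated as end of input
        stripped = line.rstrip(b"\r\n")
        if stripped == next_part:
            return False
        if stripped == last_part:
            return True
        out.write(pending)
        if line.endswith(b"\r\n"):
            pending, line = b"\r\n", line[:-2]
        elif line.endswith(b"\n"):
            pending, line = b"\n", line[:-1]
        else:
            pending = b""
        out.write(line)


def parse_multipart(fp, boundary):
    """Yield (headers, body_file) pairs for a single-level multipart body."""
    # Discard any preamble up to the first boundary line.
    done = copy_body_to_file(fp, boundary, io.BytesIO())
    while not done:
        headers = read_part_headers(fp)
        body = tempfile.TemporaryFile()
        done = copy_body_to_file(fp, boundary, body)
        body.seek(0)
        yield headers, body
```

Each yielded headers object is an ordinary email Message, so get_filename() 
and get_param("name", header="content-disposition") can be used on it, while 
the body stays in a temporary file on disk.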

I discovered this because I couldn't figure out where a bunch of the methods 
in cgi.py were called from, particularly read_lines_to_outerboundary, and 
make_file.  They seemed to be called much too late in the process.  It wasn't 
until I looked back at 2.6 code that I could see that there was a transition 
from using rfc822 only for headers to using email for parsing the whole data 
stream, and that that was the cause of the documentation not seeming to match 
the code logic.  I have no idea if this problem is in 2.7, as I don't have it 
installed here for easy reference, and I'm personally much more interested in 
3.2.

----------
components: Library (Lib)
messages: 125884
nosy: r.david.murray, v+python
priority: normal
severity: normal
status: open
title: cgi memory usage
versions: Python 3.1, Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10879>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com