Glenn Linderman <v+pyt...@g.nevcal.com> added the comment:

This looks much simpler than the previous patch.  However, I think it can be 
further simplified. This is my first reading of this code, however, so I might 
be totally missing something(s).

Pierre said:
Besides FieldStorage, I modified the  parse() function at module level, but not 
parse_multipart (should it be kept at all ?)

I say:
Since none of this stuff works correctly in 3.x, and since there are comments 
in the code about "folding" the parse* functions into FieldStorage, then I 
think they could be deprecated, and not fixed.  If people are still using them, 
by writing code to work around their deficiencies, that code would continue to 
work for 3.2, but then not in 3.3 when that code is removed?  That seems 
reasonable to me.  In this scenario, the few parse* functions that are used by 
FieldStorage should be copied into FieldStorage as methods (possibly private 
methods), and fixed there, instead of being fixed in place.  That was all the 
parse* functions could be deprecated, and the use of them would be unchanged 
for 3.2.

Since RFC 2616 says that the HTTP protocol uses ISO-8859-1 (latin-1), I think 
that should be required here, instead of deferring to fp.encoding, which would 
eliminate 3 lines.

Also, the use of FeedParser could be replaced by BytesFeedParser, thus 
eliminating the need to decode header lines in that loop.

And, since this patch will be applied only to Python 3.2+, the mscvrt code can 
be removed (you might want a personal copy with it for earlier version of 
Python 3.x, of course).

I wonder if the 'ascii' reference should also be 'latin-1'?

In truly reading and trying to understand this code to do a review, I notice a 
deficiency in _parseparam and parse_header: should I file new issues for them? 
(perhaps these are unimportant in practice; I haven't seen \ escapes used in 
HTTP headers).  RFC 2616 allows for "" which are handled in _parseparam.  And 
for \c inside "", which is handled in parse_header.  But: _parseparam counts " 
without concern for \", and parse_header allows for \\ and \" but not \f or \j 
or \ followed by other characters, even though they are permitted (but probably 
not needed for much).

In make_file, shouldn't the encoding and newline parameters be preserved when 
opening text files?  On the other hand, it seems like perhaps we should 
leverage the power of IO to do our encoding/decoding... open the file with the 
TextIOBase layer set to the encoding for the MIME part, but then just read 
binary without decoding it, write it to the .buffer of the TextIOBase, and when 
the end is reached, flush it, and seek(0).  Then the data can be read back from 
the TextIOBase layer, and it will be appropriate decoded.  Decoding errors 
might be deferred, but will still occur.  This technique would save two data 
operations: the explicit decode in the cgi code, and the implicit encode in the 
IO layers, so resources would be saved.  Additionally, if there is a 
CONTENT-LENGTH specified for non-binary data, the read_binary method should be 
used for it also, because it is much more efficient than readlines... less 
scanning of the data, and fewer outer iterations.  This goes well with 
 the technique of leaving that data in binary until read from the file.

It seems that in addition to fixing this bug, you are also trying to limit the 
bytes read by FieldStorage to some maximum (CONTENT_LENGTH).  This is good, I 
guess.  But skip_lines() has a readline potentially as long as 32KB, that isn't 
limited by the maximum.  Similar in read_lines_to_outer_boundary, and 
read_lines_to_eof (although that may not get called in the cases that need to 
be limited).  If a limit is to be checked for, I think it should be a true, 
exact limit, not an approximate limit.

See also issue 10879.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4953>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to