Glenn Linderman <v+pyt...@g.nevcal.com> added the comment:

R. David said:
>From looking over the cgi code it is not clear to me whether Pierre's approach 
>is simpler or more complex than the alternative approach of starting with 
>binary input and decoding as appropriate.  From a consistency perspective I 
>would prefer the latter, but I don't know if I'll have time to try it out 
>before rc1.

I say:
I agree with R. David that an approach using the binary input seems more 
appropriate, as the HTTP byte stream is defined as binary.  Do the 3.2 beta 
email docs now include documentation for the binary input interfaces required 
to code that solution?  Or could you provide appropriate guidance and review, 
should someone endeavor to attempt such a solution?

The remaining concerns below are only concerns; they may be totally irrelevant, 
and I'm too ignorant of how the code works to realize their irrelevance.  
Hopefully someone who understands the code can comment and explain.

I believe that the proper solution is to make cgi work if sys.stdin has already 
been converted to be a binary stream, or if it hasn't, to dive down to the 
underlying binary stream, using detach().  Then the data should be processed as 
binary, and decoded once, when the proper decoding parameters are known.  The 
default encoding seems to be different on different platforms, but the binary 
stream is standardized.  It looks like new code was added to preprocess the 
MIME data into chunks to be fed to the email parser.  I can believe such code 
could be written correctly (I can't speak to whether this patch's code is 
correct), but it seems redundant, inefficient, and error-prone to do the 
chunking once outside the email parser and again inside it.
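
To make the suggestion concrete, here is a minimal sketch of "get to the 
binary stream, then decode once".  The helper name as_binary() and the sample 
data are hypothetical and are not taken from the patch; an in-memory stream 
stands in for sys.stdin, and "latin-1" stands in for whatever the platform 
default encoding happens to be.

```python
import io

def as_binary(fp):
    """Return a binary stream for fp (hypothetical helper, not from the patch).

    If fp is a text wrapper, detach() hands back the underlying binary
    buffer; the wrapper itself is unusable afterwards.  Note that
    io.StringIO has no underlying buffer, so detach() would raise there.
    """
    if isinstance(fp, io.TextIOBase):
        return fp.detach()
    return fp  # already binary, use it as-is

# Stand-in for a preopened text-mode sys.stdin with a platform default encoding.
raw = io.BytesIO("name=caf\u00e9".encode("utf-8"))
text_stdin = io.TextIOWrapper(raw, encoding="latin-1")

# Read the body as bytes, ignoring the wrapper's encoding entirely.
body = as_binary(text_stdin).read()

# Decode once, when the real charset is known (assumed here to be UTF-8).
decoded = body.decode("utf-8")
print(decoded)
```

With the real sys.stdin the same idea would apply, and sys.stdin.buffer gives 
access to the binary layer without invalidating the text wrapper.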

I also doubt that self.fp.encoding is consistent from platform to platform.  
But the HTTP bytestream is binary, and self-describing or declared by HTTP or 
HTML standards for the parts that are not self-describing.  The default 
platform encoding used for the preopened sys.stdin is not particularly relevant 
and may introduce mojibake type bugs, decoding errors in the presence of some 
inputs, and/or platform inconsistencies, and it seems that that is generally 
where self.fp.encoding, used in various places in this patch, comes from.
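
A small illustration of the two failure modes mentioned above, assuming the 
client sent UTF-8 and the platform default is something else.  The sample 
payload and the choice of "latin-1" and "ascii" as stand-in defaults are 
assumptions made for the example only.

```python
# Bytes as they would arrive on the wire, assuming the client sent UTF-8.
payload = "caf\u00e9".encode("utf-8")

# Decoding with the declared charset gives the intended text.
right = payload.decode("utf-8")

# A platform default of latin-1 decodes without error but yields mojibake.
wrong = payload.decode("latin-1")

# A platform default of ascii cannot decode the input at all.
try:
    payload.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(repr(right), repr(wrong), ascii_failed)
```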

Regarding the binary vs. text issue: when using both binary and text interfaces 
on output streams, it is necessary to flush between text and binary writes to 
preserve the proper sequencing of data in the output.  For input, is it 
possible that mixing text and binary reads could cause the binary reads to 
miss data that has already been preloaded into the text buffer?  For CGI 
programs, no one should have done any text input before calling the CGI 
functions, so perhaps this is not a concern, and there probably isn't any 
buffering on socket streams (the usual CGI use case).  But I see both binary 
and text input functions used in this patch, so perhaps someone could explain 
why such a mix is or isn't a problem.
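
The input-side worry can be reproduced with in-memory streams.  This is only 
a sketch: the stream contents are made up, and it shows the behaviour of 
io.TextIOWrapper in CPython, which reads ahead in chunks.  It says nothing 
about whether the patch actually hits this case.

```python
import io

# Stand-in for stdin: a binary stream with a text wrapper on top of it.
raw = io.BytesIO(b"line one\nline two\n")
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="ascii")

# A text-mode read pulls a whole chunk from the binary layer, not one line.
first = text.readline()

# A later binary read on the same stream no longer sees the second line,
# because it is already sitting in the text layer's buffer.
leftover = text.buffer.read()

print(repr(first), repr(leftover))
```

So if any text read happens first, later binary reads can miss data; reading 
only through the binary layer from the start avoids the question.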

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4953>
_______________________________________