Jean-Marc Le Peuvedic <lepeuve...@gmail.com> added the comment:

The exception is raised in the start_response function provided by web.py's 
WSGIGateway class in wsgiserver3.py:1997.

# According to PEP 3333, when using Python 3, the response status
# and headers must be bytes masquerading as unicode; that is, they
# must be of type "str" but are restricted to code points in the
# "latin-1" set.

Therefore, header values must already be strings whenever start_response is 
called. WSGI servers must accumulate headers in some data structure and call 
the supplied "start_response" function only once they have gathered all the 
headers and converted all the values to strings.
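
To make the requirement concrete, here is a minimal sketch of the kind of check 
a strict gateway could apply before accepting headers. The name 
check_wsgi_headers is hypothetical; it is not taken from web.py or CherryPy.

```python
def check_wsgi_headers(headers):
    """Accept only (str, str) pairs whose code points fit in latin-1."""
    for name, value in headers:
        for part in (name, value):
            if not isinstance(part, str):
                raise TypeError("WSGI header parts must be str, got %r" % (part,))
            # Raises UnicodeEncodeError for code points outside latin-1.
            part.encode("latin-1", "strict")
    return headers
```

With such a check, ("Content-Length", "42") passes while ("Content-Length", 42) 
is rejected with a TypeError.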

The fault I observed is not strictly speaking caused by a bug in Python lib 
"server.py". Rather, it is a component interaction failure caused by 
inadequately defined semantics. The interaction between web.py and server.py is 
quite complex, and no component is faulty when considered alone.

Let me explain:

Response and header management in server.py is handled by 3 methods of class 
BaseHTTPRequestHandler:
- send_response : puts response in buffer
- send_header : converts to string and adds to buffer
    ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict')
- end_headers : flushes buffer to socket

This implementation is correct even if send_header is called with an
int value.
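
A quick way to see this is to apply the formatting expression quoted above to 
an int and to a str. The helper name format_header_line is only an illustration 
and does not exist in server.py.

```python
def format_header_line(keyword, value):
    # Same expression as the one quoted from send_header: %s formatting
    # converts the value to text before the latin-1 encoding step.
    return ("%s: %s\r\n" % (keyword, value)).encode("latin-1", "strict")

int_line = format_header_line("Content-Length", 42)
str_line = format_header_line("Content-Length", "42")
print(int_line == str_line)
```

Both calls yield the same bytes, which is why the stock implementation never 
notices an int header value.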

Now, web.py's application.py defines a "wsgi(env, start_resp)" function, which 
gets plugged into the CherryPy WSGI HTTP server.

The server is an instance of class wsgiserver.CherryPyWSGIServer created in 
httpserver.py:169 (digging deeper, actually at line 195).
This server is implemented as an HTTPServer configured to use gateways of type 
class WSGIGateway_10 to handle requests.

A gateway is basically an instance of a class that is initialized with an 
HTTPRequest instance and has a "respond" method. Of course the WSGIGateway 
implements "respond" as described in the WSGI standard: it calls the 
WSGI-compliant web app, which is a function(environ, start_response(status, 
headers)) returning an iterator (for chunked HTTP responses). The 
start_response function provided by class WSGIGateway is where the failure 
occurs.
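
For readers less familiar with WSGI, the calling convention can be sketched as 
follows. The names app and fake_start_response are made up for this 
illustration; a real gateway supplies its own start_response.

```python
def app(environ, start_response):
    """A minimal WSGI application returning an iterable of bytes."""
    body = b"hello\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),  # value converted to str up front
    ]
    start_response("200 OK", headers)
    return [body]

captured = {}

def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the gateway's callable: just record what the app passed.
    captured["status"] = status
    captured["headers"] = headers

chunks = list(app({}, fake_start_response))
```

The application hands status and headers to the callable it was given, so any 
type checking done by the gateway happens at that call.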

When the application calls web.py's app.run(), the function runwsgi in web.py's 
wsgi.py gets called. This function determines whether it receives requests via 
CGI or directly. In my case it starts an HTTP server using web.py's runsimple 
function (file httpserver.py:158).

This function never returns and runs the CherryPyWSGIServer, but it first 
wraps the wsgi function in two WSGI middleware callables. Both are defined in 
web.py's httpserver.py file. The interesting one is StaticMiddleWare (line 
281). Its role is to hijack URLs starting with /static, as is the case with my 
missing CSS file. In order to serve those static resources quickly, its 
implementation uses StaticApp (a WSGI function serving static stuff, defined at 
line 225), which extends Python's SimpleHTTPRequestHandler. That's where the 
two libraries connect.

StaticApp changes the way headers are processed using overloaded methods for 
send_response, send_header and end_headers. This means that, when StaticApp 
calls SimpleHTTPRequestHandler.send_head() to send the HEAD part of the 
response, the headers are managed using the overloaded methods. When 
send_head() finds out that my CSS file does not exist and calls send_error(), 
a Content-Length header gets written, but its value is not converted to a 
string, because the overloaded implementation just stores the header name and 
value in a list as they come.

When StaticApp has finished gathering headers using Python's send_head(), it 
immediately calls start_response provided by WSGIGateway, where the failure 
occurs.
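
The interaction can be reproduced in isolation with a simplified sketch. 
CollectingHandler, its respond method and strict_start_response are 
hypothetical stand-ins for StaticApp and the gateway's start_response; they are 
not the real web.py code, and the header value 469 is an arbitrary example.

```python
class CollectingHandler:
    """Stand-in for a handler whose header methods only collect values."""

    def __init__(self, start_response):
        self.start_response = start_response
        self.status = None
        self.headers = []

    def send_response(self, code, message=""):
        self.status = "%d %s" % (code, message)

    def send_header(self, keyword, value):
        # No conversion and no encoding: the value is stored as it comes.
        self.headers.append((keyword, value))

    def end_headers(self):
        pass  # nothing is flushed here in this sketch

    def respond(self):
        # Hand the collected headers to the gateway in one call.
        self.start_response(self.status, self.headers)


def strict_start_response(status, headers):
    # Assumed gateway behaviour: reject anything that is not a str.
    for name, value in headers:
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError("header %r has non-str value %r" % (name, value))


handler = CollectingHandler(strict_start_response)
handler.send_response(404, "Not Found")
handler.send_header("Content-Length", 469)  # int, as an error path might pass
handler.end_headers()
try:
    handler.respond()
    error = None
except TypeError as exc:
    error = exc
```

Neither piece is wrong on its own; the failure appears only when the collected 
int reaches a start_response that insists on str.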

The bug in Python is not strictly that send_header gets called with an int in 
send_error. Rather, it is a documentation bug which fails to mention that 
send_header/end_headers MUST CONVERT TO STRING and ENCODE IN LATIN-1.

Therefore the correction I proposed is still invalid, because the combination 
of web.py and server.py after the correction still does not properly encode 
the headers.

In conclusion, I would say that:
- In the Python lib, the bug is a documentation bug: the documentation fails to 
indicate that send_header and/or end_headers can receive header names or 
values which are not strings and not encoded in strict latin-1, and that it is 
their responsibility to convert and encode them.
- In web.py, the bug is that the implementation of the overloaded methods fails 
to properly encode the headers.

Of course, changing int to str does no harm and makes everything more 
resilient, but does not fix the underlying bug.
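
As an illustration only, here is one way an overriding send_header could take 
on that responsibility itself. The helper normalize_header is hypothetical and 
is not a proposed patch for either project.

```python
def normalize_header(keyword, value):
    """Convert both parts to str and verify that they fit in latin-1."""
    name, text = str(keyword), str(value)
    # encode() is used only as a check; UnicodeEncodeError signals a bad header.
    name.encode("latin-1", "strict")
    text.encode("latin-1", "strict")
    return name, text

pair = normalize_header("Content-Length", 42)
```

An overriding send_header could store the returned pair instead of the raw 
arguments, so that whatever is later passed to start_response is already str.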

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33663>
_______________________________________