[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-09-25 Thread Robert Collins

Robert Collins added the comment:

FWIW we probably need to capture the original unaltered URL somewhere, but also 
ensure that PATH_INFO is always a relative path.

One should be able to implement a proxy in WSGI (because thats just another 
specialised app), and doing that today requires special handling depending on 
the WSGI container, which isn't great for consistency.

On security; Host header - url host mismatches occur when the host to which a 
request is sent != the url; this is expected only in the case of forward 
proxies: any other time it would indeed be a smuggling attack, trying to find 
mismatches between acls and access in servers - this is another reason to 
consolidate things so that wsgi apps can rely on urls looking consistent.

--
nosy: +rbcollins

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-09-25 Thread Robert Collins

Robert Collins added the comment:

Oh, also - while its tempting to say that it doesn't matter whether we take the 
urls host portion, or the host header or the server name - it does. Deployments 
that look like:

LB/Firewall - backend container - WSGI app

are likely to have assumptions within the WSGI app about what sort of urls they 
expect to reach them. We can mitigate this by being very explicit about 
whatever solution we have.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-09-16 Thread mouad

mouad added the comment:

Hi Guido,

If I understand this correctly, the HOST header was added only in HTTP1.1 and 
setting the absolute URI was the right behavior client should follow if they 
are behind a proxy for HTTP1.0, but the same behavior was kept in HTTP1.1 for 
backward compatibility.

 In any case I would worry (a bit) that this might cause security issues if
 implemented as naively as shown in your patch,
 the other components of the URL should probably be validated against the
 configuration of the server.
 Also I am wondering whether specifying a different port or protocol (e.g.
 HTTPS) should be allowed or not.

If there should be a validation, I think it should be done in 
BaseHTTPRequestHandler, FWIW this later doesn't validate HOST header neither, 
just tested sending a request to python -mhttp.server which succeeded.

$ telnet 127.0.0.1 8000
GET /dummy HTTP/1.1
HOST google.com

One thing to note here for future work for the validation part, is that 
according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.2, 
first point:

If Request-URI is an absoluteURI, the host is part of the Request-URI.
Any Host header field value in the request MUST be ignored.

 and what to do with extraneous parts of the path (in particular the
 #fragment identifier)

AFAIK clients are not supposed to send fragments to servers, and I didn't find 
in the HTTP spec what should happen if they do, CherryPy for example (link of 
the code is in the footer) will raise 400 if request URI include the #fragment 
part.

An other thing that CherryPy guys also did, is that ``HTTPRequest(...).path`` 
will always return a relativ path, which is IMHO the right behavior but for 
backward compatibility I hesitate fixing this problem directly in 
BaseHTTPRequestHandler or should we ? 

 You should probably also be careful with path-less domain names -- IIRC some
 URL parsers produce  as the path for e.g. http://python.org;.

According to the PEP-0333 the PATH_INFO can be empty: 

PATH_INFO
 The remainder of the request URL's path, designating the virtual   
 location of the request's target within the application. This may be an
 empty string, if the request URL targets the application root and does 
 
 not have a trailing slash.

 Have you asked for the status of this particular feature?

I sent an email to python WEB-SIG mailing list trying to gather information 
about this behavior, but not luck yet :(

At this point I would like to link to both CherryPy and Werkzeug that already 
handle this behavior.

Werkzeug: 
https://github.com/mitsuhiko/werkzeug/blob/0.9.6/werkzeug/serving.py#L75-81
CherryPy: 
https://github.com/cherrypy/cherrypy/blob/cherrypy-3.2.2rc1/cherrypy/wsgiserver/wsgiserver3.py#L633-638.

As a side note, I have already said in my email to the WEB-sig kudos to 
cherrypy guys, the funny part is that I am starting to think that they 
succeeded in both my tests because they didn't rely on core python 
implementation of BaseHTTPRequestHandler (I guess reinventing the wheel 
sometime works :)), at the opposite to other frameworks that showed this 
problems :).

Cheers,

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-09-15 Thread Guido van Rossum

Guido van Rossum added the comment:

Wow. This is interesting. I thought that absolute URL support was only for 
proxies, but the spec you quote says clearly it should be supported as a 
transition towards always specifying the full URL. I guess they want to get rid 
of the Host: header?

In any case I would worry (a bit) that this might cause security issues if 
implemented as naively as shown in your patch -- the other components of the 
URL should probably be validated against the configuration of the server. Also 
I am wondering whether specifying a different port or protocol (e.g. HTTPS) 
should be allowed or not, and what to do with extraneous parts of the path (in 
particular the #fragment identifier). You should probably also be careful 
with path-less domain names -- IIRC some URL parsers produce  as the path for 
e.g. http://python.org;.

Finally, AFAIK the HTTP 1.1 standard is full of ideas that few server 
implementations support, for various reasons, and it's possible that a future 
standard actually rescinds (or discourages) some features. Have you asked for 
the status of this particular feature?

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-05-11 Thread mouad

New submission from mouad:

Hi,

As most of you know, working behind a HTTP proxy raise all this corner cases 
that no one think about until he has to, one of them i had to deal with some 
months ago is absolute request URI in HTTP request, that some client will send 
when they detect that they are behind an HTTP proxy, to be more precise i am 
talking about this section of rfc2616 
(https://gist.github.com/mouadino/7930974), i.e.

  GET http://domain.com/path HTTP/1.1

As my surprise i found that most WSGI server that are out there doesn't support 
very well this feature including wsgiref, to test that i have created this 
simple python script https://gist.github.com/mouadino/7930974 that include also 
status of this bug in different other projects.

In term of bug impact, if a WSGI server doesn't support this, the WSGI 
environment variable PATH_INFO will end up containing an absolute URI instead 
of relative as the standard require 
(http://legacy.python.org/dev/peps/pep-0333/), which mean that most uri routing 
that web framework do end up failing (Returning 404) b/c most of them will try 
to much PATH_INFO with there different routes (which are relative paths) to 
find which controller action should they executed e.g. 
https://pypi.python.org/pypi/Routes .

FWIW i found this bug while trying to debug boto library (that use absolute uri 
if proxy is detected 
https://github.com/mouadino/boto/blob/v2.13.2/boto/connection.py#L955) that was 
communicating with an OpenStack API that use eventlet wsgi server  (That 
doesn't handle absolute URI 
https://github.com/eventlet/eventlet/blob/v0.14/eventlet/wsgi.py#L471) and 
routes (that match PATH_INFO with action path 
https://github.com/bbangert/routes/blob/master/routes/mapper.py#L678).

--
components: Library (Lib)
messages: 218258
nosy: mouad, pje
priority: normal
severity: normal
status: open
title: Fix wsgiref handling of absolute HTTP Request-URI
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-05-11 Thread mouad

Changes by mouad mouad...@gmail.com:


--
keywords: +patch
Added file: http://bugs.python.org/file35214/issue21472.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21472] Fix wsgiref handling of absolute HTTP Request-URI

2014-05-11 Thread Berker Peksag

Changes by Berker Peksag berker.pek...@gmail.com:


--
stage:  - patch review
versions:  -Python 3.1, Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21472
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com