Re: urllib interpretation of URL with ".."

John Nagle Mon, 25 Jun 2007 09:41:12 -0700

Duncan Booth wrote:
> "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> 
>>>Is "urllib" wrong?


> Section 5.2 is also relevant here. In particular:
> 
> 
>>      g) If the resulting buffer string still begins with one or more
>>         complete path segments of "..", then the reference is
>>         considered to be in error.  Implementations may handle this
>>         error by retaining these components in the resolved path (i.e.,
>>         treating them as part of the final URI), by removing them from
>>         the resolved path (i.e., discarding relative levels above the
>>         root), or by avoiding traversal of the reference.
> 
> 
> The common practice seems to be for client-side implementations to handle 
> this using option 2 (removing them) and servers to use option 3 (avoiding 
> traversal of the reference). urllib uses option 1 which is also correct but 
> not as useful as it might be.

    That's helpful.  Thanks.

    In Python, of course, the "urlparse" module (where "urlparse.urljoin"
resolves a relative reference against a base URL) has no idea whether it's
being used by a client or a server, so it, reasonably enough, takes option 1.
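
    To make the three options in the quoted section concrete, here is a
rough sketch. It is not what urllib does internally; "normalize_path" and
its "policy" argument are hypothetical names made up for illustration, and
the code is a simplified version of the dot-segment handling, not the full
algorithm from the RFC.

```python
def normalize_path(path, policy="retain"):
    """Collapse '.' and '..' segments in an absolute URL path.

    policy selects what happens to a '..' that would climb above the root
    (hypothetical names, mapped to the three options in the quoted text):
      'retain' - keep the '..' in the result          (option 1)
      'remove' - silently drop the '..'               (option 2)
      'reject' - refuse to resolve, raise ValueError  (option 3)
    """
    if policy not in ("retain", "remove", "reject"):
        raise ValueError("unknown policy: %r" % (policy,))

    out = []
    for seg in path.lstrip("/").split("/"):
        if seg == ".":
            continue                      # '.' never changes the path
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()                 # normal case: go up one level
            elif policy == "retain":
                out.append("..")          # option 1: leave it in the path
            elif policy == "reject":
                raise ValueError("reference climbs above the root")
            # policy == 'remove': option 2, drop the segment
        else:
            out.append(seg)
    return "/" + "/".join(out)


if __name__ == "__main__":
    for policy in ("retain", "remove"):
        print(policy, normalize_path("/a/../../b", policy))
```

    A crawler processing real-world HTML would probably want 'remove',
since that matches what the quoted message says clients commonly do, while
a server would want 'reject'.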

    (Yet another hassle in processing real-world HTML.)

                                        John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
