Duncan Booth wrote:
> "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> >>>Is "urllib" wrong?
>
> Section 5.2 is also relevant here. In particular:
>
>> g) If the resulting buffer string still begins with one or more
>>    complete path segments of "..", then the reference is
>>    considered to be in error. Implementations may handle this
>>    error by retaining these components in the resolved path (i.e.,
>>    treating them as part of the final URI), by removing them from
>>    the resolved path (i.e., discarding relative levels above the
>>    root), or by avoiding traversal of the reference.
>
> The common practice seems to be for client-side implementations to handle
> this using option 2 (removing them) and servers to use option 3 (avoiding
> traversal of the reference). urllib uses option 1 which is also correct but
> not as useful as it might be.

That's helpful. Thanks.

In Python, of course, "urlparse.urlparse", which is the main function
used to disassemble a URL, has no idea whether it's being used by a
client or a server, so it, reasonably enough, takes option 1.

(Yet another hassle in processing real-world HTML.)

John Nagle
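
For illustration, the three options in the quoted step g) can be sketched
in a few lines of Python. This is only a simplified sketch, not urllib's
actual code: the function name resolve_path and its policy argument are
hypothetical, and it handles absolute paths only (no scheme, host, query,
fragment, or trailing-slash handling). It does not call the library, so it
does not depend on what any particular Python version does.

```python
def resolve_path(base_path, ref_path, policy="keep"):
    """Resolve ref_path against base_path (absolute paths only, simplified).

    policy (a hypothetical parameter) picks how to treat '..' segments
    that would climb above the root:
      "keep"   - option 1: retain them in the resolved path
      "drop"   - option 2: discard them
      "reject" - option 3: refuse to resolve the reference
    """
    # Start from the base's directory: drop the leading "" and the last segment.
    segments = base_path.split("/")[1:-1]
    excess = 0  # count of '..' segments that would go above the root
    for seg in ref_path.split("/"):
        if seg == "..":
            if segments:
                segments.pop()
            else:
                excess += 1
        elif seg not in ("", "."):
            segments.append(seg)

    if excess and policy == "reject":
        raise ValueError("reference climbs above the root: %r" % ref_path)
    if policy == "keep":
        segments = [".."] * excess + segments
    return "/" + "/".join(segments)


print(resolve_path("/b/c/d", "../../../g", "keep"))  # /../g
print(resolve_path("/b/c/d", "../../../g", "drop"))  # /g
```

A client that wants option 2 and a server that wants option 3 could both
sit on top of something like this by passing a different policy, which is
exactly the knowledge a general-purpose parsing function does not have.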