Nicolas Lehuen Wed, 30 Nov 2005 13:42:23 -0800

2005/11/30, Jim Gallacher <[EMAIL PROTECTED]>:
[snip]

Nicolas Lehuen wrote:
> Ah, while I'm at it, knowing the DocumentRoot of the current VirtualHost
> would be great, too. But that's another story.

I don't know that story. Is there a problem with req.document_root()?

Well, I think I'm doing a bad thing, and I have to stop doing it. I'm using mod_vhost_alias, which is a way to implement mass virtual hosting. It's rather neat: you get one document root per virtual host, and all document roots are subdirectories of a common parent directory, without the hassle of using mod_rewrite. However, it seems a bit rough around the edges, since req.document_root() returns the common parent directory instead of the true, per-virtual-host document root.

Also, I don't know whether implementing mass virtual hosting with mod_rewrite changes the document root accordingly. So the only way to know my document root is to compute it from the common parent directory and the virtual host name, and bam, we're back to the question of "how do I get the current virtual host name?".
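For the mod_vhost_alias case, the per-host document root can in principle be recomputed from the VirtualDocumentRoot pattern and the host name. Below is a minimal sketch, assuming the pattern only uses the part-selection forms of mod_vhost_alias interpolation (`%%`, `%0`, `%N`, `%-N`, `%N+`, `%-N+`); the `%p` port form and the `.M` character selectors are not handled. `vhost_document_root` is a hypothetical helper, and the host name has to come from a source you trust (for instance `req.hostname`).

```python
import re

# Matches "%%" or a single-digit part selector: %0, %2, %-1, %2+, %-2+.
# Simplification: %p and ".M" character selectors are left unexpanded.
_SELECTOR = re.compile(r"%(?:(%)|(-?)(\d)(\+?))")


def vhost_document_root(pattern, hostname):
    """Expand a simplified VirtualDocumentRoot pattern for one host name."""
    parts = hostname.split(".")

    def expand(match):
        if match.group(1):  # "%%" stands for a literal percent sign
            return "%"
        negative = match.group(2) == "-"
        n = int(match.group(3))
        plus = match.group(4) == "+"
        if n == 0:  # %0 is the whole name
            return hostname
        if n > len(parts):  # out-of-range selectors expand to "_"
            return "_"
        if negative:
            index = len(parts) - n  # %-1 is the last part
            # %-N+ means "part -N and all preceding parts"
            chosen = parts[: index + 1] if plus else [parts[index]]
        else:
            index = n - 1  # %1 is the first part
            # %N+ means "part N and all following parts"
            chosen = parts[index:] if plus else [parts[index]]
        return ".".join(chosen)

    return _SELECTOR.sub(expand, pattern)


if __name__ == "__main__":
    print(vhost_document_root("/var/www/%0", "www.example.com"))
    print(vhost_document_root("/var/www/%-2/%1", "www.example.com"))
```

With a pattern such as `/var/www/%0`, the result is simply the common parent directory joined with the full host name, which is exactly why the host name is the missing piece.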

> 4) URL or URI or whatever you choose to name the part of the resource
> once the physical matters of protocol, server and port are sorted out
>
> uri = req.uri
>
> Note that this uri can in turn be split into something which is lost by
> the publisher and the req.path_info field; that is, IIRC, we can
> assert(req.uri.endswith(req.path_info)). I don't know what req.path_info
> is before the publisher kicks in, though.

I'm not sure I understand what is being lost, since the publisher does not
modify req.uri. Something that I've found useful but which seems to be
missing is the idea of a base_uri, where

uri = base_uri + path_info

Or maybe the base_uri part is what you mean when you say something is lost?
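The relation `uri = base_uri + path_info` can be turned around to recover the base URI. A minimal sketch, assuming (as the test output later in this thread shows) that `req.uri` ends with `req.path_info`; `base_uri` is a hypothetical helper, not a mod_python API.

```python
def base_uri(uri, path_info):
    """Return the part of uri that precedes path_info (uri == base + path_info)."""
    if not path_info:  # no path info: the whole URI is the base
        return uri
    if not uri.endswith(path_info):
        raise ValueError("%r does not end with %r" % (uri, path_info))
    return uri[: len(uri) - len(path_info)]


if __name__ == "__main__":
    print(base_uri("/test.py/subpath", "/subpath"))
```

In a handler this would be called as `base_uri(req.uri, req.path_info)`.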

Using the enclosed file, which is both a test handler and a page that can be published, I got these results:

1) Using test.py as a handler

URI
---
req.unparsed_uri: '/test.handler/subpath#toto'
req.parsed_uri: (None, None, None, None, None, None, '/test.handler/subpath', None, 'toto')
req.uri: '/test.handler/subpath'
req.path_info: '/subpath'
req.subprocess_env.get("SCRIPT_NAME"): '/test.handler'
req.subprocess_env.get("PATH_INFO"): '/subpath'
req.subprocess_env.get("SCRIPT_URL"): None
req.subprocess_env.get("SCRIPT_URI"): None

2) Using the publisher handler to publish test.py

URI
---
req.unparsed_uri: '/test.py/subpath#toto'
req.parsed_uri: (None, None, None, None, None, None, '/test.py/subpath', None, 'toto')
req.uri: '/test.py/subpath'
req.path_info: '/subpath'
req.subprocess_env.get("SCRIPT_NAME"): '/test.py'
req.subprocess_env.get("PATH_INFO"): '/subpath'
req.subprocess_env.get("SCRIPT_URL"): None
req.subprocess_env.get("SCRIPT_URI"): None
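The enclosed test.py is not reproduced here, so the following is only a sketch of a diagnostic function that prints the same fields; `uri_report` and the field-name list are assumptions. The names follow the order of the 9-element `parsed_uri` tuple (scheme, hostinfo, user, password, hostname, port, path, query, fragment); mod_python's `apache` module also defines `URI_*` index constants for this tuple.

```python
from types import SimpleNamespace

# Field order of the 9-element req.parsed_uri tuple.
PARSED_URI_FIELDS = ("scheme", "hostinfo", "user", "password",
                     "hostname", "port", "path", "query", "fragment")

CGI_VARS = ("SCRIPT_NAME", "PATH_INFO", "SCRIPT_URL", "SCRIPT_URI")


def uri_report(req):
    """Return report lines in the style of the listings above.

    `req` is anything with unparsed_uri, parsed_uri, uri, path_info and a
    dict-like subprocess_env, e.g. a mod_python request object.
    """
    lines = ["req.unparsed_uri: %r" % (req.unparsed_uri,)]
    # Label the non-empty parsed_uri elements by name instead of by position.
    named = {name: value
             for name, value in zip(PARSED_URI_FIELDS, req.parsed_uri)
             if value is not None}
    lines.append("req.parsed_uri: %r" % (named,))
    lines.append("req.uri: %r" % (req.uri,))
    lines.append("req.path_info: %r" % (req.path_info,))
    for var in CGI_VARS:
        lines.append('req.subprocess_env.get("%s"): %r'
                     % (var, req.subprocess_env.get(var)))
    return lines


if __name__ == "__main__":
    # Stand-in for a real request, using the values from the publisher run.
    fake = SimpleNamespace(
        unparsed_uri="/test.py/subpath#toto",
        parsed_uri=(None,) * 6 + ("/test.py/subpath", None, "toto"),
        uri="/test.py/subpath",
        path_info="/subpath",
        subprocess_env={"SCRIPT_NAME": "/test.py", "PATH_INFO": "/subpath"},
    )
    print("\n".join(uri_report(fake)))
```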

I must confess I'm completely at a loss here...

a) Handlers and published modules seem to behave the same way, so the computation of path_info must come from above, i.e. either from mod_python or from Apache.

b) We've got req.uri == req.subprocess_env.get("SCRIPT_NAME") + req.subprocess_env.get("PATH_INFO"). Cool, but who does the split? I'm guessing that Apache does it, thanks to the AddHandler directives: it knows that the .py extension must be served by mod_python, hence it deduces that /test.py must be the script name and /subpath some path info to pass to the script.

c) We don't have a req.base_uri (to follow Jim's naming suggestion) or a req.script_name equivalent to req.subprocess_env.get("SCRIPT_NAME"), but we do have a req.path_info... Why is this missing?
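Until something like `req.script_name` exists, the script name can be obtained along the lines of observation b). A minimal sketch; `script_name` is a hypothetical helper. It prefers `SCRIPT_NAME` from `subprocess_env` when that variable is set (as in the listings above) and otherwise derives it from `req.uri` and `req.path_info`, assuming the former ends with the latter.

```python
from types import SimpleNamespace


def script_name(req):
    """Return the script-name part of req.uri (req.uri == script + path_info)."""
    value = req.subprocess_env.get("SCRIPT_NAME")
    if value is not None:  # trust the CGI variable when it is present
        return value
    path_info = req.path_info or ""
    if not path_info:
        return req.uri
    if req.uri.endswith(path_info):
        return req.uri[: len(req.uri) - len(path_info)]
    raise ValueError("req.uri %r does not end with req.path_info %r"
                     % (req.uri, path_info))


if __name__ == "__main__":
    req = SimpleNamespace(uri="/test.handler/subpath", path_info="/subpath",
                          subprocess_env={})
    print(script_name(req))
```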

I'm beginning to think that all this feels highly un-pythonic. There is far more than one way to get some data (the host name is a good example). You get to use req.foobar or req.subprocess_env['FOOBAR'] or req.server.foobar (and count yourself lucky if exactly one FOOBAR gives you the data you need). subprocess_env is a very ugly name which doesn't seem to be related to mod_python at all (I'm using the multi-threaded MPM and I don't have subprocesses). For some data, there is no way to get it at all (where is the current virtual host name, as determined by Apache?).

One thing I'll try to do is write a kind of Rosetta Stone listing all the data you can find in a URL, how to get it from the request/connection/server object, how to get it from subprocess_env (i.e. how you would get it in a CGI), and what is missing or duplicated.
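Such a Rosetta Stone could start as a plain table that is checked against a live request. This is a sketch under stated assumptions: the rows only cover attributes mentioned in this thread plus `req.hostname` and `req.server.server_hostname`, and the CGI variables `REQUEST_URI`, `SERVER_NAME`, `HTTP_HOST` and `DOCUMENT_ROOT`; `ROSETTA` and `rosetta_rows` are hypothetical names. Whether a CGI variable is present depends on what has populated `subprocess_env` (mod_python's `req.add_common_vars()` is one way).

```python
from types import SimpleNamespace

# (URL component, dotted attribute path on req or None, CGI variable or None)
ROSETTA = [
    ("unparsed URI", "unparsed_uri", "REQUEST_URI"),
    ("URI path", "uri", None),
    ("script name", None, "SCRIPT_NAME"),
    ("path info", "path_info", "PATH_INFO"),
    ("host name", "hostname", "SERVER_NAME"),
    ("server host name", "server.server_hostname", None),
    ("Host header", None, "HTTP_HOST"),
    ("document root", "document_root", "DOCUMENT_ROOT"),
]

_MISSING = object()


def _resolve(obj, dotted):
    """Follow a dotted attribute path; call the final value if it is a method."""
    for name in dotted.split("."):
        obj = getattr(obj, name, _MISSING)
        if obj is _MISSING:
            return None
    return obj() if callable(obj) else obj


def rosetta_rows(req):
    """Return a list of (component, attribute value, CGI value, agree) tuples."""
    rows = []
    for component, attr, cgi in ROSETTA:
        attr_value = _resolve(req, attr) if attr else None
        cgi_value = req.subprocess_env.get(cgi) if cgi else None
        # agree is None unless both sources exist; False flags a conflict.
        if attr_value is None or cgi_value is None:
            agree = None
        else:
            agree = attr_value == cgi_value
        rows.append((component, attr_value, cgi_value, agree))
    return rows


if __name__ == "__main__":
    fake = SimpleNamespace(
        unparsed_uri="/test.py/subpath", uri="/test.py/subpath",
        path_info="/subpath", hostname="www.example.com",
        server=SimpleNamespace(server_hostname="example.com"),
        document_root=lambda: "/var/www",
        subprocess_env={"SCRIPT_NAME": "/test.py", "PATH_INFO": "/subpath"},
    )
    for row in rosetta_rows(fake):
        print(row)
```

Rows where `agree` is False, or where only one of the two sources is filled in, are the duplicated or missing paths the paragraph above is about.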

This way we'll be able to decide whether we should deprecate some ways of accessing that data and remove them in a later release (3.3 or 3.4). The end result would be a series of statements like "If you want to get the virtual host name, use XYZ, not ABC, which is deprecated. Be aware, though, that it's not 100% efficient, blah blah".

Regards,
Nicolas

> Anyway, the length of this thread shows that a bit of clarification is
> required. A page named something like "What's in a URL?" and
> explaining the client-side and server-side views of the various
> components of a URL would be great. I'll try to write a draft this
> weekend.

Excellent.

The corollary of this discussion is putting the parsed_uri back together
again. Is there any support for exposing apr_uri_unparse()?

Jim
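Until `apr_uri_unparse()` is exposed, the `parsed_uri` tuple can be reassembled in Python. This is a rough sketch, loosely modelled on what `apr_uri_unparse()` does and not a binding to it: `unparse_uri` is a hypothetical name, it builds the authority from user, hostname and port (ignoring hostinfo), and it never emits the password.

```python
def unparse_uri(parsed):
    """Rebuild a URI string from a 9-element parsed_uri tuple.

    Tuple order: (scheme, hostinfo, user, password, hostname, port,
    path, query, fragment). The password is deliberately left out.
    """
    scheme, _hostinfo, user, _password, hostname, port, path, query, fragment = parsed
    result = ""
    if hostname:
        # Bracket IPv6 literals so the port separator stays unambiguous.
        host = "[%s]" % hostname if ":" in hostname else hostname
        authority = (user + "@" if user else "") + host
        if port:
            authority += ":%s" % port
        result += (scheme + ":" if scheme else "") + "//" + authority
    result += path or ""
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


if __name__ == "__main__":
    print(unparse_uri((None,) * 6 + ("/test.py/subpath", None, "toto")))
```

Fed the `req.parsed_uri` value from the publisher run above, it gives back `'/test.py/subpath#toto'`, i.e. the `req.unparsed_uri` value.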

test.py
Description: application/python

Re: Various musings about the request URL / URI / whatever