2005/11/29, Gregory (Grisha) Trubetskoy <[EMAIL PROTECTED]>:

On Tue, 29 Nov 2005, Nicolas Lehuen wrote:

> def current_url(req):

[snip]

>
>    # host
>    current_url.append(req.hostname)

[snip]

This part isn't going to work reliably if you are not using virtual hosts
and just bind to an IP number. Deciphering the URL is an impossible task -
I used to have similar code in my applications, but lately I realized that
it does not work reliably and it is much simpler to just treat it as a
configuration item...

That's awful. How come such a basic thing is so difficult? I mean, isn't it weird that server-side code has less information about its own URL than the client does? Note that it's not a mod_python-specific problem; I've seen it in the Servlet API as well.

If I understand you correctly, req.hostname is not reliable when virtual hosting is not used. What about server.server_hostname, which seems to be what the mod_rewrite code you posted below relies on? Can it be used reliably?

> First question, is there a simpler way to do this ? Ironically, when using
> mod_rewrite, you get an environment variable named SCRIPT_URI which is
> precisely what I need (SCRIPT_URL, also added by mod_rewrite, is equivalent
> to req.uri... Don't ask me why). But relying on it isn't safe since
> mod_rewrite isn't always used.

Well, here's how mod_rewrite does it.

     /*
      *  create the SCRIPT_URI variable for the env
      */

     /* add the canonical URI of this URL */
     thisserver = ap_get_server_name(r);
     port = ap_get_server_port(r);
     if (ap_is_default_port(port, r)) {
         thisport = "";
     }
     else {
         apr_snprintf(buf, sizeof(buf), ":%u", port);
         thisport = buf;
     }
     thisurl = apr_table_get(r->subprocess_env, ENVVAR_SCRIPT_URL);

     /* set the variable */
     var = apr_pstrcat(r->pool, ap_http_method(r), "://", thisserver, thisport,
                      thisurl, NULL);
     apr_table_setn(r->subprocess_env, ENVVAR_SCRIPT_URI, var);

     /* if filename was not initially set,
      * we start with the requested URI
      */
     if (r->filename == NULL) {
         r->filename = apr_pstrdup(r->pool, r->uri);
         rewritelog(r, 2, "init rewrite engine with requested uri %s",
                    r->filename);
     }

Shall we add this code to the native part of the request object, then? Or maybe to the server object (without the URL part)? But is it really reliable (see the question above)?
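
For illustration, the logic of the mod_rewrite snippet above can be sketched in plain Python. This is only a sketch under stated assumptions: the function name build_script_uri and its parameters are hypothetical, and the scheme, server name and port are passed in explicitly, because they stand in for the values that ap_http_method, ap_get_server_name and ap_get_server_port return in the C code.

```python
# Default ports per scheme, mirroring the idea behind ap_is_default_port().
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_script_uri(scheme, server_name, port, path):
    """Assemble scheme://host[:port]path, omitting a default port.

    Hypothetical helper: the caller must supply all four values.
    """
    if DEFAULT_PORTS.get(scheme) == port:
        port_part = ""  # default port for this scheme: leave it out
    else:
        port_part = ":%d" % port
    return "%s://%s%s%s" % (scheme, server_name, port_part, path)
```

With these definitions, build_script_uri("http", "www.example.com", 8080, "/app/page") gives "http://www.example.com:8080/app/page", and the port disappears when it is 80. The result is only as reliable as the server name passed in, which is exactly the open question.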

> Second question, if there isn't any simpler way to do this, should we add it
> to mod_python ? Either as a function like above in mod_python.util, or as a
> member of the request object (named something like url to match the other
> member named uri, but that's just teasing).

I don't know... Since the result is going to be half-baked... I think a
more interesting and mod_python-ish thing to do would be to expose all the
APIs used in the above code (e.g. ap_get_server_name, ap_is_default_port,
ap_http_method) FIRST, then think about this.

> And third question (in pure Spanish inquisition style) : why is
> req.parsed_uri returning me a tuple full of Nones except for the uri and
> path_info part ?

This is an httpd question most likely...

So it's a feature / bug in httpd. Maybe it's due to my use of VirtualDocumentRoot.
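
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a typical HTTP request line carries only the path and query string, so there may simply be no scheme or host for the parser to find. The standard-library urlsplit shows the same effect on a path-only string. This is an analogy only, not the httpd parser, and the describe helper and example URLs are made up for the illustration.

```python
from urllib.parse import urlsplit


def describe(target):
    """Return the main URL components of target as a dict (illustrative helper)."""
    parts = urlsplit(target)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        "query": parts.query,
    }


# Path-only form, as usually seen on the request line: no scheme, no host.
relative = describe("/app/page?x=1")

# Absolute form: every component is present.
absolute = describe("http://www.example.com/app/page?x=1")
```

Here relative has an empty scheme and netloc, while absolute has both filled in.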

> Ah, fourth question : why are we (mod_python, mod_rewrite and the CGI
> environment variables) using the terms "URI" and "URL" to distinguish
> between a full, absolute resource path and a path relative to the server,
> whereas the definition of URLs and URIs is very vague and nothing close to
> this ( http://www.w3.org/TR/uri-clarification/#contemporary) ? Shouldn't we
> save our souls and a lot of saliva by choosing better names ?

No, we (mod_python) should just use the exact same name that httpd uses.
If we come up with better names, then it's just going to make it even more
confusing.

Fair enough. The problem is that even httpd and mod_rewrite don't agree on what a URL and a URI are...

> OK, OK, fifth question : we made req.filename and other members writable.
> But when those attributes are changed, as Graham noted a while ago, the
> other dependent ones aren't, leading to inconsistencies (for example, if you
> change req.filename, req.canonical_filename isn't changed). Should we try to
> solve this

The solution is to make req.canonical_filename writable too and document
that if you change req.filename, you may consider changing
canonical_filename as well and what will happen if you do not.
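
A minimal sketch of what such documented usage might look like, assuming req.canonical_filename does become writable as proposed here. The set_filename helper is hypothetical, and a SimpleNamespace stands in for the real request object so the example runs outside Apache. It also assumes the new path is already in canonical form.

```python
from types import SimpleNamespace


def set_filename(req, new_filename):
    """Hypothetical helper: change both attributes together so they stay in step."""
    req.filename = new_filename
    # Assumption: new_filename is already canonical, so it can be reused as is.
    req.canonical_filename = new_filename
    return req


# Stand-in for a mod_python request object (not the real class).
fake_req = SimpleNamespace(
    filename="/var/www/a.py",
    canonical_filename="/var/www/a.py",
)
set_filename(fake_req, "/var/www/b.py")
```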

> and provide clear definition of the various parts of a request
> for mod_python 3.3 ?

Yes, that'd be good :)

Grisha
