Currently,

        GET / HTTP/1.1
        Host: ftp.heanet.ie

        GET http://ftp.heanet.ie/ HTTP/1.0

        GET HTTP://Ftp.Heanet.Ie/ HTTP/1.0

are all mapped to different hashes by mod_cache, despite being the same
content. This wastes disk space and makes it really awkward for me to
write a debug/admin tool.

The attached patch makes the key deterministic by mapping them all to:

        "http://ftp.heanet.ie:80/?"

instead of "ftp.heanet.ie/?". For a caching web server this really
won't make much of a difference, since the Host header is forcibly
lower-cased anyway, but for a proxy it definitely helps.  Looking
through my logs I'm seeing lots of simple domain case variations; there
is no point storing them twice and handling all of the expires multiple
times.

It also solves the collision that happens if an administrator wants to
run Apache listening on multiple ports with mod_cache enabled.

The only awkwardness I can see with this approach is that a request
with no Host header, such as:

        GET / HTTP/1.0

would produce a key with an empty hostname:

        "http://:80/?"

So I've re-used the _default_ "convention" (underscores are not
permitted in hostnames anyway) for such keys:

        "http://_default_:80/?"

which should at least make a familiar sort of sense to an administrator.
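
To illustrate the key format outside of httpd, here is a rough standalone
sketch in plain C. It is not the patch itself: make_cache_key() and
lower_copy() are hypothetical names, it uses snprintf() rather than APR
pools, it takes the port as a number, and it lower-cases the hostname
itself (in the patch, r->hostname is already lower case).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, lower-casing each character.
 * Truncates if dst is too small; dst is always NUL-terminated. */
static void lower_copy(char *dst, size_t dstsz, const char *src)
{
    size_t i;

    if (dstsz == 0) {
        return;
    }
    for (i = 0; src[i] != '\0' && i + 1 < dstsz; i++) {
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[i] = '\0';
}

/* Hypothetical stand-in for the key generator: builds
 * "scheme://host:port" followed by the path, "?" and the args.
 * Missing scheme defaults to "http", missing host to "_default_".
 * Returns 0 on success, -1 if the key did not fit in out. */
static int make_cache_key(char *out, size_t outsz, const char *scheme,
                          const char *host, unsigned port,
                          const char *path, const char *args)
{
    char s[16];
    char h[256];
    int n;

    lower_copy(s, sizeof(s), scheme ? scheme : "http");
    lower_copy(h, sizeof(h), (host && *host) ? host : "_default_");

    n = snprintf(out, outsz, "%s://%s:%u%s?%s", s, h, port,
                 path ? path : "/", args ? args : "");
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

With this sketch, the three requests at the top of this mail all yield the
same key on port 80, a request without a hostname falls back to
_default_, and the same host on a different port gets a distinct key.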

-- 
Colm MacCárthaigh                        Public Key: [EMAIL PROTECTED]
Index: modules/cache/cache_storage.c
===================================================================
--- modules/cache/cache_storage.c       (revision 232304)
+++ modules/cache/cache_storage.c       (working copy)
@@ -318,12 +318,46 @@
 apr_status_t cache_generate_key_default(request_rec *r, apr_pool_t* p,
                                         char**key)
 {
-    if (r->hostname) {
-        *key = apr_pstrcat(p, r->hostname, r->uri, "?", r->args, NULL);
+    const char *hostname;
+    char *port_str, *scheme;
+    int i;
+
+    /* Use _default_ as the hostname if none present, as in mod_vhost
+     * Note: r->hostname is always lowercase
+     */
+    hostname = r->hostname ? r->hostname : "_default_";
+
+    /* Copy the scheme, ensuring that it is lower case. If the parsed uri
+     * contains no scheme, we use "http" as the default. This is a fair
+     * assumption, as request_rec is HTTP-specific.
+     */
+    if (r->parsed_uri.scheme) {
+        scheme = apr_pcalloc(p, strlen(r->parsed_uri.scheme) + 1);
+        for (i = 0; r->parsed_uri.scheme[i]; i++) {
+            scheme[i] = apr_tolower(r->parsed_uri.scheme[i]);
+        }
     }
     else {
-        *key = apr_pstrcat(p, r->uri, "?", r->args, NULL);
+        scheme = "http";
     }
+
+    /* Copy the port string, ensuring that it is lower case (it may be a
+     * service name). If not present, use the connection to determine the
+     * port number.
+     */
+    if (r->parsed_uri.port_str) {
+        port_str = apr_pcalloc(p, strlen(r->parsed_uri.port_str) + 1);
+        for (i = 0; r->parsed_uri.port_str[i]; i++) {
+            port_str[i] = apr_tolower(r->parsed_uri.port_str[i]);
+        }
+    }
+    else {
+        port_str = apr_psprintf(p, "%u", ap_get_server_port(r));
+    }
+
+    /* Key format is a URI */
+    *key = apr_pstrcat(p, scheme, "://", hostname, ":", port_str,
+                       r->parsed_uri.path, "?", r->args, NULL);
+
     return APR_SUCCESS;
 }
-
