Re: [PATCH] mod_negotiation, suffix order

William A. Rowe, Jr. Wed, 03 Oct 2001 07:10:47 -0700


Bringing us back from a random stream of consciousness... here's the thread
to date (it goes back to April, so I suppose a full repost is in order).


My fresh commentary is inline with Ken's comments below.

----- Original Message ----- 
From: "Francis Daly" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, April 30, 2001 1:09 PM
Subject: [PATCH] mod_negotiation, suffix order


> Hi there,
> 
> this is essentially a repost of some mails earlier this month with the
> same patch and a similar Subject:.
> 
> Appended to this mail is a patch to remove the requirements on the
> order of suffixes when using MultiViews / mod_negotiation.  It does
> have the down side of increasing the number of valid URLs for the same
> content, but to a limited extent that is implicit in mod_negotiation
> anyway.
> 
> The patch is relative to the version of mod_negotiation.c distributed
> with apache-2.0.16.  There is a newer version in CVS, but the patch
> should still apply cleanly.
> 
> But first, some notes:
> 
> The current method takes the "file" part of r->filename (either the
> bit after the final / in the URI, or the value of DirectoryIndex).
> First, if the exact filename matches, mod_negotiation declines to
> handle it.  Second, for each file in the directory, it tries to match
> /^file\./.
> 
> This patched method does an extra strchr(), and uses a few extra
> int's and char *'s; and then for the requested file "file" does the
> same thing.
> 
> However, if the r->filename is actually "file.s1.s2.sZ" (with dots),
> the current way looks for /^file\.s1\.s2\.sZ\./; the patched way looks
> for each of /^file\./, /\.s1/, /\.s2/, /\.sZ/.  It bails out at the
> first failure.
> 
> Extra pointer and string manipulation is needed to do this, per dot in
> the requested file name, per file in the directory.
> 
> Some consequences of this implementation are:
> 
> Current method: file "name.html.en" is only accessible through
> (partial) URIs "name", "name.html", or "name.html.en"
> 
> Patched method: The same three work, as do "name.en" and
> "name.en.html".  That is good.  However: so do "name.htm",
> "name.htm.en", and "name.en.htm".  That may be considered good.  More
> however: so do "name.h", "name.h.h", "name...h.e.e..e.h.h.", and an
> infinite number of similar variations.  That may not be considered
> good.
> 
> In fact, the infinite number of possibilities is limited by the
> requirement that the length of the file name must be at least the
> length of the request in order to be considered, so a request with a
> dozen trailing dots will only have the hit of many strstr()s for files
> that match the prefix and have long enough names.
> 
> In each case, the content is returned with a Content-Location: header
> indicating the canonical filename.
> 
> The requirements are (1)r->filename up to the first dot must match the
> real filename up to the first dot; (2)r->filename may not be longer
> than the real filename; (3)each .suffix in r->filename must exist
> (string match) in the real filename; (4)the real filename must
> correspond to a known mime-type, encoding, etc -- which I think means
> that the final suffix must be known, and only suffixes followed by
> known suffixes are considered.
> 
> As a real example, testing with the apache "It worked!" page (named
> index.html.LANG), if I request index.html.fr, I get the page back.  If
> I request index.fr.html, or just index.fr, I get back the 406 Not
> Acceptable page, with a link to index.html.fr, _unless_ I include fr
> as an acceptable language.  If I include fr as a language, I can
> request /index.fr, /index.fr.html, or /index.html.fr successfully.  If
> I include fr as my preferred language, I can additionally request /,
> /index, and /index.html.  (As well as the .h, .ht, .htm, .f variants
> referred to earlier).  If I request /index.d, I get a 406 with links
> to index.html.de and index.html.dk
> 
> As a faked example, consider five files in the DocumentRoot, with no
> special customisations to the (MIME) configuration:
> 
> files a.b.c, d.e.html, g.h.i.j.k.en, m.n.o.p.q.html, s.t.html.u.v
> 
> The following requests have the indicated results:
> 
> GET /a            -> not found
> GET /a.b          -> not found
> GET /a.c          -> not found
> GET /a.b.c        -> success
> GET /d            -> success
> GET /d.e          -> success
> GET /d.h          -> success
> GET /d.html       -> success
> GET /d....html    -> not found
> GET /g            -> not found
> GET /g.h          -> not found
> GET /g.h.i.j.k    -> not found
> GET /g.h.i.j.k.en -> success
> GET /g.h.i.k.j.en -> not found
> GET /m            -> success
> GET /m.html       -> success
> GET /m.o.q.p.n    -> success
> GET /m.o.r.p.n    -> not found
> GET /s.t.html.u.v -> success
> GET /s            -> not found
> GET /s.t.html.u   -> not found
> 
> note that in the "not found" cases there (except for /m.o.r.p.n and
> /d....html), the patched code does pass the file down as being
> potentially valid -- it's later code which decides that it doesn't
> know how to treat the final suffix, and fails it.
> 
> As another faked example, with files ..d.f.html and .e.txt, I can
> successfully issue GETs for /.d, /.f, /.h, /.e and /.t, as well as
> things like /....t. (whether or not the final . there is punctuation). 
> 
> So that's it.  If I've missed something obvious, like r->filename being
> read-only or something, I'll head back to the drawing board.
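
For anyone skimming the repost: requirements (1) through (3) in Francis's
list can be sketched roughly as below.  This is not the patch itself, just
an illustration under stated assumptions.  The names suffix_match_any_order
and contains_n are made up here, contains_n stands in for the strstr() calls
the mail mentions, and requirement (4), the MIME/encoding lookup, is left
out entirely.

```c
#include <string.h>

/* Hypothetical helper: does the first n bytes of needle occur in hay? */
static int contains_n(const char *hay, const char *needle, size_t n)
{
    for (; *hay != '\0'; hay++) {
        if (strncmp(hay, needle, n) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical sketch of rules (1)-(3); returns 1 if filename is a
 * candidate for request, 0 otherwise.  Rule (4) is not modelled. */
int suffix_match_any_order(const char *request, const char *filename)
{
    size_t req_len = strlen(request);
    const char *dot = strchr(request, '.');
    size_t base_len = dot ? (size_t)(dot - request) : req_len;

    /* (2) the request may not be longer than the real filename */
    if (req_len > strlen(filename)) {
        return 0;
    }

    /* (1) the part before the first dot must match, i.e. /^file\./ */
    if (strncmp(request, filename, base_len) != 0
        || filename[base_len] != '.') {
        return 0;
    }

    /* (3) every ".suffix" of the request must appear somewhere in the
     * real filename; bail out at the first failure */
    while (dot != NULL) {
        const char *next = strchr(dot + 1, '.');
        size_t seg_len = next ? (size_t)(next - dot) : strlen(dot);

        if (!contains_n(filename, dot, seg_len)) {
            return 0;
        }
        dot = next;
    }
    return 1;
}
```

With this sketch, "name.en.html" and "name.h" both select "name.html.en",
while "d....html" is rejected against "d.e.html" on length alone and
"m.o.r.p.n" is rejected against "m.n.o.p.q.html" because ".r" never occurs.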


----- Original Message ----- 
From: "William A. Rowe, Jr." <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, October 02, 2001 8:33 PM
Subject: Re: [PATCH] mod_negotiation, suffix order

> Francis,
> 
>   I don't see any of my earlier replies on this topic.  I think I may
> have confused your contribution with a post by Brian Pane.  In any case,
> 
>   I am very impressed by this idea for Apache 2.0.  But I don't like the
> many to many mapping.  If we change your underlying rule here to require
> that each filename extension is passed in sequence, I would be _very_ 
> happy to commit this patch :)  E.g. index.en _could_ match index.html.en.
> But index.en.html would _not_ match index.html.en.
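
A rough sketch of the in-sequence rule, for comparison.  The assumptions
here are mine for the sake of illustration: the function name
suffix_match_in_order is hypothetical, each extension is compared as a
whole segment (so "index.h" would not find "index.html.en"), and extra
segments in the real filename may sit between or after the requested ones.

```c
#include <string.h>

/* Length of the dot-delimited segment starting at s. */
static size_t seg_len(const char *s)
{
    return strcspn(s, ".");
}

/* Hypothetical sketch: returns 1 if the extensions of request appear in
 * filename in the same order, with any number of extra segments allowed
 * in between or at the end; 0 otherwise. */
int suffix_match_in_order(const char *request, const char *filename)
{
    const char *r = request;
    const char *f = filename;
    size_t rl = seg_len(r);
    size_t fl = seg_len(f);

    /* the base name (before the first dot) must match exactly */
    if (rl != fl || strncmp(r, f, rl) != 0) {
        return 0;
    }
    r += rl;
    f += fl;

    /* r and f now each point at a '.' or at the terminating NUL */
    while (*r == '.') {
        int found = 0;

        r++;
        rl = seg_len(r);

        /* scan forward through the remaining filename segments only,
         * so an extension can never match "behind" an earlier one */
        while (*f == '.' && !found) {
            f++;
            fl = seg_len(f);
            if (fl == rl && strncmp(r, f, rl) == 0) {
                found = 1;
            }
            f += fl;
        }
        if (!found) {
            return 0;
        }
        r += rl;
    }
    return 1;
}
```

Under these assumptions "index.en" and "index.html" both find
"index.html.en", "win_service.html" finds "win_service.en.html", and
"index.en.html" does not find "index.html.en".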


----- Original Message ----- 
From: "Rodent of Unusual Size" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 03, 2001 8:54 AM
Subject: Re: .asis handler isn't driven


> "William A. Rowe, Jr." wrote:
> > 
> > [There is a weakness.  We need to evaluate the exception
> > list by component, right now we simply strcmp.  There is
> > a note in status to that effect.  E.g. requesting index.bak
> > -should- match index.html.bak
> 
> Um, no, I definitely think not.  I think the portion of
> the filename that's specified in the URL should be
> considered opaque, and that we can only negotiate using
> the bits that are tailed on the file names but not the
> URL.

This post didn't mean what you expected [see my reply on the subject
.asis handler isn't driven], but your interpretation is relevant to this
thread here.
 
> That is, if the URI is index.bak, we can only negotiate
> amongst variants matching index.bak* -- NOT index.*.bak*.



What's your rationale?  I agree that index[.*].bak[.*] is broader
than index.bak[.*] --- but I'm wondering why you feel this way?

Say that we want to point the user to the english index page.
Why shouldn't a request for index.en discover index.html.en or
index.cgi.en?

This would resolve a _major_ Headache (with a capital H) on Win32,
since we can't handle index.html.en by filename extension.  However,
anyone could read the document win_service.en.html by double-clicking
a local copy of that file.  

The historical problem has been ordering, since we know the index page
will summon win_service.html.  Because the wildcards can only tail the
filename, we cannot serve win_service.en.html from that request.

I've really got problems with the attitude that "Well, that's win32's
brokenness, to hell with letting them double click on the docs ... they
ought to know how to start the server before they read the docs."  That's
pretty bogus.  Contrariwise, I don't disagree with allowing index.html to
find index.html.en or index.en.html, and not breaking anyone.

I'm arguing against a many-to-many mapping, but not against allowing the
parser to test for unspecified segments between the filename and the last
given extension.  The CPU hit will be negligible for mismatches, and only
slightly larger for matches.

Bill
