On Wed, Apr 21, 2021 at 06:48:20PM +0200, Tim Düsterhus wrote:
> Willy,
> 
> On 4/21/21 12:11 PM, Willy Tarreau wrote:
> > > For the existing path ones I'd suggest:
> > > 
> > >    http-request normalize-uri filesystem
> > > 
> > > that combines path-strip-dot, path-strip-dotdot and path-merge-slashes in a
> > > useful order.
> > 
> > The only thing is that "filesystem" doesn't imply at all that it applies
> > to the path component of the URI. It could very well be about FS-like
> > parts in the query string for example. Instead I think that since the path
> > component is well defined, its name should at least be part of the action
> > (not sure if we'd want multiple actions once we can adhere to the standard).
> > 
> 
> While the path-strip-dot and path-strip-dotdot normalizers are indeed
> defined in RFC 3986, section 6.2.2.3, the path-merge-slashes normalization
> does not appear in RFC 3986 (or I might have missed it).
> 
> Testing with cURL confirms this: cURL normalizes '/./' and '/../' by itself
> unless --path-as-is is given. However, duplicate slashes are not
> normalized. This is especially interesting for empty path segments in
> combination with '/../'.

I'm not very surprised, as I remember that some tools used to rely on
the double slash (or sometimes /./) as a delimiter. For example, some FTP
servers used it in the user's home directory path to mark the location to
chroot to. So it's possible that similar use cases existed on HTTP servers
as well. The other possibility is that double slashes were kept solely to
reference a "node" on certain distributed systems, where you could have
"//node/path/".

> > $ curl -v https://example.com/foo///bar 2>&1 |grep GET
> > > GET /foo///bar HTTP/2
> > $ curl -v https://example.com/foo/../bar 2>&1 |grep GET
> > > GET /bar HTTP/2
> > $ curl -v https://example.com/foo//../bar 2>&1 |grep GET
> > > GET /foo/bar HTTP/2
> 
> I'm fine with prefixing 'path-' (i.e. 'path-filesystem'). Simply 'path'
> might be misleading, because it includes non-standard normalization.

I agree that "path" alone could be confusing or misleading, if only because
it looks like the similar sample-fetch function (and we must absolutely avoid
conflicting names between sample-fetch functions and converters, otherwise
we will never be able to merge them). Why not "path-normalize"? It seems
to describe exactly what it does.

By the way, be careful with RFC 3986: I remember that it contains an
algorithm or an illustration program explaining how to resolve paths,
and that it contains a bug (like every single time code is offered in
RFCs). The ABNF was correct, however. It might be mentioned in the
errata, but I really don't remember the details; most likely it was
related to the inability to resolve a missing component, causing a
".." to point to the wrong place.

Cheers,
Willy
