Re: Matching URLs at layer 7

2010-04-28 Thread Willy Tarreau
On Wed, Apr 28, 2010 at 06:21:34PM +0930, Andrew Commons wrote:
> As an aside, should the documentation extract below actually read:
> 
> acl local_dsthdr(Host) -i localhost
>  ^
>  ^
> i.e. is the name of the header case sensitive? In my attempts to work this
> out I think that I had to use 'Host' rather than 'host' before it worked.

No, a header name is not case-sensitive, and the hdr() directive takes
care of that for you. However a header value IS case-sensitive, and since
the Host header holds a DNS name, which is not case-sensitive, you have to
use -i to be sure to match any possible spelling a user might send.
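
For example (a minimal sketch; the acl and backend names are made up),
both spellings of the header name select the same header, and -i makes
the value match case-insensitive:

```
# "hdr(host)" and "hdr(Host)" name the same header; case only matters
# for the VALUE, which -i makes case-insensitive
acl host_example hdr(host) -i www.example.com
use_backend www if host_example
```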

Regards,
Willy




Re: Matching URLs at layer 7

2010-04-28 Thread Willy Tarreau
On Wed, Apr 28, 2010 at 09:21:31PM +0930, Andrew Commons wrote:
> Hi Beni,
> 
> A few things to digest here.
> 
> What was leading me up this path was a bit of elementary (and probably naïve) 
> white-listing with respect to the contents of the Host header and the URI/URL 
> supplied by the user. Tools like Fiddler make request manipulation trivial so 
> filtering out 'obvious' manipulation attempts would be a good idea. With this 
> in mind my thinking (if it can be considered as such) was that:
> 
> (1) user request is for http://www.example.com/whatever
> (2) Host header is www.example.com
> (3) All is good! Pass request on to server.
> 
> Alternatively:
> 
> (1) user request is for http://www.example.com/whatever
> (2) Host header is www.whatever.com
> (3) All is NOT good! Flick request somewhere harmless.
> 
> I'm not sure whether your solution supports this, and if your interpretation 
> is correct maybe HAProxy doesn't support it either.
> 
> I'll do some more experimenting and I hope I don't lock myself out ;-)

I'm not sure what you're trying to achieve. Requests beginning with
"http://" are normally for proxy servers, though they're also valid
on origin servers. If what you want is to explicitly match any of
those, then you must consider that HTTP/1.1 declares a request whose
Host field does not match the one in the URL to be invalid. So in
practice you could always just use the Host header as the one to
perform your switching on, and never use the URL part. You can even
decide to block any request beginning with "http://". No browser
will send that to you anyway.
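
A minimal sketch of that blocking rule (untested; "block" is the
1.4-era directive):

```
# browsers send origin-form URIs ("GET /path ..."); an absolute URI in
# the request line is almost certainly not coming from a normal browser
acl absolute_uri url_beg http://
block if absolute_uri
```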

Regards,
Willy




Re: Matching URLs at layer 7

2010-04-28 Thread Benedikt Fraunhofer
Hi *,

> (2) Host header is www.example.com
> (3) All is good! Pass request on to server.
> (2) Host header is www.whatever.com
> (3) All is NOT good! Flick request somewhere harmless.

If that's all you want, you should be able to go with

 acl xxx_host hdr(Host)  -i xxx.example.com
 block if !xxx_host

in your listen (or frontend) section. But everything comes with a downside:
IMHO HTTP/1.0 doesn't require the Host header to be set, so you'll
effectively lock out all HTTP/1.0 users unless you add another rule
checking for a missing Host header (and allowing that), or checking
for HTTP/1.0; there should be a "macro" for that.
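
A sketch of that exception, assuming the predefined HTTP_1.0 ACL
(equivalent to "req_ver 1.0") is available in your HAProxy version:

```
acl xxx_host hdr(Host) -i xxx.example.com
# let HTTP/1.0 requests (which may legitimately lack a Host header) through
block if !xxx_host !HTTP_1.0
```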

Just my 2cent
  Beni.



Re: Matching URLs at layer 7

2010-04-28 Thread Jeffrey 'jf' Lim
On Wed, Apr 28, 2010 at 7:51 PM, Andrew Commons
 wrote:
> Hi Beni,
>
> A few things to digest here.
>
> What was leading me up this path was a bit of elementary (and probably naïve) 
> white-listing with respect to the contents of the Host header and the URI/URL 
> supplied by the user. Tools like Fiddler make request manipulation trivial so 
> filtering out 'obvious' manipulation attempts would be a good idea. With this 
> in mind my thinking (if it can be considered as such) was that:
>
> (1) user request is for http://www.example.com/whatever
> (2) Host header is www.example.com
> (3) All is good! Pass request on to server.
>
> Alternatively:
>
> (1) user request is for http://www.example.com/whatever
> (2) Host header is www.whatever.com
> (3) All is NOT good! Flick request somewhere harmless.
>

Benedikt has explained this already (see his first reply). There is no
such thing. What you see as the "user request" is really sent as the Host
header plus a URI.
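
That is, when a user asks for http://www.example.com/whatever, the
browser actually sends:

```
GET /whatever HTTP/1.1
Host: www.example.com
```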

Also, to answer another question you raised: the HTTP specification
states that header names are case-insensitive. I don't know about
haproxy's treatment, though (I'm too lazy to delve into the code right
now, and really you can test it out to find out for yourself).

-jf


--
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
--Richard Stallman

"It's so hard to write a graphics driver that open-sourcing it would not help."
-- Andrew Fear, Software Product Manager, NVIDIA Corporation
http://kerneltrap.org/node/7228



RE: Matching URLs at layer 7

2010-04-28 Thread Andrew Commons
Hi Beni,

A few things to digest here.

What was leading me up this path was a bit of elementary (and probably naïve) 
white-listing with respect to the contents of the Host header and the URI/URL 
supplied by the user. Tools like Fiddler make request manipulation trivial so 
filtering out 'obvious' manipulation attempts would be a good idea. With this 
in mind my thinking (if it can be considered as such) was that:

(1) user request is for http://www.example.com/whatever
(2) Host header is www.example.com
(3) All is good! Pass request on to server.

Alternatively:

(1) user request is for http://www.example.com/whatever
(2) Host header is www.whatever.com
(3) All is NOT good! Flick request somewhere harmless.

I'm not sure whether your solution supports this, and if your interpretation is 
correct maybe HAProxy doesn't support it either.

I'll do some more experimenting and I hope I don't lock myself out ;-)

Cheers
Andrew

-Original Message-
From: myse...@gmail.com [mailto:myse...@gmail.com] On Behalf Of Benedikt 
Fraunhofer
Sent: Wednesday, 28 April 2010 7:42 PM
To: Andrew Commons
Cc: haproxy@formilux.org
Subject: Re: Matching URLs at layer 7

Hi Andrew,

2010/4/28 Andrew Commons :

> url_beg 
>  Returns true when the URL begins with one of the strings. This can be used to
>  check whether a URL begins with a slash or with a protocol scheme.
>
> So I'm assuming that "protocol scheme" means http:// or ftp:// or whatever

I would assume that, too..
but :) reading the other matching options it looks like those only
affect the "anchoring" of the matching. Like

> url_ip 
>  Applies to the IP address specified in the absolute URI in an HTTP request.
>  It can be used to prevent access to certain resources such as local network.
>  It is useful with option "http_proxy".

yep. but watch this "http_proxy"


> url_port 
>  "http_proxy". Note that if the port is not specified in the request, port 80
>  is assumed.

Same here... This enables plain proxy mode, where the client issues
requests like

 GET http://www.example.com/importantFile.txt HTTP/1.0
.

> This seems to be reinforced (I think!) by:
>
> url_dom 
>  Returns true when one of the strings is found isolated or delimited with dots
>  in the URL. This is used to perform domain name matching without the risk of
>  wrong match due to colliding prefixes. See also "url_sub".

I personally don't think so.. I guess this is just another version of
"anchoring", here
"\.$STRING\."

> If I'm suffering from a bit of 'brain fade' here just set me on the right 
> road :-) If the url_ criteria have different interpretations in terms of what 
> the 'url' is then let's find out what these are!

I currently can't give it a try as I finally managed to lock myself out, but

http://haproxy.1wt.eu/download/1.4/doc/configuration.txt

has an example that looks exactly like what you need:
---
To select a different backend for requests to static contents on the "www" site
and to every request on the "img", "video", "download" and "ftp" hosts :

   acl url_static  path_beg /static /images /img /css
   acl url_static  path_end .gif .png .jpg .css .js
   acl host_www    hdr_beg(host) -i www
   acl host_static hdr_beg(host) -i img. video. download. ftp.

   # now use backend "static" for all static-only hosts, and for static urls
   # of host "www". Use backend "www" for the rest.
   use_backend static if host_static or host_www url_static
   use_backend www    if host_www

---

and as "begin" really means anchoring with "^" in a regex, this would
imply that there's no host in the url criterion, as anything else would
redefine the meaning of "begin", which should not be done :)

So you should be fine with

   acl xxx_host hdr(Host)  -i xxx.example.com
   acl xxx_url  url_beg /
   #there's already a predefined acl doing this.
   use_backend xxx if xxx_host xxx_url

if I recall your example correctly. But you should really put
something more specific after the url_beg for it to be of any use :)
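
For instance (hypothetical names, combining a Host check with a path
prefix check):

```
acl xxx_host hdr(Host) -i xxx.example.com
acl xxx_path path_beg  /app /static
use_backend xxx if xxx_host xxx_path
```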

Just my 2 cent

 Beni.




Re: Matching URLs at layer 7

2010-04-28 Thread Benedikt Fraunhofer
Hi Andrew,

2010/4/28 Andrew Commons :

> url_beg 
>  Returns true when the URL begins with one of the strings. This can be used to
>  check whether a URL begins with a slash or with a protocol scheme.
>
> So I'm assuming that "protocol scheme" means http:// or ftp:// or whatever

I would assume that, too..
but :) reading the other matching options it looks like those only
affect the "anchoring" of the matching. Like

> url_ip 
>  Applies to the IP address specified in the absolute URI in an HTTP request.
>  It can be used to prevent access to certain resources such as local network.
>  It is useful with option "http_proxy".

yep. but watch this "http_proxy"


> url_port 
>  "http_proxy". Note that if the port is not specified in the request, port 80
>  is assumed.

Same here... This enables plain proxy mode, where the client issues
requests like

 GET http://www.example.com/importantFile.txt HTTP/1.0
.

> This seems to be reinforced (I think!) by:
>
> url_dom 
>  Returns true when one of the strings is found isolated or delimited with dots
>  in the URL. This is used to perform domain name matching without the risk of
>  wrong match due to colliding prefixes. See also "url_sub".

I personally don't think so.. I guess this is just another version of
"anchoring", here
"\.$STRING\."

> If I'm suffering from a bit of 'brain fade' here just set me on the right 
> road :-) If the url_ criteria have different interpretations in terms of what 
> the 'url' is then let's find out what these are!

I currently can't give it a try as I finally managed to lock myself out, but

http://haproxy.1wt.eu/download/1.4/doc/configuration.txt

has an example that looks exactly like what you need:
---
To select a different backend for requests to static contents on the "www" site
and to every request on the "img", "video", "download" and "ftp" hosts :

   acl url_static  path_beg /static /images /img /css
   acl url_static  path_end .gif .png .jpg .css .js
   acl host_www    hdr_beg(host) -i www
   acl host_static hdr_beg(host) -i img. video. download. ftp.

   # now use backend "static" for all static-only hosts, and for static urls
   # of host "www". Use backend "www" for the rest.
   use_backend static if host_static or host_www url_static
   use_backend www    if host_www

---

and as "begin" really means anchoring with "^" in a regex, this would
imply that there's no host in the url criterion, as anything else would
redefine the meaning of "begin", which should not be done :)

So you should be fine with

   acl xxx_host hdr(Host)  -i xxx.example.com
   acl xxx_url  url_beg /
   #there's already a predefined acl doing this.
   use_backend xxx if xxx_host xxx_url

if I recall your example correctly. But you should really put
something more specific after the url_beg for it to be of any use :)

Just my 2 cent

 Beni.



RE: Matching URLs at layer 7

2010-04-28 Thread Andrew Commons
Hi Beni,

Thanks for responding :-)

The doco states  that:

url_beg <string>
  Returns true when the URL begins with one of the strings. This can be used to
  check whether a URL begins with a slash or with a protocol scheme.

So I'm assuming that "protocol scheme" means http:// or ftp:// or whatever

Other parts of the documentation state that:

url_ip 
  Applies to the IP address specified in the absolute URI in an HTTP request.
  It can be used to prevent access to certain resources such as local network.
  It is useful with option "http_proxy".

url_port 
  Applies to the port specified in the absolute URI in an HTTP request. It can
  be used to prevent access to certain resources. It is useful with option
  "http_proxy". Note that if the port is not specified in the request, port 80
  is assumed.

So I've been assuming that anything starting with url_ refers to the whole
user-supplied string, parameters and all...

This seems to be reinforced (I think!) by:

url_dom <string>
  Returns true when one of the strings is found isolated or delimited with dots
  in the URL. This is used to perform domain name matching without the risk of
  wrong match due to colliding prefixes. See also "url_sub".

Which sure looks like the host portion to me!

If I'm suffering from a bit of 'brain fade' here just set me on the right road 
:-) If the url_ criteria have different interpretations in terms of what the 
'url' is then let's find out what these are!

Cheers
Andrew

-Original Message-
From: myse...@gmail.com [mailto:myse...@gmail.com] On Behalf Of Benedikt 
Fraunhofer
Sent: Wednesday, 28 April 2010 6:23 PM
To: Andrew Commons
Cc: haproxy@formilux.org
Subject: Re: Matching URLs at layer 7

Hi *,

2010/4/28 Andrew Commons :
>acl xxx_url  url_beg -i http://xxx.example.com
>acl xxx_url  url_sub -i xxx.example.com
>acl xxx_url  url_dom -i xxx.example.com

The URL is the part of the URI without the host :)
An HTTP request looks like

 GET /index.html HTTP/1.0
 Host: www.example.com

so you can't use url_beg to match on the host unless you somehow
construct your urls to look like
 http://www.example.com/www.example.com/
but don't do that :)

so what you want is something like chaining
acl xxx_host hdr(Host) -i xxx.example.com
acl xxx_urlbe1 url_beg /toBE1/
use_backend BE1 if xxx_host xxx_urlbe1
?

Cheers

  Beni.




Re: Matching URLs at layer 7

2010-04-28 Thread Benedikt Fraunhofer
Hi *,

2010/4/28 Andrew Commons :
>        acl xxx_url      url_beg        -i http://xxx.example.com
>        acl xxx_url      url_sub        -i xxx.example.com
>        acl xxx_url      url_dom        -i xxx.example.com

The URL is the part of the URI without the host :)
An HTTP request looks like

 GET /index.html HTTP/1.0
 Host: www.example.com

so you can't use url_beg to match on the host unless you somehow
construct your urls to look like
 http://www.example.com/www.example.com/
but don't do that :)

so what you want is something like chaining
acl xxx_host hdr(Host) -i xxx.example.com
acl xxx_urlbe1 url_beg /toBE1/
use_backend BE1 if xxx_host xxx_urlbe1
?

Cheers

  Beni.



RE: Matching URLs at layer 7

2010-04-28 Thread Andrew Commons
As an aside, should the documentation extract below actually read:

acl local_dst hdr(Host) -i localhost
                  ^
                  ^
i.e. is the name of the header case sensitive? In my attempts to work this
out I think that I had to use 'Host' rather than 'host' before it worked.


4.2. Alphabetically sorted keywords reference
-

This section provides a description of each keyword and its usage.


acl <aclname> <criterion> [flags] [operator] <value> ...
  Declare or complete an access list.
  May be used in sections :   defaults | frontend | listen | backend
                                  no   |    yes   |   yes  |   yes
  Example:
acl invalid_src  src  0.0.0.0/7 224.0.0.0/3
acl invalid_src  src_port 0:1023
    acl local_dst    hdr(host) -i localhost

  See section 7 about ACL usage.


-Original Message-
From: Andrew Commons [mailto:andrew.comm...@bigpond.com] 
Sent: Wednesday, 28 April 2010 4:06 PM
To: 'haproxy@formilux.org'
Subject: Matching URLs at layer 7

I'm confused over the behaviour of the url criteria in layer 7 acls. 

If I have a definition of the form:

acl xxx_host hdr(Host)  -i xxx.example.com

then something like this works fine:

use_backend xxx if xxx_host

If I try something like this:

acl xxx_url  url_beg -i http://xxx.example.com
use_backend xxx if xxx_url

then it fails.

I've tried:

acl xxx_url  url_sub -i xxx.example.com
acl xxx_url  url_dom -i xxx.example.com

Same result... I'm missing something obvious here, I just can't see it :-(

My ultimate goal is to have:

use_backend xxx if xxx_url xxx_host

which I think makes sense for a browser request that has not been fiddled
with... if I could test it, I would be able to find out!

Any insights appreciated :-)

Cheers
andrew




Matching URLs at layer 7

2010-04-27 Thread Andrew Commons
I'm confused over the behaviour of the url criteria in layer 7 acls. 

If I have a definition of the form:

acl xxx_host hdr(Host)  -i xxx.example.com

then something like this works fine:

use_backend xxx if xxx_host

If I try something like this:

acl xxx_url  url_beg -i http://xxx.example.com
use_backend xxx if xxx_url

then it fails.

I've tried:

acl xxx_url  url_sub -i xxx.example.com
acl xxx_url  url_dom -i xxx.example.com

Same result... I'm missing something obvious here, I just can't see it :-(

My ultimate goal is to have:

use_backend xxx if xxx_url xxx_host

which I think makes sense for a browser request that has not been fiddled
with... if I could test it, I would be able to find out!

Any insights appreciated :-)

Cheers
andrew