[squid-users] Re: Unable to match empty user-agent strings?

2008-10-20 Thread James Cohen
After some further testing and looking closely at the request headers,
it turns out that this is failing because the User-Agent header field
isn't present at all (rather than being present but empty).

Here's my workaround/solution, which seems to work nicely.

acl image_leechers browser ^$
acl image_leechers browser Wget

acl has_user_agent browser ^.+$


http_access deny !has_user_agent
http_access deny image_leechers
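
For reference, passing -U "" makes wget omit the User-Agent header
entirely, which is exactly the case the has_user_agent acl catches. A
quick sanity check (hypothetical URL, same shape as the transcripts in
my original mail below) should now get the 403 from Squid itself
(Server: squid/..., with an X-Squid-Error header) rather than from the
parent:

$ wget -U "" -S http://images.example.com/preview/test.jpg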


I promise not to make a habit of just conversing with myself on this list...

2008/10/20 James Cohen [EMAIL PROTECTED]:
 [quoted original message snipped; it appears in full below]



Re: [squid-users] Unable to match empty user-agent strings?

2008-10-20 Thread James Cohen
2008/10/20 Amos Jeffries [EMAIL PROTECTED]:

 It's not so much an empty string as a completely missing header.
 Squid can only test what it has against what it checks, if you get my
 meaning.

 I haven't tested it, but you might have better luck if you invert the test
 to allow access to okay agents and deny the rest.

 All they have to do is send -U fu and they get past the wget blocker.
 Not to mention the real browser UAs are commonly known, and script
 kiddies are often advised to spoof the IE agent to get past site
 barriers and brokenness in one action.

 Amos


Thanks Amos,

I figured that out just after I'd posted my original mail.

I appreciate that the blocking is pretty weak, but it seems that the
majority of the unwanted traffic is some kind of automated client not
supplying any User-Agent at all.

I guess we're going for the low-hanging fruit: anyone who really wants
the content will be able to fetch it (by spoofing a real user agent),
but this should block a bunch of it.
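
If we do tighten this up later, the inverted test Amos suggests would
look something like the sketch below (untested; the agent patterns are
illustrative only, since most real browsers identify themselves as
Mozilla-compatible):

# allow only requests presenting a known browser UA, deny the rest
acl known_browsers browser Mozilla
acl known_browsers browser Opera
http_access deny !known_browsers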

James


[squid-users] Unable to match empty user-agent strings?

2008-10-20 Thread James Cohen
Hi,

I think I've found a bug but first wanted to double-check I wasn't
doing anything dumb.

In our reverse proxy setup we want to block people from leeching the
images using Wget or similar applications. To do this we want to block
user agents that match Wget and, because lots of people use cURL or
their own home-brew clients, anything with an empty user agent string.

I added the following acl rules:

# Block automated processes from requesting our images
acl image_leechers browser ^$
acl image_leechers browser Wget

and later on...

http_access deny image_leechers

Requests with Wget in the user agent are being blocked by the proxy
exactly as expected. Requests with an empty user agent are still going
through to the parent server:


A request with Wget in the User-Agent request header (correct behaviour):

$ wget -S http://images.xxx.com/preview/1134/35121981.jpg
--11:29:45--  http://images.xxx.com/preview/1134/35121981.jpg
   => `35121981.jpg'
Resolving images.xxx.com... 62.216.237.30
Connecting to images.xxx.com|62.216.237.30|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 403 Forbidden
  Server: squid/3.0.STABLE9
  Mime-Version: 1.0
  Date: Mon, 20 Oct 2008 10:29:45 GMT
  Content-Type: text/html
  Content-Length: 1653
  Expires: Mon, 20 Oct 2008 10:29:45 GMT
  X-Squid-Error: ERR_ACCESS_DENIED 0
  X-Cache: MISS from ws2
  Via: 1.0 ws2 (squid/3.0.STABLE9)
  Connection: close
11:29:45 ERROR 403: Forbidden.

And a similar request with an empty user agent string (incorrect - the
request is being passed through to the parent, which returns its own
403):

$ wget -U "" -S http://images.xxx.com/preview/1134/james.jpg
--11:30:09--  http://images.xxx.com/preview/1134/james.jpg
   => `james.jpg'
Resolving images.xxx.com... 62.216.237.30
Connecting to images.xxx.com|62.216.237.30|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 403 Forbidden
  Content-Type: text/html
  Content-Length: 345
  Date: Mon, 20 Oct 2008 10:30:09 GMT
  Server: lighttpd/1.4.20
  X-Cache: MISS from ws2
  Via: 1.0 ws2 (squid/3.0.STABLE9)
  Connection: close
11:30:09 ERROR 403: Forbidden.


Thanks,

James


Re: [squid-users] Re-distributing the cache between multiple servers

2008-10-17 Thread James Cohen
Henrik/Amos,

Thanks for the replies. You're 100% correct in suggesting that we are
using proxy-only.

Thinking a little bit more now about the resilience we want to put in
place and the impact of one of the cache servers going down, I can see
that running without proxy-only could be a great benefit to us.
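
For anyone else reading: as I understand it, the change is just
dropping the proxy-only option from the cache_peer lines, roughly like
this (a sketch with hypothetical hostnames and the standard HTTP/ICP
ports):

# before: objects fetched from the sibling are never stored locally
cache_peer cache-b.example.com sibling 3128 3130 proxy-only

# after: each squid may keep a copy of objects it fetches from its
# sibling, so popular objects end up cached on both servers
cache_peer cache-b.example.com sibling 3128 3130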

Thanks again for your help.

James

2008/10/17 Amos Jeffries [EMAIL PROTECTED]:
 [quoted question snipped; the original message appears in full below]


 Sounds like one of the expected side effects of the sibling
 'proxy-only' setting. If the squids were allowed to cache data received
 from their siblings in one of these setups, the hits would balance out
 naturally.

 Amos




[squid-users] Re-distributing the cache between multiple servers

2008-10-16 Thread James Cohen
Hi,

I have two reverse proxy servers using each other as neighbours. The
proxy servers are load balanced (using a least connections
algorithm) by a Netscaler upstream of them.

A small number of URLs accounts for around 50% of the requests.

At the moment there's some imbalance in the hit rates on the two
caches because I brought up server A before server B, and it's holding
the majority of the objects which make up that 50% of request traffic.

I can see that clearing/expiring both caches should result in an equal
hit rate between the two servers.

Is this the only way of achieving this? I'm concerned now that if I
were to add a third server C into the cache pool it'd have an even
lower hit rate than A or B.

I spent some time searching but wasn't able to find "Squid
administration for dummies" ;)

Thanks,

James