RE: [squid-users] Squid dstdomain ACL

2003-12-12 Thread Henrik Nordstrom
On Fri, 12 Dec 2003, Mike McCall wrote:

> Thanks Duane.  Unfortunately, my domains list is HUGE (~600,000 domains) and
> the cache already runs at 50-95% CPU during the day, most of which I assume
> is due to the huge domains list.  If I were to lose the dstdomain ACL and
> only use url_regex, would performance stay where it is?  Sadly, I can't use
> the second option you mention because google's cache is useful for other
> non-offensive websites.

Ouch.. such a large regex list will give a significant performance hit.

You could extend Squid with a special acl type that applies dstdomain 
matching to the target of Google cache lookups. This should keep the speed 
the same as plain dstdomain.
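One way to prototype this idea without writing a native ACL type is Squid's
external_acl_type interface (available from Squid 2.5), which hands each
request to a helper process that answers OK or ERR. The sketch below is
purely illustrative, not code from this thread: the helper name, blocklist
path, and parsing details are all assumptions.

```python
#!/usr/bin/env python3
# Hypothetical external ACL helper sketch: pull the real target host out
# of a Google cache URL, then apply dstdomain-style matching against a
# blocklist.  squid.conf wiring would look roughly like (names invented):
#   external_acl_type gcheck %URI /usr/local/bin/gcache_check.py /etc/squid/blocked.txt
#   acl blocked external gcheck
import sys
from urllib.parse import urlsplit, parse_qs

def load_blocklist(path):
    """One domain per line, e.g. 'playboy.com'."""
    with open(path) as f:
        return {line.strip().lstrip(".").lower() for line in f if line.strip()}

def blocked(host, blocklist):
    # dstdomain semantics: the host itself or any parent domain matches.
    labels = host.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

def cache_target(url):
    # For http://www.google.com/search?q=cache:www.playboy.com/...
    # return "www.playboy.com"; otherwise None.
    u = urlsplit(url)
    if not u.hostname or "google." not in u.hostname:
        return None
    q = parse_qs(u.query).get("q", [""])[0]
    if q.startswith("cache:"):
        return q[len("cache:"):].split("/", 1)[0]
    return None

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only enter the helper loop when Squid starts us with a blocklist path.
    blocklist = load_blocklist(sys.argv[1])
    for line in sys.stdin:
        fields = line.split()
        url = fields[0] if fields else ""
        host = cache_target(url) or (urlsplit(url).hostname or "")
        print("OK" if blocked(host, blocklist) else "ERR", flush=True)
```

Because the blocklist is a hash set lookup per domain label rather than a
scan of 600,000 regexes, per-request cost stays close to plain dstdomain.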

Regards
Henrik



RE: [squid-users] Squid dstdomain ACL

2003-12-12 Thread Duane Wessels
> Thanks Duane.  Unfortunately, my domains list is HUGE (~600,000 domains) and
> the cache already runs at 50-95% CPU during the day, most of which I assume
> is due to the huge domains list.  If I were to lose the dstdomain ACL and
> only use url_regex, would performance stay where it is?  Sadly, I can't use
> the second option you mention because google's cache is useful for other
> non-offensive websites.

Switching from dstdomain to url_regex will likely be much less
efficient.  dstdomain searching is probably O(log N), while url_regex
searching is O(N).
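To make the complexity difference concrete, here is a small illustrative
sketch (not Squid's actual implementation) of the two strategies: dstdomain
can binary-search a sorted list of domain keys, while a url_regex list must
try every pattern in turn.

```python
# Toy comparison of dstdomain-style vs url_regex-style matching.
# The domain list here is a stand-in for a real blocklist.
import bisect
import re

domains = sorted(["playboy.com", "example.org", "badsite.net"])

def dstdomain_match(host, sorted_domains):
    # Try the host and each parent domain with a binary search:
    # O(labels * log N) comparisons total.
    labels = host.lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        j = bisect.bisect_left(sorted_domains, candidate)
        if j < len(sorted_domains) and sorted_domains[j] == candidate:
            return True
    return False

regexes = [re.compile(re.escape(d)) for d in domains]

def url_regex_match(url, compiled):
    # Every pattern is tried until one hits: O(N) regex executions
    # in the worst case, which is what hurts with ~600,000 entries.
    return any(r.search(url) for r in compiled)
```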

There are some redirectors (like Squirm, jesred, and squidGuard) that claim
to be very fast and efficient.  You might be able to do regex searching with
them faster than with Squid's internal implementation.  A nice thing about
redirectors, too, is that you can test them separately before you configure
Squid to use them.
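That standalone-testing point can be sketched with a minimal redirector. In
the Squid 2.x redirector protocol, each stdin line is
"URL client-ip/fqdn ident method" and the reply is either a replacement URL
or a blank line meaning "no change". The block-page URL and the --serve
flag below are invented placeholders.

```python
#!/usr/bin/env python3
# Minimal redirector sketch in the spirit Duane describes.
import sys

BLOCK_PAGE = "http://proxy.internal/blocked.html"  # hypothetical URL
BAD_SUBSTRINGS = ("playboy.com",)  # stand-in for a real blocklist

def rewrite(line):
    # First whitespace-separated field of the request line is the URL.
    fields = line.split()
    url = fields[0] if fields else ""
    if any(s in url for s in BAD_SUBSTRINGS):
        return BLOCK_PAGE
    return ""  # blank line tells Squid to pass the URL through unchanged

if __name__ == "__main__" and "--serve" in sys.argv:
    # Long-lived helper loop; squid.conf would start the script with
    # the (invented) --serve flag.
    for line in sys.stdin:
        print(rewrite(line))
        sys.stdout.flush()
```

Testing it separately, as Duane suggests, is then just a shell pipe:

 echo "http://www.playboy.com/ 10.0.0.1/- - GET" | ./redirector.py --serve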

Duane W.


RE: [squid-users] Squid dstdomain ACL

2003-12-12 Thread Mike McCall
> On Fri, 12 Dec 2003, Mike McCall wrote:
> 
> > All,
> >
> > I have a fairly busy cache using native squid ACLs to block access to
> > certain sites using the dstdomain ACL type.  This is fine for denying
> > access to sites like www.playboy.com, but doesn't work when people use
> > google's cache of pages and google images, since the domain becomes
> > www.google.com.
> >
> > My question: is there an ACL that will deny both
> > http://www.playboy.com and
> > http://www.google.com/search?q=cache:www.playboy.com/?
> >
> > I know regexes might be able to do this, but will there be a
> > performance hit?
> 
> You have (at least) two options:
> 
> 1) use the 'url_regex' type to block hostnames that appear anywhere in
> the URL, like:
> 
>  acl foo url_regex www.playboy.com
> 
>    The "performance hit" depends on the size of your regex list and the
>    load on Squid.  If Squid is not currently running at, say, more than
>    50% of CPU usage, you'll probably be fine.
> 
> 
> 2) Use a similar ACL to block all google cache queries:
> 
>  acl foo url_regex google.com.*cache:
> 
> Duane W.

Thanks Duane.  Unfortunately, my domains list is HUGE (~600,000 domains) and
the cache already runs at 50-95% CPU during the day, most of which I assume
is due to the huge domains list.  If I were to lose the dstdomain ACL and
only use url_regex, would performance stay where it is?  Sadly, I can't use
the second option you mention because google's cache is useful for other
non-offensive websites.

Mike






Re: [squid-users] Squid dstdomain ACL

2003-12-12 Thread Duane Wessels



On Fri, 12 Dec 2003, Mike McCall wrote:

> All,
>
> I have a fairly busy cache using native squid ACLs to block access to
> certain sites using the dstdomain ACL type.  This is fine for denying access
> to sites like www.playboy.com, but doesn't work when people use google's
> cache of pages and google images, since the domain becomes www.google.com.
>
> My question: is there an ACL that will deny both
> http://www.playboy.com and
> http://www.google.com/search?q=cache:www.playboy.com/?
>
> I know regexes might be able to do this, but will there be a performance
> hit?

You have (at least) two options:

1) use the 'url_regex' type to block hostnames that appear anywhere in the URL, like:

 acl foo url_regex www.playboy.com

   The "performance hit" depends on the size of your regex list and the load on
   Squid.  If Squid is not currently running at, say, more than 50% of CPU usage,
   you'll probably be fine.


2) Use a similar ACL to block all google cache queries:

 acl foo url_regex google.com.*cache:
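For concreteness, the two options might be wired into squid.conf along
these lines (ACL names and rule placement are illustrative; http_access
rules are checked top-down, so the deny lines must precede any allow):

```
# Option 1: match a hostname anywhere in the request URL
acl blocked_urls url_regex www.playboy.com

# Option 2: match any Google cache query
acl google_cache url_regex google.com.*cache:

http_access deny blocked_urls
http_access deny google_cache
```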

Duane W.


[squid-users] Squid dstdomain ACL

2003-12-12 Thread Mike McCall
All,

I have a fairly busy cache using native squid ACLs to block access to
certain sites using the dstdomain ACL type.  This is fine for denying access
to sites like www.playboy.com, but doesn't work when people use google's
cache of pages and google images, since the domain becomes www.google.com.

My question: is there an ACL that will deny both
http://www.playboy.com and
http://www.google.com/search?q=cache:www.playboy.com/?

I know regexes might be able to do this, but will there be a performance
hit?

Thanks.

Mike