[squid-users] Evaluating traffic for caching benefits.

2009-06-01 Thread Ray Van Dolson
Any suggestions on how to go about evaluating web traffic for
cacheability?  I have access to a port that can see all the web
traffic in our company.

I'd like to be able to gauge how many hits there are to common sites to
get a feel for how much bandwidth savings we could potentially gain by
implementing a company-wide web cache.

I suppose tcpdump could be put to creative use here (though it obviously
won't catch HTTPS traffic), but maybe there's a more polished tool or
some slicker way to do this.
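As a rough starting point (a generic sketch, not from this thread): once you have a list of requested URLs from a capture or an existing log, the fraction of repeat requests bounds the hit ratio a shared cache could achieve. A minimal Python illustration, where `urls` stands in for the parsed request list:

```python
# Estimate how much of the observed traffic is repeat requests for the
# same URL -- an upper bound on the hit ratio a shared cache could get.
from collections import Counter

def repeat_ratio(urls):
    counts = Counter(urls)
    total = sum(counts.values())
    # Every request beyond the first for a URL could have been a cache hit.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / total if total else 0.0

urls = [
    "http://example.com/logo.png",
    "http://example.com/logo.png",
    "http://example.com/index.html",
]
print(repeat_ratio(urls))  # 1 repeatable request out of 3
```

In practice you would feed this the URL column of a tcpdump-derived HTTP trace rather than a hand-written list.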

Thanks,
Ray


Re: [squid-users] Re: Cacheboy-1.1 release, testers wanted!

2008-06-07 Thread Ray Van Dolson
On Sat, Jun 07, 2008 at 01:32:31PM -0700, Linda W wrote:
 So I guess, how did cacheboy come to be and why is it here (which
 may become obvious if I know the connection to the 'main' squid
 project...)?  :-?

This thread may be of some interest to you:

  http://marc.info/?t=12108954741&r=1&w=2

Also, I believe the Cacheboy website has some info as well.  From a
user perspective I wish all hands would work on one branch (2.7 or
3.x), but I understand that's not how things always work.

Ray


[squid-users] Don't want to start a flamewar, but...

2008-05-15 Thread Ray Van Dolson
I am using the 2.7.HEAD branch -- specifically for the Store URL
rewrite options added by Adrian.  This makes 3.0 currently not an
option for me.

I'm curious what the long-term plans are for the 2.x branch and the 3.x
branch.  Will the store URL stuff be ported to 3.x eventually?  Adrian,
you have begun work on a Cacheboy project which I understand is a fork
of 2.7.HEAD.  Will you keep work you do on one in sync with the other?
Will you no longer work on 2.7.HEAD in favor of Cacheboy?

As an end user, just a little confused about the long-term direction of
all these various versions. :)

Thanks,
Ray


[squid-users] YouTube and other streaming media (caching)

2008-04-16 Thread Ray Van Dolson
Hello all, I'm beginning to implement a Squid setup and am in
particular looking to cache Youtube as it is a significant chunk of our
traffic and we don't want to outright block it (yet).

I'm using squid-2.6.STABLE6 from RHEL 5.1 (latest errata).  I've been
reading around a lot and am seeking a bit of clarification on the
current status of caching YouTube and potentially other streaming media.
Specifically:

  * Adrian mentions support for Youtube caching in 2.7 -- which seems
to correspond with this changeset:
  
  http://www.squid-cache.org/Versions/v2/2.7/changesets/11905.patch

Which would seem to be only a configuration file change.  Is there
any reason Youtube caching won't work correctly in my 2.6 version
with a similar setup (and the rewriting script as well I guess)?

  * If there are additional changes to 2.7 codebase that make youtube
caching possible, are they insignificant enough that they could
easily be backported to 2.6?  I'm trying to decide how I will
convince Red Hat to incorporate this as I doubt they'll want to
move to 2.7.  The alternative, of course, is to build from source, which
I am open to.

My config file is as follows:

  http_port 3128
  append_domain .esri.com
  acl apache rep_header Server ^Apache
  broken_vary_encoding allow apache
  maximum_object_size 4194240 KB
  maximum_object_size_in_memory 1024 KB
  access_log /var/log/squid/access.log squid
  refresh_pattern ^ftp:       1440    20%     10080
  refresh_pattern ^gopher:    1440    0%      1440
  refresh_pattern .   0   20% 4320

  acl all src 0.0.0.0/0.0.0.0
  acl esri src 10.0.0.0/255.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl to_localhost dst 127.0.0.0/8
  acl SSL_ports port 443
  acl Safe_ports port 80  # http
  acl Safe_ports port 21  # ftp
  acl Safe_ports port 443 # https
  acl Safe_ports port 70  # gopher
  acl Safe_ports port 210 # wais
  acl Safe_ports port 1025-65535  # unregistered ports
  acl Safe_ports port 280 # http-mgmt
  acl Safe_ports port 488 # gss-http
  acl Safe_ports port 591 # filemaker
  acl Safe_ports port 777 # multiling http
  acl CONNECT method CONNECT
  # Some Youtube ACL's
  acl youtube dstdomain .youtube.com .googlevideo.com .video.google.com .video.google.com.au
  acl youtubeip dst 74.125.15.0/24 64.15.0.0/16
  cache allow youtube
  cache allow youtubeip
  cache allow esri

  http_access allow manager localhost
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  http_access allow localhost
  http_access allow esri
  http_access deny all
  http_reply_access allow all
  icp_access allow all
  coredump_dir /var/spool/squid

  # YouTube options.
  refresh_pattern -i \.flv$ 10080 90% 99 ignore-no-cache override-expire ignore-private
  quick_abort_min -1 KB

  # This will block other streaming media.  Maybe we don't want this, but using
  # it for now.
  hierarchy_stoplist cgi-bin ?
  acl QUERY urlpath_regex cgi-bin \?
  cache deny QUERY

I see logfile entries (and cached objects) that indicate my youtube
videos are being saved to disk.  However they are never HIT even when
the same server is used.  I wonder if the refresh_pattern needs to be
updated?  The GET requests for the videos do not have .flv in their
filenames.  What does refresh_pattern match against -- the request URL
or the resulting MIME type?

That's it for now. :)  Thanks in advance.

Ray


Re: [squid-users] YouTube and other streaming media (caching)

2008-04-16 Thread Ray Van Dolson
On Thu, Apr 17, 2008 at 08:11:51AM +0800, Adrian Chadd wrote:
 The problem with caching Youtube (and other CDN content) is that
 the same content is found at lots of different URLs/hosts. This
 unfortunately means you'll end up caching multiple copies of the
 same content and (almost!) never see hits.
 
 Squid-2.7 -should- be quite stable. I'd suggest just running it from
 source. Hopefully Henrik will find some spare time to roll 2.6.STABLE19
 and 2.7.STABLE1 soon so 2.7 will appear in distributions.
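The duplication Adrian describes is exactly what store-URL rewriting works around: requests for the same object on many different CDN hostnames get collapsed into one canonical cache key. An illustrative Python sketch of the idea (the hostnames and key format here are invented for the example, not Squid's actual API):

```python
import re

# Map get_video requests for the same video_id, served from different
# CDN hosts, onto a single canonical cache key so the object is stored
# -- and hit -- only once.
def canonical_key(url):
    m = re.match(r"http://[\w.-]+\.youtube\.com/get_video\?video_id=([^&]+)", url)
    if m:
        return ("http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id="
                + m.group(1))
    return url  # anything else passes through unchanged

a = canonical_key("http://sjl-v104.sjl.youtube.com/get_video?video_id=abc123&t=x")
b = canonical_key("http://lax-v22.lax.youtube.com/get_video?video_id=abc123&t=y")
print(a == b)  # True: both map to the same cache key
```

The Perl helper in the config below implements this same mapping for Squid's storeurl_rewrite interface.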

Thanks Adrian.  FYI I got this to work with 2.7 (latest) based off the
instructions you provided earlier.  Here is my final config and the
perl script used to generate the storage URL:

  http_port 3128
  append_domain .esri.com
  acl apache rep_header Server ^Apache
  broken_vary_encoding allow apache
  maximum_object_size 4194240 KB
  maximum_object_size_in_memory 1024 KB
  access_log /usr/local/squid/var/logs/access.log squid

  # Some refresh patterns including YouTube -- although YouTube probably
  # needs to be adjusted.
  refresh_pattern ^ftp:       1440    20%     10080
  refresh_pattern ^gopher:    1440    0%      1440
  refresh_pattern -i \.flv$   10080 90% 99 ignore-no-cache override-expire ignore-private
  refresh_pattern ^http://sjl-v[0-9]+\.sjl\.youtube\.com 10080 90% 99 ignore-no-cache override-expire ignore-private
  refresh_pattern get_video\?video_id 10080 90% 99 ignore-no-cache override-expire ignore-private
  refresh_pattern youtube\.com/get_video\? 10080 90% 99 ignore-no-cache override-expire ignore-private
  refresh_pattern .           0       20%     4320

  acl all src 0.0.0.0/0.0.0.0
  acl esri src 10.0.0.0/255.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl to_localhost dst 127.0.0.0/8
  acl SSL_ports port 443
  acl Safe_ports port 80  # http
  acl Safe_ports port 21  # ftp
  acl Safe_ports port 443 # https
  acl Safe_ports port 70  # gopher
  acl Safe_ports port 210 # wais
  acl Safe_ports port 1025-65535  # unregistered ports
  acl Safe_ports port 280 # http-mgmt
  acl Safe_ports port 488 # gss-http
  acl Safe_ports port 591 # filemaker
  acl Safe_ports port 777 # multiling http
  acl CONNECT method CONNECT
  # Some Youtube ACL's
  acl youtube dstdomain .youtube.com .googlevideo.com .video.google.com .video.google.com.au
  acl youtubeip dst 74.125.15.0/24 
  acl youtubeip dst 64.15.0.0/16
  cache allow youtube
  cache allow youtubeip
  cache allow esri

  # These are from http://wiki.squid-cache.org/Features/StoreUrlRewrite
  acl store_rewrite_list dstdomain mt.google.com mt0.google.com mt1.google.com mt2.google.com
  acl store_rewrite_list dstdomain mt3.google.com
  acl store_rewrite_list dstdomain kh.google.com kh0.google.com kh1.google.com kh2.google.com
  acl store_rewrite_list dstdomain kh3.google.com
  acl store_rewrite_list dstdomain kh.google.com.au kh0.google.com.au kh1.google.com.au
  acl store_rewrite_list dstdomain kh2.google.com.au kh3.google.com.au

  # This needs to be narrowed down quite a bit!
  acl store_rewrite_list dstdomain .youtube.com

  storeurl_access allow store_rewrite_list
  storeurl_access deny all

  storeurl_rewrite_program /usr/local/bin/store_url_rewrite

  http_access allow manager localhost
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  http_access allow localhost
  http_access allow esri
  http_access deny all
  http_reply_access allow all
  icp_access allow all
  coredump_dir /usr/local/squid/var/cache

  # YouTube options.
  quick_abort_min -1 KB

  # This will block other streaming media.  Maybe we don't want this, but using
  # it for now.
  hierarchy_stoplist cgi-bin ?
  acl QUERY urlpath_regex cgi-bin \?
  cache deny QUERY

And here is the store_url_rewrite script.  I added some logging:

  #!/usr/bin/perl

  use IO::File;
  use IO::Socket::INET;
  use IO::Pipe;

  $| = 1;

  # Debug log; Squid runs this helper, so the file must be writable by it.
  $fh = new IO::File("/tmp/debug.log", "a");

  $fh->print("Hello!\n");
  $fh->flush();

  while (<>) {
      chomp;
      #print LOG "Orig URL: " . $_ . "\n";
      $fh->print("Orig URL: " . $_ . "\n");
      if (m/kh(.*?)\.google\.com(.*?)\/(.*?) /) {
          print "http://keyhole-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3 . "\n";
          # print STDERR "KEYHOLE\n";
      } elsif (m/mt(.*?)\.google\.com(.*?)\/(.*?) /) {
          print "http://map-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3 . "\n";
          # print STDERR "MAPSRV\n";
      } elsif (m/^http:\/\/([A-Za-z]*?)-(.*?)\.(.*)\.youtube\.com\/get_video\?video_id=([^&]+).* /) {
          print "http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id=" . $4 . "\n";
          $fh->print("http://video-srv.youtube.com.SQUIDINTERNAL/get_video?video_id=" . $4 . "\n");
          $fh->flush();
      } elsif