[squid-users] Cache digest question

2009-03-16 Thread Chris Woodfield

Hi,

I'm looking into setting up cache peering. I currently have small
sets of reverse-proxy squids sitting behind a load balancer, with no
URI hashing or other content-based switching in play (thanks to a nice
bug/feature in Foundry's IOS that prevents graceful rehashing when
new servers are added to a VIP). So I'm looking at other ways to
scale our cache capacity horizontally (and increase hit rates as I go),
and cache peering in proxy-only mode seems to be a good solution.


Due to various reasons, it's looking like cache digests are going to  
be the best way to go in our environment (Option #2 is multicast, but,  
ew). However, one big question I have is this - are cache digests  
intended to replace, or to supplement, normal ICP cache query behavior?


For example, let's say squid A and squid B exchange cache digests
every 10 minutes. Squid A has just retrieved a cache digest from squid
B, and squid B then gets a request for a new object one minute after
the exchange. One minute later (8 minutes before the next digest
exchange), squid A gets a request for the same URL. This object is a
local miss for squid A, but it is in cache on squid B, although it's
not in the latest digest that squid A has received from B.
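
The timing problem in this scenario can be sketched in a few lines of Python. Squid's cache digests are Bloom-filter summaries of a peer's cache keys; the toy filter below only stands in for one. The class name `TinyDigest`, the filter size, the hash choice, and the example URLs are all assumptions made for illustration, not Squid's implementation or digest format.

```python
import hashlib


class TinyDigest:
    """Toy Bloom filter standing in for a cache digest (not Squid's format)."""

    def __init__(self, bits=1024, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit field kept in a Python int

    def _positions(self, key):
        # Derive `hashes` bit positions from salted SHA-256 hashes of the key.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.field |= 1 << p

    def might_contain(self, key):
        return all((self.field >> p) & 1 for p in self._positions(key))


# Squid B's cache at the moment squid A fetches the digest.
cache_b = {"http://example.com/a.png"}
snapshot = TinyDigest()
for url in cache_b:
    snapshot.add(url)

# One minute later squid B caches a new object; A's copy of the digest
# stays as it was until the next exchange.
new_url = "http://example.com/new.css"
cache_b.add(new_url)

in_digest = snapshot.might_contain(new_url)  # what squid A can see locally
in_cache = new_url in cache_b                # what squid B actually holds
```

Until the next exchange, squid A's copy of the digest cannot report the new object, so a digest-only lookup would treat it as absent from squid B.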


Will squid A either 1. do a normal ICP query to squid B because
it's a local cache miss, or 2. presume that squid B doesn't have
the object since it wasn't in the last digest, and retrieve it itself?
In other words, do digest exchanges preclude ICP queries for
requests that are local cache misses and are not in the most recent
cache digest that a squid has received?


Personally, I'm hoping the answer is #1, as #2 can easily result in  
duplicated content between the squids, which is exactly what I'm  
trying to avoid here.


Thanks,

-Chris




Re: [squid-users] Cache digest question

2009-03-16 Thread Amos Jeffries
 Hi,

 I'm looking into setting up cache peering. I currently have small
 sets of reverse-proxy squids sitting behind a load balancer, with no
 URI hashing or other content-based switching in play (thanks to a nice
 bug/feature in Foundry's IOS that prevents graceful rehashing when
 new servers are added to a VIP). So I'm looking at other ways to
 scale our cache capacity horizontally (and increase hit rates as I go),
 and cache peering in proxy-only mode seems to be a good solution.

 Due to various reasons, it's looking like cache digests are going to
 be the best way to go in our environment (Option #2 is multicast, but,
 ew). However, one big question I have is this - are cache digests
 intended to replace, or to supplement, normal ICP cache query behavior?

I believe they replace it, though I may be wrong. I have not seen both
in action together yet.


 For example, let's say squid A and squid B exchange cache digests
 every 10 minutes. Squid A has just retrieved a cache digest from squid
 B, and squid B then gets a request for a new object one minute after
 the exchange. One minute later (8 minutes before the next digest
 exchange), squid A gets a request for the same URL. This object is a
 local miss for squid A, but it is in cache on squid B, although it's
 not in the latest digest that squid A has received from B.

 Will squid A either 1. do a normal ICP query to squid B because
 it's a local cache miss, or 2. presume that squid B doesn't have
 the object since it wasn't in the last digest, and retrieve it itself?
 In other words, do digest exchanges preclude ICP queries for
 requests that are local cache misses and are not in the most recent
 cache digest that a squid has received?

 Personally, I'm hoping the answer is #1, as #2 can easily result in
 duplicated content between the squids, which is exactly what I'm
 trying to avoid here.

A 2-layer CARP mesh is the 'standard' topology recommended for this,
since Wikipedia had such success with it: the underlayer does all the
caching, and the load-balancing Squid overlayer splits requests across
the underlayer using CARP.
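
As a rough illustration of how an overlayer can give every URL exactly one owner in the underlayer, here is a minimal Python sketch of highest-random-weight (rendezvous) hashing, the general idea CARP builds on. The hash function, the backend names, and the helper names `score` and `pick_backend` are assumptions for illustration; this is not Squid's CARP code and does not reproduce its hash.

```python
import hashlib


def score(server, url):
    # Combine server name and URL into one deterministic pseudo-random score.
    digest = hashlib.sha256(f"{server}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_backend(servers, url):
    # The server with the highest score for this URL owns it.
    return max(servers, key=lambda s: score(s, url))


backends = ["cache1", "cache2", "cache3"]  # hypothetical underlayer caches
urls = [f"/img/{i}.png" for i in range(1000)]

before = {u: pick_backend(backends, u) for u in urls}
after = {u: pick_backend(backends + ["cache4"], u) for u in urls}

# URLs whose owner changed when cache4 joined the pool.
moved = [u for u in urls if before[u] != after[u]]
```

With this scheme a URL changes owner only if the new server scores highest for it, so every entry in `moved` maps to `cache4` and all other URLs stay on the cache that already holds them.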

Amos




Re: [squid-users] Cache digest question

2009-03-16 Thread Chris Woodfield


On Mar 16, 2009, at 9:07 PM, Amos Jeffries wrote:


Hi,

I'm looking into setting up cache peering. I currently have small
sets of reverse-proxy squids sitting behind a load balancer, with no
URI hashing or other content-based switching in play (thanks to a nice
bug/feature in Foundry's IOS that prevents graceful rehashing when
new servers are added to a VIP). So I'm looking at other ways to
scale our cache capacity horizontally (and increase hit rates as I go),
and cache peering in proxy-only mode seems to be a good solution.

Due to various reasons, it's looking like cache digests are going to
be the best way to go in our environment (Option #2 is multicast, but,
ew). However, one big question I have is this - are cache digests
intended to replace, or to supplement, normal ICP cache query behavior?


I believe they replace it, though I may be wrong. I have not seen both
in action together yet.



Answered my own question with some lab testing: cache digests are
*supplemental* to normal ICP behavior. On a request that's a local
miss, squid looks up the cache digests first, then does an ICP query,
then goes direct. This is the behavior I was hoping it would have :)
It even works with multicast ICP, which was a pleasant surprise.


The mgr:peer_select report even gives you a nice statistic on how many
queries were cache-digest hits vs. ICP hits:
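
The lookup order observed in that test can be written down as a small Python sketch. It is a simplification built only on the behavior described above: each digest is modeled as a plain set, ICP as a callable, and the names `select_source` and `icp_query` are hypothetical. Squid's real peer selection has more inputs than this.

```python
def select_source(url, peer_digests, icp_query):
    """Pick where to fetch a local miss: digests, then ICP, then direct."""
    # 1. Cache digests: a purely local lookup, no network round trip.
    for peer, digest in peer_digests.items():
        if url in digest:
            return ("digest", peer)
    # 2. ICP: ask each peer whether it holds the object right now.
    for peer in peer_digests:
        if icp_query(peer, url):
            return ("icp", peer)
    # 3. No peer has it: fetch from the origin server.
    return ("direct", None)


digest_of_b = {"/old.png"}                # digest fetched at the last exchange
live_cache_b = {"/old.png", "/new.css"}   # what squid B holds right now


def icp(peer, url):
    # Stand-in for a real ICP round trip to the peer.
    return url in live_cache_b


results = [
    select_source(u, {"B": digest_of_b}, icp)
    for u in ("/old.png", "/new.css", "/nowhere")
]
# results == [("digest", "B"), ("icp", "B"), ("direct", None)]
```

The middle case is the one asked about originally: the object is missing from the stale digest, but the ICP query still finds it on the peer.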


...
Algorithm usage:
Cache Digest:  2390 ( 62%)
Icp:           1457 ( 38%)
Total:         3847 (100%)
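
For tracking these numbers over time, a few lines of Python can pull the counts out of that section of the report. This is a minimal sketch that assumes each line has the form "label: count (percent%)" as in the excerpt above; the function name `parse_algorithm_usage` is hypothetical, and the rest of the report's layout is not covered.

```python
import re


def parse_algorithm_usage(text):
    """Extract {label: count} from the 'Algorithm usage' lines of the report."""
    counts = {}
    for line in text.splitlines():
        # Label, colon, count, then the parenthesized percentage.
        m = re.match(r"\s*([A-Za-z ]+?):\s*(\d+)\s*\(", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


report = """Algorithm usage:
Cache Digest:  2390 ( 62%)
Icp:           1457 ( 38%)
Total:         3847 (100%)
"""

counts = parse_algorithm_usage(report)
digest_share = counts["Cache Digest"] / counts["Total"]
icp_share = counts["Icp"] / counts["Total"]
# round(100 * digest_share) == 62 and round(100 * icp_share) == 38
```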




For example, let's say squid A and squid B exchange cache digests
every 10 minutes. Squid A has just retrieved a cache digest from squid
B, and squid B then gets a request for a new object one minute after
the exchange. One minute later (8 minutes before the next digest
exchange), squid A gets a request for the same URL. This object is a
local miss for squid A, but it is in cache on squid B, although it's
not in the latest digest that squid A has received from B.

Will squid A either 1. do a normal ICP query to squid B because
it's a local cache miss, or 2. presume that squid B doesn't have
the object since it wasn't in the last digest, and retrieve it itself?
In other words, do digest exchanges preclude ICP queries for
requests that are local cache misses and are not in the most recent
cache digest that a squid has received?

Personally, I'm hoping the answer is #1, as #2 can easily result in
duplicated content between the squids, which is exactly what I'm
trying to avoid here.


A 2-layer CARP mesh is the 'standard' topology recommended for this,
since Wikipedia had such success with it: the underlayer does all the
caching, and the load-balancing Squid overlayer splits requests across
the underlayer using CARP.



I was really hoping I could do this with our existing load balancers,  
but Foundry boned the pony on their content-hashing functionality -  
there's no way to do a graceful hash redistribution when adding a  
new real server to the pool.
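
To put a number on why an ungraceful redistribution hurts a cache pool, here is a small Python sketch. Foundry's actual hashing algorithm is not described in this thread, so the code assumes the simplest possible scheme, a stable hash of the URL taken modulo the number of servers. The URL set and the `bucket` helper are made up for illustration.

```python
import hashlib


def bucket(url, n_servers):
    # Naive placement: stable hash of the URL, modulo the pool size.
    h = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
    return h % n_servers


urls = [f"/obj/{i}" for i in range(10000)]

# Count URLs that land on a different server when the pool grows from 4 to 5.
moved = sum(bucket(u, 4) != bucket(u, 5) for u in urls)
fraction_moved = moved / len(urls)
```

Under this assumption roughly four out of five URLs change servers when the fifth server is added, so most cached objects end up on a box that no longer receives requests for them and have to be fetched again.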



Amos