Re: [squid-users] Tuning for very expensive bandwidth links

2011-04-03 Thread Amos Jeffries

On 02/04/11 11:46, Ed W wrote:

Hi



So the remote (client) side proxy would need an eCAP plugin that would
modify the initial request to include an ETag.  This would require some
ability to interrogate what we have in cache and generate/request the
ETag associated with what we have already - do you have a pointer to any
API/code that I would need to look at to do this?


I'm unsure sorry. Alex at The Measurement Factory has better info on
specific details of what the eCAP API can do.


If I wanted to hack on Squid 3.2... Do you have a 60 second overview on
the code points to examine with a view to basically:

a) create an etag and insert the relevant header on any response content
(although, perhaps done only in the case that an etag is not provided by
upstream server)


StoreEntry would be the starting point. Currently everything goes 
through it. In future it will be just cacheable stuff, but adding an 
ETag to the bypassed things would be useless for caching anyway.


Other than that my knowledge of the store system is patchy. Alex and 
Henrik know a lot more about the inner cache workings than me.




b) add an etag header to requests (without one) - ie we are looking at
the case that client 2 requests content we have cached, but client 2
doesn't know that, only local squid does.


http.cc does all the request relaying outward stuff. I believe the 
if-modified-since requests Squid sends should have ETag in them (if one 
is known either from client or from local cache copy). If you find 
otherwise that is probably a bug worth fixing. Double-check with RFC 
2616 though.


There is no way we can add ETag to requests clients send before looking 
up the local cache. The local cache lookup starts with the URL, and 
whatever result that produces is then checked for a Vary: match.
 ETag sent by the server might be worth adding as an implicit prefix to 
the Vary: pieces, for matching against an ETag sent by the client.
 BUT this is of little use until multiple-variant caching is ported to 
3.x from 2.7.
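
A rough sketch of the lookup order described above, in Python rather than Squid's C++. The store layout, the `lookup` helper and the example URL are all made up for illustration and are not Squid internals. It only shows that the ETag of a cached copy becomes known after the URL lookup and the Vary: check have picked a variant:

```python
from typing import Optional

def lookup(cache: dict, url: str, request_headers: dict) -> Optional[dict]:
    """Find a cached variant: first by URL, then by the Vary-selected headers."""
    for variant in cache.get(url, []):
        selected = variant["vary"]  # request-header values this variant was stored under
        if all(request_headers.get(name, "") == value
               for name, value in selected.items()):
            return variant
    return None

# Hypothetical store: URL -> list of variants (header names kept lowercase).
cache = {
    "http://example.com/a": [
        {"vary": {"accept-encoding": "gzip"}, "etag": '"v-gzip"'},
        {"vary": {"accept-encoding": ""}, "etag": '"v-plain"'},
    ]
}

hit = lookup(cache, "http://example.com/a", {"accept-encoding": "gzip"})
# Only now is there an ETag that could be placed in an outbound If-None-Match.
outbound_if_none_match = hit["etag"] if hit else None
print(outbound_if_none_match)
```

With a single variant per URL (no multiple-variant caching), the inner loop would have at most one candidate to test.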




Just looking for a quick heads up on where to start investigating?


As mentioned above.




IIRC we have Dimitry with The Measurement Factory assisting with HTTP
compliance fixes. I'm sure sponsorship towards a specific fix will be
welcomed.


How do I get in contact with Dimitry?



Alex is his supervisor I think. rousskov at squid-cache.org.


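The limitation seems to be that the API is really around mangling
requests/responses, but there isn't obviously a way to interrogate squid
and ask it questions about what it's caching? Even if there were then
you also have a race condition that you might say to upstream that we
have content "X" in cache, but by the time the response comes back that
content might have been removed..?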

Seems that at least parts of this might need to be done internally to squid?

Just to be clear, the point is that few web servers generate useful
etags, and under the condition that bandwidth is the limiting constraint
(plus a hierarchy of proxies), then it might be useful to generate (and
later test) etags based on some consistent hash algorithm?



Yes. We came to that conclusion too.

You will find it a bit tricky (but mostly possible) to insert live, but 
getting it into the cached items should be relatively easy.
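
One way to see why the live case is awkward: a hash of the body is only complete once the last byte has gone past, while the ETag header has to be sent before the first byte. A minimal Python sketch, assuming SHA-256 as the hash and a made-up `HashingRelay` class (nothing here is Squid or eCAP API):

```python
import hashlib

class HashingRelay:
    """Pass body chunks through unchanged while accumulating a digest."""

    def __init__(self) -> None:
        self._digest = hashlib.sha256()

    def feed(self, chunk: bytes) -> bytes:
        self._digest.update(chunk)
        return chunk  # forwarded as-is; headers have already been sent by now

    def etag(self) -> str:
        # Only meaningful after the final chunk has been fed.
        return '"' + self._digest.hexdigest() + '"'

relay = HashingRelay()
forwarded = b"".join(relay.feed(c) for c in [b"hello ", b"world"])
stored_etag = relay.etag()  # can be attached to the cached copy for later requests
print(stored_etag)
```

So the generated tag is available for the stored object and for every later request, but not for the response that is being relayed at that moment unless the whole body is buffered first.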


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.11
  Beta testers wanted for 3.2.0.5


Re: [squid-users] Tuning for very expensive bandwidth links

2011-04-01 Thread Ed W
Hi


>> So the remote (client) side proxy would need an eCAP plugin that would
>> modify the initial request to include an ETag.  This would require some
>> ability to interrogate what we have in cache and generate/request the
>> ETag associated with what we have already - do you have a pointer to any
>> API/code that I would need to look at to do this?
> 
> I'm unsure sorry. Alex at The Measurement Factory has better info on
> specific details of what the eCAP API can do.

If I wanted to hack on Squid 3.2... Do you have a 60 second overview on
the code points to examine with a view to basically:

a) create an etag and insert the relevant header on any response content
(although, perhaps done only in the case that an etag is not provided by
upstream server)

b) add an etag header to requests (without one) - ie we are looking at
the case that client 2 requests content we have cached, but client 2
doesn't know that, only local squid does.

Just looking for a quick heads up on where to start investigating?


> IIRC we have Dimitry with The Measurement Factory assisting with HTTP
> compliance fixes. I'm sure sponsorship towards a specific fix will be
> welcomed.

How do I get in contact with Dimitry?


> The one public eCAP adapter we have been notified about happens to be for
> doing gzip. http://code.google.com/p/squid-ecap-gzip/

Hmm.. I did already look this over a bit - very nice and simple API,
shame there aren't a huge bunch of ecap plugins sprung up?

The limitation seems to be that the API is really around mangling
requests/responses, but there isn't obviously a way to interrogate squid
and ask it questions about what it's caching? Even if there were then
you also have a race condition that you might say to upstream that we
have content "X" in cache, but by the time the response comes back that
content might have been removed..?

Seems that at least parts of this might need to be done internally to squid?

Just to be clear, the point is that few web servers generate useful
etags, and under the condition that bandwidth is the limiting constraint
(plus a hierarchy of proxies), then it might be useful to generate (and
later test) etags based on some consistent hash algorithm?


Thanks

Ed W


Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-31 Thread Amos Jeffries

On 01/04/11 12:09, Ed W wrote:

Hi


My thought was to investigate having the internet side proxy add etag
headers to all content based on some quality hash function. Then have
the (expensive) remote side proxy rewrite the request headers to always
use If-None-Match?  The idea is that the bandwidth is cheap on internet
connected side, so it can refresh its cache of the whole page, generate
a new hash, but still return a "not modified" response if the end result
is the same string of bytes.  How much of that can I implement in Squid
3.x today..?


3.1.10+ will validate If-None-Match and ETag, but will not add them to
requests itself.


Thanks - can you expand on what it means to "validate" in this case?

I think you mean that if the content is cached with a given eTag then
requests for that content will be returned from cache if the request has
an appropriate If-None-Match - is this the case?


I mean Squid will produce 412 or 304 replies using those headers in the 
HTTP/1.1 if-modified-since and if-none-match checking algorithms to 
reduce bandwidth.


So you will not have to alter the receiving Squid to meet your needs. 
Only the sending one. With modern browsers adding those headers on their 
own you may not even have to add them.
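
For reference, the If-None-Match half of that check can be sketched roughly as below. This is a simplified Python reading of RFC 2616 section 14.26, not Squid's code: the function names are invented, weak validators (W/ prefix) are ignored, and the header is split naively on commas.

```python
from typing import Optional

def parse_etags(header: str) -> list:
    """Split an If-None-Match value into individual tags (naive comma split)."""
    return [tag.strip() for tag in header.split(",") if tag.strip()]

def check_if_none_match(method: str, if_none_match: Optional[str],
                        current_etag: Optional[str]) -> int:
    """Return 304 or 412 when the precondition matches, else 200 (serve normally)."""
    if if_none_match is None or current_etag is None:
        return 200
    tags = parse_etags(if_none_match)
    if "*" not in tags and current_etag not in tags:
        return 200  # client's copy differs, so send the full response
    # A match means the client already holds this entity.
    return 304 if method in ("GET", "HEAD") else 412

print(check_if_none_match("GET", '"abc", "def"', '"def"'))
print(check_if_none_match("PUT", '"abc"', '"abc"'))
print(check_if_none_match("GET", '"abc"', '"xyz"'))
```

The 304 case is the one that saves bandwidth on the expensive link, since no body is sent.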





Note, I realise this could lead to some side effects where the action of
visiting the web page itself causes some other side effect, however, I
think this is a manageable problem for this requirement?

Thanks for any pointers to ideas or other products that might help?


ICAP or eCAP would be the way to go here for quick results. Making a
plugin to do the ETag generation and alterations before sending off.


Understood.

So the remote (client) side proxy would need an eCAP plugin that would
modify the initial request to include an ETag.  This would require some
ability to interrogate what we have in cache and generate/request the
ETag associated with what we have already - do you have a pointer to any
API/code that I would need to look at to do this?


I'm unsure sorry. Alex at The Measurement Factory has better info on 
specific details of what the eCAP API can do.




Then on the internet side proxy we would do whatever we need to retrieve
the content, say fetch the asset.  Then our eCap on that side would
generate a consistent ETag using our favourite hash function?


Yes. I'd start with that side of the link and see if the modern browsers 
ETag support plays well and eliminates the need for the client-side eCAP 
trouble.




The part I'm unsure how to implement would be examining what's in
squid's cache in order to generate an ETag based on what we have got (ie
for remote side)?



Me too.




You could also look at cutting bodies off 304 replies at the Internet
side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.


Hmm, yes that would be very sensible.  Apart from via eCAP are there
other ways I might do that?



Not currently.




NP: if you want to go ahead and alter Squid code adding If-None-Match on
outbound requests is an open bug. As is proper ETag variant caching
support.


I don't know if I have the time/ability to hack on squid code? Is there
someone who might be interested in working on this for an affordable fee?


IIRC we have Dimitry with The Measurement Factory assisting with HTTP 
compliance fixes. I'm sure sponsorship towards a specific fix will be 
welcomed.




Thanks for the very helpful feedback. Note if there are any existing
ecap/icap modules I should look at then please educate me?  (I'm
currently using "Ziproxy" and looking at moving the interesting bits to
a Squid ecap module. I have also used "Rabbit" proxy which is somewhat
similar)


The one public eCAP adapter we have been notified about happens to be for 
doing gzip. http://code.google.com/p/squid-ecap-gzip/


Amos


Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-31 Thread Ed W
Hi

>> My thought was to investigate having the internet side proxy add etag
>> headers to all content based on some quality hash function. Then have
>> the (expensive) remote side proxy rewrite the request headers to always
>> use If-None-Match?  The idea is that the bandwidth is cheap on internet
>> connected side, so it can refresh its cache of the whole page, generate
>> a new hash, but still return a "not modified" response if the end result
>> is the same string of bytes.  How much of that can I implement in Squid
>> 3.x today..?
> 
> 3.1.10+ will validate If-None-Match and ETag, but will not add them to
> requests itself.

Thanks - can you expand on what it means to "validate" in this case?

I think you mean that if the content is cached with a given eTag then
requests for that content will be returned from cache if the request has
an appropriate If-None-Match - is this the case?


>> Note, I realise this could lead to some side effects where the action of
>> visiting the web page itself causes some other side effect, however, I
>> think this is a manageable problem for this requirement?
>>
>> Thanks for any pointers to ideas or other products that might help?
> 
> ICAP or eCAP would be the way to go here for quick results. Making a
> plugin to do the ETag generation and alterations before sending off.

Understood.

So the remote (client) side proxy would need an eCAP plugin that would
modify the initial request to include an ETag.  This would require some
ability to interrogate what we have in cache and generate/request the
ETag associated with what we have already - do you have a pointer to any
API/code that I would need to look at to do this?

Then on the internet side proxy we would do whatever we need to retrieve
the content, say fetch the asset.  Then our eCap on that side would
generate a consistent ETag using our favourite hash function?

The part I'm unsure how to implement would be examining what's in
squid's cache in order to generate an ETag based on what we have got (ie
for remote side)?


> You could also look at cutting bodies off 304 replies at the Internet
> side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.

Hmm, yes that would be very sensible.  Apart from via eCAP are there
other ways I might do that?


> NP: if you want to go ahead and alter Squid code adding If-None-Match on
> outbound requests is an open bug. As is proper ETag variant caching
> support.

I don't know if I have the time/ability to hack on squid code? Is there
someone who might be interested in working on this for an affordable fee?

Thanks for the very helpful feedback. Note if there are any existing
ecap/icap modules I should look at then please educate me?  (I'm
currently using "Ziproxy" and looking at moving the interesting bits to
a Squid ecap module. I have also used "Rabbit" proxy which is somewhat
similar)

Thanks for your comments

Ed W


Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-31 Thread Ed W
On 30/03/2011 19:17, Marcus Kool wrote:
> If your users do not mind, you can block ads and user tracking
> sites of which many produce 1x1 gifs.
> Most ads and tracking codes are not cacheable and may consume a lot.
> This all depends on which sites your users visit of course.

Thanks - got that covered to some extent.  So far I was looking at these
two lists for a simple domain blocking system to catch "adverts and
tracking"

http://www.mvps.org/winhelp2002/hosts.htm
http://hosts-file.net/

Any other suggestions/comments?
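
As a sketch of how such lists could be turned into a plain domain list, the Python below parses hosts-file syntax ("IP name [name...]" with # comments). The sample entries and the function name are invented for illustration; the real lists may contain entries that need further filtering.

```python
def hosts_to_domains(text: str) -> list:
    """Extract the hostnames from hosts-file text, dropping comments and localhost."""
    skip = {"localhost", "localhost.localdomain", "broadcasthost"}
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a lone token, nothing to block
        for name in fields[1:]:  # fields[0] is the IP address
            name = name.lower().rstrip(".")
            if name not in skip:
                domains.add(name)
    return sorted(domains)

sample = """# example entries, not taken from the real lists
127.0.0.1  localhost
127.0.0.1  ads.example.com   # banner server
0.0.0.0    tracker.example.net
"""
print("\n".join(hosts_to_domains(sample)))
```

The output, one domain per line, could then be saved to a file and referenced from squid.conf with a dstdomain acl and an http_access deny rule (the file location being whatever suits the install).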

Additionally we will offer the option to do image recompression and
upstream gzip of content (actually probably we will use our own
compressing tunnel across the slow link)

Anyone know of any already written ecap/icap servers I might want to
investigate?

Cheers

Ed W


Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-30 Thread Marcus Kool

If your users do not mind, you can block ads and user tracking
sites of which many produce 1x1 gifs.
Most ads and tracking codes are not cacheable and may consume a lot.
This all depends on which sites your users visit of course.

Marcus




Re: [squid-users] Tuning for very expensive bandwidth links

2011-03-30 Thread Amos Jeffries

On 31/03/11 01:38, Ed W wrote:

Hi, Just investigating some tuning for squid for use with satellite
links (which are relatively slow + bandwidth can be charged at $10-100/MB)

I'm pondering having a dual proxy configuration with a proxy at both
ends of the satellite link.  A desired goal would be to force serving
from local cache anything which hasn't actually changed (byte for byte)
on the internet side.

My thought was to investigate having the internet side proxy add etag
headers to all content based on some quality hash function. Then have
the (expensive) remote side proxy rewrite the request headers to always
use If-None-Match?  The idea is that the bandwidth is cheap on internet
connected side, so it can refresh its cache of the whole page, generate
a new hash, but still return a "not modified" response if the end result
is the same string of bytes.  How much of that can I implement in Squid
3.x today..?


3.1.10+ will validate If-None-Match and ETag, but will not add them to 
requests itself.




Note, I realise this could lead to some side effects where the action of
visiting the web page itself causes some other side effect, however, I
think this is a manageable problem for this requirement?

Thanks for any pointers to ideas or other products that might help?


ICAP or eCAP would be the way to go here for quick results. Making a 
plugin to do the ETag generation and alterations before sending off.


You could also look at cutting bodies off 304 replies at the Internet 
side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.


NP: if you want to go ahead and alter Squid code adding If-None-Match on 
outbound requests is an open bug. As is proper ETag variant caching support.


Amos


[squid-users] Tuning for very expensive bandwidth links

2011-03-30 Thread Ed W
Hi, Just investigating some tuning for squid for use with satellite
links (which are relatively slow + bandwidth can be charged at $10-100/MB)

I'm pondering having a dual proxy configuration with a proxy at both
ends of the satellite link.  A desired goal would be to force serving
from local cache anything which hasn't actually changed (byte for byte)
on the internet side.

My thought was to investigate having the internet side proxy add etag
headers to all content based on some quality hash function. Then have
the (expensive) remote side proxy rewrite the request headers to always
use If-None-Match?  The idea is that the bandwidth is cheap on internet
connected side, so it can refresh its cache of the whole page, generate
a new hash, but still return a "not modified" response if the end result
is the same string of bytes.  How much of that can I implement in Squid
3.x today..?
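
To make the idea concrete, here is a minimal Python sketch of what the internet side proxy would do. It assumes SHA-256 as the "quality hash function" and uses invented helper names; it illustrates the logic only and is not something Squid provides:

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the exact bytes of the body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'

def respond(fresh_body: bytes, if_none_match: Optional[str]):
    """Answer the remote proxy after re-fetching the page on the cheap side."""
    etag = make_etag(fresh_body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""  # same bytes: nothing crosses the link
    return 200, {"ETag": etag}, fresh_body

# The remote side holds a copy and revalidates with the tag it was given.
remote_copy = b"<html>unchanged page</html>"
tag = make_etag(remote_copy)

status, _, body = respond(b"<html>unchanged page</html>", tag)
print(status, len(body))

status, _, body = respond(b"<html>changed page</html>", tag)
print(status, len(body))
```

The first call returns 304 with an empty body even if the origin server sent no validator at all, which is the saving being looked for.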

Note, I realise this could lead to some side effects where the action of
visiting the web page itself causes some other side effect, however, I
think this is a manageable problem for this requirement?

Thanks for any pointers to ideas or other products that might help?

Ed W