Re: [squid-users] Tuning for very expensive bandwidth links
On 02/04/11 11:46, Ed W wrote:

> Hi

>>> So the remote (client) side proxy would need an eCAP plugin that would modify the initial request to include an ETag. This would require some ability to interrogate what we have in cache and generate/request the ETag associated with what we have already - do you have a pointer to any API/code that I would need to look at to do this?

>> I'm unsure sorry. Alex at The Measurement Factory has better info on specific details of what the eCAP API can do.

> If I wanted to hack on Squid 3.2... Do you have a 60 second overview on the code points to examine with a view to basically:

> a) create an etag and insert the relevant header on any response content (although, perhaps done only in the case that an etag is not provided by upstream server)

StoreEntry would be the starting point. Currently everything goes through it. In future only cacheable content will go through it, but adding an ETag to the bypassed content would be useless for caching anyway.

Other than that my knowledge of the store system is patchy. Alex and Henrik know a lot more about the inner cache workings than me.

> b) add an etag header to requests (without one) - ie we are looking at the case that client 2 requests content we have cached, but client 2 doesn't know that, only local squid does.

http.cc does all the outward request relaying. I believe the if-modified-since requests Squid sends should have the ETag in them (if one is known either from the client or from the local cache copy). If you find otherwise that is probably a bug worth fixing. Double-check with RFC 2616 though.

There is no way we can add an ETag to the requests clients send before looking up the local cache. The local cache lookup starts with the URL, and whatever result that produces is then checked for a Vary: match.

The ETag sent by the server might be worth adding as an implicit prefix to the Vary: pieces, for matching against the ETag sent by the client. BUT that is of little use until multiple-variant caching is ported to 3.x from 2.7.

> Just looking for a quick heads up on where to start investigating?

Mentioned above.

>> IIRC we have Dimitry with The Measurement Factory assisting with HTTP compliance fixes. I'm sure sponsorship towards a specific fix will be welcomed.

> How do I get in contact with Dimitry?

Alex is his supervisor I think. rousskov at squid-cache.org.

> [...] content might have been removed..? Seems that at least parts of this might need to be done internally to squid?

> Just to be clear, the point is that few web servers generate useful etags, and under the condition that bandwidth is the limiting constraint (plus a hierarchy of proxies), then it might be useful to generate (and later test) etags based on some consistent hash algorithm?

Yes. We came to that conclusion too.

You will find it a bit tricky (but mostly possible) to insert live, but getting it into the cached items should be relatively easy.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.11
  Beta testers wanted for 3.2.0.5
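To illustrate point (a) above, here is a minimal sketch of "add an ETag only when upstream did not provide one". It is plain Python, not Squid or eCAP code: the function name `ensure_etag`, the dict-based header model, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def ensure_etag(headers: dict, body: bytes) -> dict:
    """Return a copy of headers, adding a hash-based ETag only if none exists.

    Hypothetical helper: header names are compared case-insensitively and
    an ETag supplied by the origin server is never overwritten.
    """
    result = dict(headers)
    if any(name.lower() == "etag" for name in result):
        return result  # keep the upstream server's own validator
    # Assumed choice of hash; any consistent hash of the bytes would do.
    result["ETag"] = '"' + hashlib.sha256(body).hexdigest() + '"'
    return result


if __name__ == "__main__":
    print(ensure_etag({"Content-Type": "text/html"}, b"<p>hello</p>"))
    print(ensure_etag({"etag": '"from-origin"'}, b"<p>hello</p>"))
```

Doing this on items already in the cache matches the "relatively easy" case; inserting it live would need the whole body before the header can be written, which is presumably part of why it is described as tricky.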
Re: [squid-users] Tuning for very expensive bandwidth links
Hi

>> So the remote (client) side proxy would need an eCAP plugin that would modify the initial request to include an ETag. This would require some ability to interrogate what we have in cache and generate/request the ETag associated with what we have already - do you have a pointer to any API/code that I would need to look at to do this?

> I'm unsure sorry. Alex at The Measurement Factory has better info on specific details of what the eCAP API can do.

If I wanted to hack on Squid 3.2... Do you have a 60 second overview on the code points to examine with a view to basically:

a) create an etag and insert the relevant header on any response content (although, perhaps done only in the case that an etag is not provided by upstream server)

b) add an etag header to requests (without one) - ie we are looking at the case that client 2 requests content we have cached, but client 2 doesn't know that, only local squid does.

Just looking for a quick heads up on where to start investigating?

> IIRC we have Dimitry with The Measurement Factory assisting with HTTP compliance fixes. I'm sure sponsorship towards a specific fix will be welcomed.

How do I get in contact with Dimitry?

> The one public eCAP adapter we have been notified about happens to be for doing gzip. http://code.google.com/p/squid-ecap-gzip/

Hmm.. I did already look this over a bit - very nice and simple API, shame that a huge bunch of ecap plugins hasn't sprung up?

The limitation seems to be that the API is really built around mangling requests/responses, but there isn't obviously a way to interrogate squid and ask it questions about what it's caching? Even if there were, you would also have a race condition: you might say to upstream that we have content "X" in cache, but by the time the response comes back that content might have been removed..? Seems that at least parts of this might need to be done internally to squid?

Just to be clear, the point is that few web servers generate useful etags, and under the condition that bandwidth is the limiting constraint (plus a hierarchy of proxies), then it might be useful to generate (and later test) etags based on some consistent hash algorithm?

Thanks

Ed W
Re: [squid-users] Tuning for very expensive bandwidth links
On 01/04/11 12:09, Ed W wrote:

> Hi

>>> My thought was to investigate having the internet side proxy add etag headers to all content based on some quality hash function. Then have the (expensive) remote side proxy rewrite the request headers to always use If-None-Match? The idea is that the bandwidth is cheap on internet connected side, so it can refresh its cache of the whole page, generate a new hash, but still return a "not modified" response if the end result is the same string of bytes. How much of that can I implement in Squid 3.x today..?

>> 3.1.10+ will validate If-None-Match and ETag, but will not add them to requests itself.

> Thanks - can you expand on what it means to "validate" in this case? I think you mean that if the content is cached with a given eTag then requests for that content will be returned from cache if the request has an appropriate If-None-Match - is this the case?

I mean Squid will produce 412 or 304 replies using those headers in the HTTP/1.1 if-modified-since and if-none-match checking algorithms to reduce bandwidth.

So you will not have to alter the receiving Squid to meet your needs. Only the sending one. With modern browsers adding those headers on their own you may not even have to add them.

>>> Note, I realise this could lead to some side effects where the action of visiting the web page itself causes some other side effect, however, I think this is a manageable problem for this requirement?

>>> Thanks for any pointers to ideas or other products that might help?

>> ICAP or eCAP would be the way to go here for quick results. Making a plugin to do the ETag generation and alterations before sending off.

> Understood. So the remote (client) side proxy would need an eCAP plugin that would modify the initial request to include an ETag. This would require some ability to interrogate what we have in cache and generate/request the ETag associated with what we have already - do you have a pointer to any API/code that I would need to look at to do this?

I'm unsure sorry. Alex at The Measurement Factory has better info on specific details of what the eCAP API can do.

> Then on the internet side proxy we would do whatever we need to retrieve the content, say fetch the asset. Then our eCap on that side would generate a consistent ETag using our favourite hash function?

Yes. I'd start with that side of the link and see if the modern browsers' ETag support plays well and eliminates the need for the client-side eCAP trouble.

> The part I'm unsure how to implement would be examining what's in squid's cache in order to generate an ETag based on what we have got (ie for remote side)?

Me too.

>> You could also look at cutting bodies off 304 replies at the Internet side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.

> Hmm, yes that would be very sensible. Apart from via eCAP are there other ways I might do that?

Not currently.

>> NP: if you want to go ahead and alter Squid code adding If-None-Match on outbound requests is an open bug. As is proper ETag variant caching support.

> I don't know if I have the time/ability to hack on squid code? Is there someone who might be interested in working on this for an affordable fee?

IIRC we have Dimitry with The Measurement Factory assisting with HTTP compliance fixes. I'm sure sponsorship towards a specific fix will be welcomed.

> Thanks for the very helpful feedback. Note if there are any existing ecap/icap modules I should look at then please educate me? (I'm currently using "Ziproxy" and looking at moving the interesting bits to a Squid ecap module. I have also used "Rabbit" proxy which is somewhat similar)

The one public eCAP adapter we have been notified about happens to be for doing gzip. http://code.google.com/p/squid-ecap-gzip/

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.11
  Beta testers wanted for 3.2.0.5
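For readers wondering what "validate" amounts to, the If-None-Match half of the check can be sketched as below. This follows my reading of RFC 2616 section 14.26 (304 for GET/HEAD, 412 for other methods, weak comparison allowed only for GET/HEAD). It is not Squid's implementation; the names `parse_etags` and `check_if_none_match` are hypothetical, and the sketch assumes the cached entity always has a well-formed ETag.

```python
import re

# Matches one entity tag, optionally weak:  W/"abc"  or  "abc"
_ETAG_RE = re.compile(r'(W/)?"([^"]*)"')


def parse_etags(header_value: str):
    """Parse a header value into a list of (is_weak, opaque_tag) pairs."""
    return [(bool(weak), tag) for weak, tag in _ETAG_RE.findall(header_value)]


def check_if_none_match(method: str, if_none_match: str, current_etag: str):
    """Return 304, 412, or None (None means: handle the request normally)."""
    safe = method in ("GET", "HEAD")
    if if_none_match.strip() == "*":
        matched = True  # "*" matches any existing entity
    else:
        cur_weak, cur_tag = parse_etags(current_etag)[0]
        matched = any(
            # Weak comparison for GET/HEAD, strong comparison otherwise.
            tag == cur_tag and (safe or not (weak or cur_weak))
            for weak, tag in parse_etags(if_none_match)
        )
    if not matched:
        return None
    return 304 if safe else 412


if __name__ == "__main__":
    print(check_if_none_match("GET", '"v1", "v2"', '"v2"'))
    print(check_if_none_match("PUT", '"v2"', '"v2"'))
    print(check_if_none_match("GET", '"v1"', '"v2"'))
```

The If-Modified-Since side of the algorithm is left out to keep the sketch short.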
Re: [squid-users] Tuning for very expensive bandwidth links
Hi

>> My thought was to investigate having the internet side proxy add etag headers to all content based on some quality hash function. Then have the (expensive) remote side proxy rewrite the request headers to always use If-None-Match? The idea is that the bandwidth is cheap on internet connected side, so it can refresh its cache of the whole page, generate a new hash, but still return a "not modified" response if the end result is the same string of bytes. How much of that can I implement in Squid 3.x today..?

> 3.1.10+ will validate If-None-Match and ETag, but will not add them to requests itself.

Thanks - can you expand on what it means to "validate" in this case? I think you mean that if the content is cached with a given eTag then requests for that content will be returned from cache if the request has an appropriate If-None-Match - is this the case?

>> Note, I realise this could lead to some side effects where the action of visiting the web page itself causes some other side effect, however, I think this is a manageable problem for this requirement?

>> Thanks for any pointers to ideas or other products that might help?

> ICAP or eCAP would be the way to go here for quick results. Making a plugin to do the ETag generation and alterations before sending off.

Understood. So the remote (client) side proxy would need an eCAP plugin that would modify the initial request to include an ETag. This would require some ability to interrogate what we have in cache and generate/request the ETag associated with what we have already - do you have a pointer to any API/code that I would need to look at to do this?

Then on the internet side proxy we would do whatever we need to retrieve the content, say fetch the asset. Then our eCap on that side would generate a consistent ETag using our favourite hash function?

The part I'm unsure how to implement would be examining what's in squid's cache in order to generate an ETag based on what we have got (ie for remote side)?

> You could also look at cutting bodies off 304 replies at the Internet side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.

Hmm, yes that would be very sensible. Apart from via eCAP are there other ways I might do that?

> NP: if you want to go ahead and alter Squid code adding If-None-Match on outbound requests is an open bug. As is proper ETag variant caching support.

I don't know if I have the time/ability to hack on squid code? Is there someone who might be interested in working on this for an affordable fee?

Thanks for the very helpful feedback. Note if there are any existing ecap/icap modules I should look at then please educate me? (I'm currently using "Ziproxy" and looking at moving the interesting bits to a Squid ecap module. I have also used "Rabbit" proxy which is somewhat similar)

Thanks for your comments

Ed W
Re: [squid-users] Tuning for very expensive bandwidth links
On 30/03/2011 19:17, Marcus Kool wrote:

> If your users do not mind, you can block ads and user tracking sites of which many produce 1x1 gifs. Most ads and tracking codes are not cacheable and may consume a lot. This all depends on which sites your users visit of course.

Thanks - got that covered to some extent. So far I was looking at these two lists for a simple domain blocking system to catch "adverts and tracking":

http://www.mvps.org/winhelp2002/hosts.htm
http://hosts-file.net/

Any other suggestions/comments?

Additionally we will offer the option to do image recompression and upstream gzip of content (actually probably we will use our own compressing tunnel across the slow link)

Anyone know of any already written ecap/icap servers I might want to investigate?

Cheers

Ed W
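As a rough illustration of the "simple domain blocking system" idea, the sketch below turns a hosts-format list into a set of blocked hostnames. It assumes the lists use the usual hosts-file layout (`address hostname [aliases...]` with `#` comments); the function names and the sample domains are made up for the example, and how the result would be wired into Squid is not shown.

```python
def parse_hosts_blocklist(text: str) -> set:
    """Collect hostnames from hosts-file formatted text.

    Assumed format: "<address> <hostname> [aliases...]" with "#" comments.
    """
    blocked = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or address with no hostname
        for host in fields[1:]:
            host = host.lower()
            if host != "localhost":
                blocked.add(host)
    return blocked


def is_blocked(hostname: str, blocked: set) -> bool:
    """Exact-match lookup, mirroring how a hosts file behaves (no wildcards)."""
    return hostname.lower().rstrip(".") in blocked


if __name__ == "__main__":
    sample = """
    # hypothetical entries, not taken from the real lists
    127.0.0.1  localhost
    127.0.0.1  ads.example.com   # banner server
    0.0.0.0    tracker.example.net
    """
    blocked = parse_hosts_blocklist(sample)
    print(is_blocked("ADS.example.com", blocked))
    print(is_blocked("www.example.com", blocked))
```

Because the match is exact, subdomains of a listed host are not blocked unless they are listed too.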
Re: [squid-users] Tuning for very expensive bandwidth links
If your users do not mind, you can block ads and user tracking sites of which many produce 1x1 gifs. Most ads and tracking codes are not cacheable and may consume a lot. This all depends on which sites your users visit of course.

Marcus

Amos Jeffries wrote:

> On 31/03/11 01:38, Ed W wrote:

>> Hi,

>> Just investigating some tuning for squid for use with satellite links (which are relatively slow + bandwidth can be charged at $10-100/MB)

>> I'm pondering having a dual proxy configuration with a proxy at both ends of the satellite link. A desired goal would be to force serving from local cache anything which hasn't actually changed (byte for byte) on the internet side.

>> My thought was to investigate having the internet side proxy add etag headers to all content based on some quality hash function. Then have the (expensive) remote side proxy rewrite the request headers to always use If-None-Match? The idea is that the bandwidth is cheap on internet connected side, so it can refresh its cache of the whole page, generate a new hash, but still return a "not modified" response if the end result is the same string of bytes. How much of that can I implement in Squid 3.x today..?

> 3.1.10+ will validate If-None-Match and ETag, but will not add them to requests itself.

>> Note, I realise this could lead to some side effects where the action of visiting the web page itself causes some other side effect, however, I think this is a manageable problem for this requirement?

>> Thanks for any pointers to ideas or other products that might help?

> ICAP or eCAP would be the way to go here for quick results. Making a plugin to do the ETag generation and alterations before sending off.

> You could also look at cutting bodies off 304 replies at the Internet side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.

> NP: if you want to go ahead and alter Squid code adding If-None-Match on outbound requests is an open bug. As is proper ETag variant caching support.

> Amos
Re: [squid-users] Tuning for very expensive bandwidth links
On 31/03/11 01:38, Ed W wrote:

> Hi,

> Just investigating some tuning for squid for use with satellite links (which are relatively slow + bandwidth can be charged at $10-100/MB)

> I'm pondering having a dual proxy configuration with a proxy at both ends of the satellite link. A desired goal would be to force serving from local cache anything which hasn't actually changed (byte for byte) on the internet side.

> My thought was to investigate having the internet side proxy add etag headers to all content based on some quality hash function. Then have the (expensive) remote side proxy rewrite the request headers to always use If-None-Match? The idea is that the bandwidth is cheap on internet connected side, so it can refresh its cache of the whole page, generate a new hash, but still return a "not modified" response if the end result is the same string of bytes. How much of that can I implement in Squid 3.x today..?

3.1.10+ will validate If-None-Match and ETag, but will not add them to requests itself.

> Note, I realise this could lead to some side effects where the action of visiting the web page itself causes some other side effect, however, I think this is a manageable problem for this requirement?

> Thanks for any pointers to ideas or other products that might help?

ICAP or eCAP would be the way to go here for quick results. Making a plugin to do the ETag generation and alterations before sending off.

You could also look at cutting bodies off 304 replies at the Internet side to avoid the bandwidth expensive TCP_REFRESH_UNMODIFIED responses.

NP: if you want to go ahead and alter Squid code adding If-None-Match on outbound requests is an open bug. As is proper ETag variant caching support.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.11
  Beta testers wanted for 3.2.0.5
[squid-users] Tuning for very expensive bandwidth links
Hi,

Just investigating some tuning for squid for use with satellite links (which are relatively slow + bandwidth can be charged at $10-100/MB)

I'm pondering having a dual proxy configuration with a proxy at both ends of the satellite link. A desired goal would be to force serving from local cache anything which hasn't actually changed (byte for byte) on the internet side.

My thought was to investigate having the internet side proxy add etag headers to all content based on some quality hash function. Then have the (expensive) remote side proxy rewrite the request headers to always use If-None-Match? The idea is that the bandwidth is cheap on internet connected side, so it can refresh its cache of the whole page, generate a new hash, but still return a "not modified" response if the end result is the same string of bytes. How much of that can I implement in Squid 3.x today..?

Note, I realise this could lead to some side effects where the action of visiting the web page itself causes some other side effect, however, I think this is a manageable problem for this requirement?

Thanks for any pointers to ideas or other products that might help?

Ed W
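The proposal can be sketched roughly as follows. This is a toy model in Python rather than anything Squid does today: `content_etag` and `internet_side_reply` are hypothetical names, SHA-256 stands in for "some quality hash function", and the remote side is assumed to send its cached tag as a single If-None-Match value.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Derive a strong ETag purely from the response bytes (assumed: SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def internet_side_reply(if_none_match, fresh_body: bytes):
    """Decide what the internet-side proxy sends over the expensive link.

    The proxy has already re-fetched fresh_body over cheap bandwidth.
    Returns (status, headers, body); the body is empty when the remote
    cache already holds a byte-for-byte identical copy.
    """
    etag = content_etag(fresh_body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, fresh_body


if __name__ == "__main__":
    page = b"<html>unchanged page</html>"
    status, headers, body = internet_side_reply(None, page)  # first fetch
    print(status, len(body))
    status, _, body = internet_side_reply(headers["ETag"], page)  # revalidation
    print(status, len(body))
```

Only the second exchange saves satellite bandwidth, and only if the bytes really are identical; a page that embeds a timestamp or session token would hash differently on every fetch.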