Re: Content Filtering
Tres Seaver wrote:

Henrik Nordstrom wrote:

Note: Squid is GPL and as a result you are only allowed to use GPL modules with Squid. If your filter implementation is not GPL then dynamic linking is not an option.

IANAL, but wouldn't it be truer to say that the GPL does not allow *distribution* of Squid linked with software under non-GPL-compatible licenses? See:

http://www.gnu.org/licenses/gpl-faq.html#TOCGPLRequireSourcePostedPublic

and:

http://www.gnu.org/licenses/gpl-faq.html#InternalDistribution

And I am not sure that dynamic linking triggers the derivative work provisions, particularly if Squid continues to function without the presence of the library. Note particularly:

http://www.gnu.org/licenses/gpl-faq.html#GPLAndPlugins

and the use of the word "believe" in that language. The GPL itself does not mention dynamic linking at all.

Interesting question, since with dynamic linking, it's the recipient of the software that creates the, er, mixed work. And since he doesn't redistribute the in-memory running image, everything might well be kosher GPL-wise. Might well be a real GPL loophole here.

I think I can see an argument to be made on the other side, though (e.g., it talks like linking and walks like linking...)

Of course, the ultimate solution to not being sued is to make sure nobody feels like a victim of your actions.

--
Jon Kay pushcache.com - push done right Squid consulting / installation
Re: content-encoding: gzip, deflate
Tomasz Chmielewski wrote:

Hello, Recently I noticed that Squid doesn't support gzip/deflate content encodings (yet?). I also found that there is a GPL proxy called Middleman that does support it (http://middle-man.sourceforge.net/). Perhaps it could help a bit in developing compression in Squid?

There is a squid3 patch to support it in development. Swell Technologies and another partner are working on this. Swell has hired me. It's at a fairly advanced stage.

There's a page on the subject, and on how you can get early access and support the project, here: http://swelltech.com/squidgzip/

--
Jon Kay pushcache.com - push done right Squid consulting / installation
Re: next version of content-encoding / gzip design doc
Here's yet another design version, following many helpful suggestions from Henrik.

Gzip Content-Encoding in Squid Design

Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding Protocol

The content-encoding protocol is described by the following header field cases from the client:

If Accept-Encoding field is present in client request
    If there is a cached response already available, and it contains a Content-Encoding field with encodings that are a subset of what the client accepts
        Then forward response to client unchanged
    Else (no cached response with right content-encoding)
        If uncoded response isn't available
            Then forward client request to server/cache
            If server/cache response contains Content-Encoding field
                Then forward new response to client
            Else (server/cache response doesn't have Content-Encoding)
                Then encode client response
                Send encoded response to client
        Else (uncoded server response already available)
            Then encode uncoded response
            Send encoded response to client
Else (no Accept-Encoding in client request)
    If uncoded server response already available
        Forward unchanged to client
    Else if coded server response already available
        Then decode server response
        Send decoded response to client
    Else (no response available yet)
        Then forward request to server or cache, and behave unchanged with respect to this protocol.

There will be no explicit links between objects that are different codings of the same object. Instead, StoreKeys of coded objects will be chosen specifically as MD5(OriginalStoreKey, Content-Encoding). This allows one to derive the StoreKeys of all possible encodings, including the original, knowing only the original StoreKey and not the requested URL.

Searching for an uncoded version of an object is done by generating an uncoded StoreKey and looking for an object with that key. It's needed upon cache miss (see protocol above).

Upon original or encoded object update or PURGE, delete all the possible encoding variants.
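The key derivation and purge rule above can be sketched roughly as follows. This is only an illustration in Python, not Squid code: the names `LOCAL_CODINGS`, `coded_store_key`, `all_variant_keys` and `purge_all_variants` are hypothetical, the store is modelled as a plain dictionary, and feeding the original key followed by the coding name into MD5 is just one possible reading of MD5(OriginalStoreKey, Content-Encoding).

```python
import hashlib

# Hypothetical list of codings this cache applies locally.
# The empty string stands for the uncoded original.
LOCAL_CODINGS = ("", "gzip", "deflate")


def coded_store_key(original_key: bytes, coding: str) -> bytes:
    """Derive the key of a coded variant from the original key alone."""
    if not coding:
        # The uncoded object keeps the original key.
        return original_key
    digest = hashlib.md5()
    digest.update(original_key)
    digest.update(coding.encode("ascii"))
    return digest.digest()


def all_variant_keys(original_key: bytes) -> list:
    """Keys of every variant that could exist, original included."""
    return [coded_store_key(original_key, c) for c in LOCAL_CODINGS]


def purge_all_variants(store: dict, original_key: bytes) -> int:
    """Delete every stored variant; return how many were removed."""
    removed = 0
    for key in all_variant_keys(original_key):
        if store.pop(key, None) is not None:
            removed += 1
    return removed
```

Because the set of locally applied codings is small and fixed, the purge is a handful of key lookups and needs no links between the variants.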
As the encodings are applied locally, the possible combinations are known and finite, so there is no problem with purging all at once. If the number of encodings grows nontrivially, we may need to add an additional mechanism to keep that check under control. Original-update deletion will be triggered on swapout of a new original object (when it gets a public key).

Etags: Encoded objects will be given unique new entity tags.

There will be a configuration option to turn off content-encoding.

Content-Encoding Implementation

New HttpHdrContCode module, that parses related HTTP headers, and arranges for encoding or decoding appropriately. Includes the following functions:

* codeParseRequest(): Called from client_side:parseHttpRequest() after clientStreamInit() call. Checks for and parses Accept-Encoding headers. Instantiates content_coding appropriately, and calls codeClientStreamInit().
* codeClientStreamInit(): Adds a new node to clientStream with codeStreamRead(), codeStreamCallback(), and codeStreamStatus() functions.
* codeStreamCallback(): set up encoding/decoding state depending on combination of Content-Encoding and Accept-Encoding fields seen.
* codeStreamRead(): call HttpContentCoder transformation functions appropriately.
* codeStreamStatus(): report status to stream.

New HttpContentCoder abstract type, with functions:

* encodeStart()
* encodeEnd()
* encodeChunk()
* decodeStart()
* decodeEnd()
* decodeChunk()

New per-coded-object ContentCoderState, to handle coding state. It'll be referenced from the clientStream, and include fields:

* HttpContentCoder *coder
* off_t codedOffset

Objects will be stored both in unencoded and encoded formats. An object will stay in the format in which Squid receives it until requested by a client requesting a different Content-Encoding which Squid supports (this could be immediate). Once this happens, the object will be streamed coded into a different StoreEntry and on to the client.

Other changes needed:

* Add new content_coding field to HttpReply.
* New httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.
* A new configuration flag to turn content-encoding off, if desired.
* A new object flag, encoded. Whenever an encoded or decoded object is created, it's tagged as encoded. Thus, a locally redecoded object will be obviously so.
* A new store.cc function, storeDeleteCodedCopies(), will do the deletion of all (un)coded copies described above.

Gzip

A new GzipContentCoder module, which will be an instance of HttpContentCoder. Data encoding will be handled by the gzip.org zlib library (http://www.gzip.org/zlib/). Functions:

gzEncodeStart: call
Re: next version of content-encoding / gzip design doc
Henrik Nordstrom wrote:

On Fri, 5 Mar 2004, Jon Kay wrote:

If Accept-Encoding field is present in client request
    If server or cache response contains Content-Encoding field with encodings that are a subset of what the client accepts

This must be relaxed to just "contains a Content-Encoding field", ignoring whether it is acceptable to the client. If not, you run into ugly corner cases if the server ignores what the client accepts.

OOPS. I misstated this test. It SHOULD be:

If Accept-Encoding field is present in client request
    If there is a cached response already available, and it contains a Content-Encoding field with encodings that are a subset of what the client accepts
        Then forward response to client unchanged
    Else (no cached response with right content-encoding)
        ...

otherwise the same.
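For readers who find the case analysis easier to follow as code, here is a rough Python sketch of the corrected test together with the remaining cases of the protocol. It is not taken from Squid: the function names and the action strings are hypothetical, q-values in Accept-Encoding are ignored, and an Accept-Encoding header that lists no codings is treated like an absent one (an assumption, not something the thread specifies).

```python
def parse_codings(header_value):
    """Split a comma-separated coding list into a set of lowercase names.

    Simplification: q-values (e.g. "gzip;q=0") are dropped, not interpreted.
    """
    if not header_value:
        return set()
    names = (item.split(";")[0].strip().lower()
             for item in header_value.split(","))
    return {name for name in names if name}


def choose_action(accept_encoding, cached_coding=None, uncoded_available=False):
    """Pick the cache's action for a request.

    accept_encoding   -- client's Accept-Encoding value, or None if absent
    cached_coding     -- Content-Encoding of a cached coded response, or None
    uncoded_available -- True if an uncoded response is already cached
    """
    accepted = parse_codings(accept_encoding)
    cached = parse_codings(cached_coding)

    if accepted:
        # Cached coded response whose codings are a subset of what is accepted.
        if cached and cached <= accepted:
            return "forward_coded"
        if not uncoded_available:
            return "fetch"           # go to the server or parent cache
        return "encode_uncoded"      # encode the cached original

    # No Accept-Encoding in the client request.
    if uncoded_available:
        return "forward_uncoded"
    if cached:
        return "decode_coded"
    return "fetch"


def handle_fetched(response_coding):
    """After a fetch for an encoding-accepting client: forward or encode."""
    return "forward" if parse_codings(response_coding) else "encode"
```

Note that `handle_fetched` follows the relaxed rule Henrik asks for: a fetched response that carries any Content-Encoding is forwarded as is, whether or not the client listed that coding.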
Re: next version of content-encoding / gzip design doc
Henrik Nordstrom wrote:

On Wed, 3 Mar 2004, Jon Kay wrote:

Because current browser implementations treat Content-Encoding much as though it was Transfer-Encoding, we will implement Content-Encoding and Accept-Encoding as though they were actually the Transfer-Encoding and TE described in the HTTP specifications.

This part I do not understand. Content-Encoding and Transfer-Encoding are fundamentally different in their operation, far beyond the hop-by-hop vs end-to-end difference. You cannot interchange one for the other. It is not safe to assume a client accepts gzip TE only because it accepts gzip content-encoding. For one thing the message format is completely different.

Etags of replies encoded by Squid will be modified to turn them into weak tags if they are not already so.

Why do you oppose creating new unique ETags?

There will be a configuration option to turn off content-encoding.

Granted, and this will default off in the standard distribution, as any other option which violates the semantically transparent HTTP proxy requirements.

Content-Encoding Implementation

No comments there.

Objects will be stored both in unencoded and encoded formats. An object will stay in the format in which Squid receives it until requested by a client requesting a different Content-Encoding which Squid supports (this could be immediate). Once this happens, the object will be streamed coded into a different StoreEntry and on to the client.

Ok.

A new store_dup module will be created to manage dup store_entries and make sure duplicate entries are invalidated when a new version of an object is read. It consists of a circular list of StoreEntry pointers named dupnext and dupprev. When a new duplicate encoding (or decoding) of an object is created, it's added to the list. When any StoreEntry is invalidated or updated, all dups are invalidated.

Looks a little too complex to me. Wouldn't something simpler like the following work:

* Modify the store key to account for content encoding.
* Add an internal meta object listing the known content encodings of a given object. When a new encoding is added, rewrite this object to add the new encoding name.
* On cache hits, iterate over the known acceptable encodings until a match is found in the cache.
* In recoded objects include a meta header indicating the identity of the original object, and disregard the recoded object on a cache hit if it no longer matches the original.

From what I can tell the above would also work for adding server-driven Content-Encoding negotiation to the proxy to complement the use of Vary (which most mod_gzip servers do not support btw).

Regards
Henrik
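Henrik's simpler scheme might look something like the following in outline. This is a toy Python model, not Squid's store: the `VariantCache` class, the `"identity"` label for the uncoded object and the `origin_id` field (standing in for the "meta header indicating the identity of the original object") are all hypothetical, and it only models objects recoded locally from a cached original.

```python
class VariantCache:
    """Toy model: store key includes the coding, plus a per-object meta list."""

    def __init__(self):
        self.objects = {}  # (base_key, coding) -> {"body": ..., "origin_id": ...}
        self.known = {}    # base_key -> list of coding names (the meta object)

    def store(self, base_key, coding, body, origin_id):
        """Store one variant and record its coding in the meta object."""
        self.objects[(base_key, coding)] = {"body": body, "origin_id": origin_id}
        codings = self.known.setdefault(base_key, [])
        if coding not in codings:
            codings.append(coding)

    def lookup(self, base_key, acceptable):
        """Return (coding, body) for the first acceptable, still-valid variant."""
        known = self.known.get(base_key, [])
        original = self.objects.get((base_key, "identity"))
        for coding in acceptable:
            if coding not in known:
                continue
            entry = self.objects.get((base_key, coding))
            if entry is None:
                continue
            if coding != "identity":
                # Disregard a recoded object that no longer matches the original.
                if original is None or entry["origin_id"] != original["origin_id"]:
                    continue
            return coding, entry["body"]
        return None
```

A stale recoded variant is never deleted eagerly here; it is simply skipped on lookup once the original it was derived from has changed, which is what lets this design do without the circular dup list.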
Re: Content-Encoding and storage format
I think our decision not to keep just encoded versions around immunizes us from that one; I don't see how a redecoding could arise, as encoded versions follow different paths to encoding-accepting clients than decoded versions to unaccepting, purist clients.

I do not quite follow what you are saying here. The issue is not about what happens within a single Squid but what happens at the clients or in a cache mesh.

I was wrong. Yes, indeed, recodings can happen.

If you modify the ETag to include details on how the object has been recoded then you are immune, as each variant then has a different identity. Also, if you use weak etags you are mostly immune to your own actions, but there are secondary caching implications where clients may get a different encoding than expected because the two are told to be semantically equivalent.

So, are you suggesting that, for example, if we get an uncoded server response with ETag: page12345, then we would tag a gzip-coded version as ETag: gzippage12345?

Jon
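If the answer to that question is yes, the tag rewrite could be as small as the sketch below. It is an illustration only: the function name `coded_etag` is hypothetical, and prefixing the coding name is simply the example from the question above, not a scheme the thread settles on. On the wire an ETag value is a quoted string, optionally preceded by `W/` for a weak tag, so the sketch keeps the quoting and the weak marker as it finds them.

```python
def coded_etag(etag: str, coding: str) -> str:
    """Give a coded variant its own entity tag by prefixing the coding name.

    Keeps a leading weak marker (W/) and the surrounding quotes, if present.
    """
    weak = etag.startswith("W/")
    opaque = etag[2:] if weak else etag
    quoted = len(opaque) >= 2 and opaque[0] == '"' and opaque[-1] == '"'
    inner = opaque[1:-1] if quoted else opaque
    renamed = coding + inner
    if quoted:
        renamed = '"' + renamed + '"'
    return ("W/" if weak else "") + renamed
```

With this, the uncoded tag `page12345` becomes `gzippage12345` for the gzip variant, so downstream caches see two distinct entities rather than two interchangeable ones.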
Re: next version of content-encoding / gzip design doc
Content-Encoding and Transfer-Encoding are fundamentally different in their operation, far beyond the hop-by-hop vs end-to-end difference. You cannot interchange one for the other. It is not safe to assume a client accepts gzip TE only because it accepts gzip content-encoding. For one thing the message format is completely different.

Yes. I'm going to try a different tack to explanation / underpinnings. Now I'm going to outline it by case analysis:

Protocol: Header field cases from client:

If Accept-Encoding field is present in client request
    If server or cache response contains Content-Encoding field with encodings that are a subset of what the client accepts
        Then forward response to client unchanged
    Else (no helpful content-encoding field)
        If uncoded response isn't available
            Then forward client request to server/cache
            If server/cache response contains Content-Encoding field
                Then forward new response to client
                Add this response to duplicate list for the object
            Else (server/cache response doesn't have Content-Encoding)
                Then encode client response
                Add encoded response to duplicate list for the object
                Send encoded response to client
        Else (uncoded server response already available)
            Then encode uncoded response
            Add encoded response to duplicate list for the object
            Send encoded response to client
Else (no Accept-Encoding in client request)
    If uncoded server response already available
        Forward unchanged to client
    Else if coded server response already available
        Then decode server response
        Add decoded response to duplicate list for the object
        Send decoded response to client
    Else (no response available yet)
        Then forward request to server or cache, and behave unchanged with respect to this protocol.
Re: Content-Encoding and storage format
Applying Content-Encoding in an accelerator makes sense, and can be done reasonably well. Applying Content-Encoding in a general purpose Internet proxy is a different beast and you then need to be very careful.

Yes, indeed. Looking at the spec, I've decided to add a squid.conf flag to turn content encoding off if desired. That seems like a good idea anyway for other reasons.

A recoded object such as gzip can be regarded as semantically equivalent provided the user-agent knows how to decode gzip, but is obviously not binary equivalent to the non-encoded entity. If you are 100% certain that all user-agents ever accessing contents from this server accept gzip content-encoding then you may use the same weak ETag for both original and encoded, but if there are ever cases where clients should get the original then you must not, because if you do, you instruct downstream caches that the gzip and original are equivalent regardless of what the client accepts.

I think our decision not to keep just encoded versions around immunizes us from that one; I don't see how a redecoding could arise, as encoded versions follow different paths to encoding-accepting clients than decoded versions to unaccepting, purist clients.

Now, one troubling aspect to this is that different caches can generate different valid encodings of the same object. Can you guys think of an action path by which that could produce corrupt results?

Jon
next version of content-encoding / gzip design doc
Here's a new version of the design document, that incorporates the results of your suggestions. I hope this is better...

Jon

Gzip Content-Encoding in Squid Design

Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding Protocol

Because current browser implementations treat Content-Encoding much as though it was Transfer-Encoding, we will implement Content-Encoding and Accept-Encoding as though they were actually the Transfer-Encoding and TE described in the HTTP specifications.

Etags of replies encoded by Squid will be modified to turn them into weak tags if they are not already so.

There will be a configuration option to turn off content-encoding.

Content-Encoding Implementation

New HttpHdrContCode module, that parses related HTTP headers, and arranges for encoding or decoding appropriately. Includes the following functions:

* codeParseRequest(): Called from client_side:parseHttpRequest() after clientStreamInit() call. Checks for and parses Accept-Encoding headers. Instantiates content_coding appropriately, and calls codeClientStreamInit().
* codeClientStreamInit(): Adds a new node to clientStream with codeStreamRead(), codeStreamCallback(), and codeStreamStatus() functions.
* codeStreamCallback(): set up encoding/decoding state depending on combination of Content-Encoding and Accept-Encoding fields seen.
* codeStreamRead(): call HttpContentCoder transformation functions appropriately.
* codeStreamStatus(): report status to stream.
* codeDupNode(): Alloc new store_entry and insert new clientStream dup node (see below) to (v?)copy data to store_entry as well as reply.

New HttpContentCoder abstract type, with functions:

* encodeStart()
* encodeEnd()
* encodeChunk()
* decodeStart()
* decodeEnd()
* decodeChunk()

New per-coded-object ContentCoderState, to handle coding state. It'll be referenced from the clientStream, and include fields:

* HttpContentCoder *coder
* off_t codedOffset

Objects will be stored both in unencoded and encoded formats. An object will stay in the format in which Squid receives it until requested by a client requesting a different Content-Encoding which Squid supports (this could be immediate). Once this happens, the object will be streamed coded into a different StoreEntry and on to the client.

A new store_dup module will be created to manage dup store_entries and make sure duplicate entries are invalidated when a new version of an object is read. It consists of a circular list of StoreEntry pointers named dupnext and dupprev. When a new duplicate encoding (or decoding) of an object is created, it's added to the list. When any StoreEntry is invalidated or updated, all dups are invalidated. Functions:

* storeNewDup(): called from codeDupNode(), above, and creates new node with the dup'ed node attached via the dup list.
* storeDupClientStreamInit(): called from codeDupNode(), and adds new clientStreamNode to copy off encoded data to new node as well as reply.
* storeDupClientStreamRead(): does copying off.
* storeDupClientStreamCallback(): null function
* storeDupClientStreamStatus(): returns status

Other changes needed:

* Add new content_coding field to HttpReply.
* New httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.
* HttpReply:httpReplySetHeaders will weaken the etag if appropriate.
* A new configuration flag to turn content-encoding off, if desired.

Gzip

A new GzipContentCoder module, which will be an instance of HttpContentCoder. Data encoding will be handled by the gzip.org zlib library. Functions:

* gzEncodeStart: call deflateInit2(), write header
* gzEncodeEnd: write trailer
* gzEncodeChunk: call deflate()
* gzDecodeStart: call inflateInit2(), read and verify header
* gzDecodeEnd: verify trailer
* gzDecodeChunk: call inflate()
* gzDoSaveEncoded(): true

Test Strategy

Must pass the test suite. Must add appropriate tests, including sending gzipped content to oneself successfully. Will also test against Apache mod_gzip implementation, and maybe even gunzip.
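To make the start/chunk/end shape of the coder concrete, here is a small Python sketch built on the standard `zlib` module. It is an illustration of the interface only, not the planned C++ module: the class and method names are hypothetical Python spellings of the functions listed above. In zlib, compression is the deflate family of calls and decompression the inflate family. One deliberate difference from the design: instead of writing and verifying the gzip header and trailer by hand, the sketch passes `16 + zlib.MAX_WBITS` so that zlib produces and checks them itself.

```python
import zlib

# 16 + MAX_WBITS asks zlib to use the gzip container (header and trailer)
# rather than the zlib one.
GZIP_WBITS = 16 + zlib.MAX_WBITS


class GzipContentCoder:
    """Streaming gzip coder with start / chunk / end steps."""

    def encode_start(self):
        self._enc = zlib.compressobj(
            zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, GZIP_WBITS
        )

    def encode_chunk(self, data: bytes) -> bytes:
        # May return b"" while zlib buffers input internally.
        return self._enc.compress(data)

    def encode_end(self) -> bytes:
        # Flushes buffered data and appends the gzip trailer (CRC32, length).
        return self._enc.flush()

    def decode_start(self):
        self._dec = zlib.decompressobj(GZIP_WBITS)

    def decode_chunk(self, data: bytes) -> bytes:
        return self._dec.decompress(data)

    def decode_end(self) -> bytes:
        tail = self._dec.flush()
        if not self._dec.eof:
            # The trailer was never seen, so the stream was cut short.
            raise ValueError("truncated gzip stream")
        return tail
```

A round trip through this class is essentially the "sending gzipped content to oneself" test from the strategy above: feed a body through the encode calls in pieces, feed the result through the decode calls in differently sized pieces, and compare with the original.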
Re: resend of content-encoding / gzip design
Henrik Nordstrom wrote:

See programmers guide chapters on Client Streams. I think you will find this fits quite nicely for Content-Encoding requirements. Exactly where to add the logic on when to stack the content-encoding client stream pipe onto the reply path is another question.

This certainly looks like it has possibilities. I'm trying to rework the design to incorporate this. There's one thing I don't understand. Where does the composition - the passing along of control in reads from node to node - happen?

Jon
resend of content-encoding / gzip design
I tried to send this last night, but it got filtered because it's written in html. Here's a text translation:

Gzip Content-Encoding in Squid Design

Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding

New HttpHdrContCode module, that parses related HTTP headers, and arranges for encoding or decoding appropriately. To be called from clientProcessRequest, cacheHit, and processReplyHeader.

New HttpContentCoder abstract type, with functions:

* encodeStart(): called from HttpHdrContCode
* encodeEnd(): called from comm_close handler
* encodeChunk(): called from storeClientCopy handler
* decodeStart(): called from HttpStateData::processReplyHeader
* decodeEnd(): called from comm_close handler
* decodeChunk(): called from comm_read handler
* doSaveEncoded()

New per-coded-object ContentCoderState, to handle coding state. It will include fields:

* HttpContentCoder *coder
* off_t codedOffset

The HttpStateData class will have a usually nulled reference to a ContentEncoder. It will only be non-null for objects which are being encoded or decoded.

Other changes needed:

* Add new content_coding field to HttpReply.
* New httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.

Gzip

A new GzipContentCoder module, which will be an instance of HttpContentCoder. Data encoding will be handled by the gzip.org zlib library. The gzip card drivers are expected to include a binary-compatible zlib library. Functions:

* gzEncodeStart: call deflateInit2(), write header
* gzEncodeEnd: write trailer
* gzEncodeChunk: call deflate()
* gzDecodeStart: call inflateInit2(), read and verify header
* gzDecodeEnd: verify trailer
* gzDecodeChunk: call inflate()
* gzDoSaveEncoded(): true

Test Strategy

Must pass the test suite. Must add appropriate tests, including sending gzipped content to oneself successfully. Will also test against Apache mod_gzip implementation, and maybe even gunzip.
generic content encoding and gzip support
Hi, there! Been a while since my last email here. I'm back to doing Squid consulting with an emphasis on push, and hope you guys are doing well.

Joe Cooper is paying me to work on designing and implementing the addition to Squid of generic Content-Encoding support, with gzip as a particular supported coding. Right now, I'm in the design phase. I have a first-draft design I hope people will be able to examine and criticise. One thing missing from the draft is storage issues - I'll talk about that in a separate email.

Joe would like me to merge this stuff with squid3 HEAD when it's working right. Please let me know if you guys see any problem with that.

Jon