Re: Content Filtering

2004-09-25 Jon Kay
Tres Seaver wrote:

 Henrik Nordstrom wrote:

  Note: Squid is GPL and as a result you are only allowed to use GPL
  modules with Squid. If your filter implementation is not GPL then
  dynamic linking is not an option.

 IANAL, but wouldn't it be truer to say that the GPL does not allow
 *distribution* of Squid linked with software under non-GPL-compatible
 licenses?

 See:
http://www.gnu.org/licenses/gpl-faq.html#TOCGPLRequireSourcePostedPublic

 and:
http://www.gnu.org/licenses/gpl-faq.html#InternalDistribution

 And I am not sure that dynamic linking triggers the derivative work
 provisions, particularly if Squid continues to function without the
 presence of the library.

 Note particularly:

http://www.gnu.org/licenses/gpl-faq.html#GPLAndPlugins

 and the use of the word "believe" in that language. The GPL itself does
 not mention dynamic linking at all.

Interesting question, since with dynamic linking, it's the recipient of the
software that creates the, er, mixed work.  And since he doesn't redistribute
the in-memory running image, everything might well be kosher GPLwise.
Might well be a real GPL loophole here. I think I can see an argument to
be made on the other side, though (e.g., it talks like linking and walks
like linking...)

Of course, the ultimate solution to not being sued is to make sure nobody
feels like a victim of your actions.

--
Jon Kay  pushcache.com - push done right
 Squid consulting / installation




Re: content-encoding: gzip, deflate

2004-07-17 Jon Kay
Tomasz Chmielewski wrote:

 Hello,

 Recently I noticed that Squid doesn't support gzip/deflate content
 encodings (yet?).

 I also found that there is a GPL proxy called Middleman that does
 support it (http://middle-man.sourceforge.net/).

 Perhaps it could help a bit in developing compression in Squid?

There is a squid3 patch to support it in development. Swell
Technologies and another partner are working on this. Swell has hired me.

It's at a fairly advanced stage.

There's a page on the subject, and on how you can get early access and
support the project, here:

  http://swelltech.com/squidgzip/





Re: next version of content-encoding / gzip design doc

2004-03-09 Jon Kay
Here's yet another design version, following many helpful suggestions
from Henrik.


  Gzip Content-Encoding in Squid Design


Version Choice

The goal will be to get these changes into Squid3 HEAD.


Content-Encoding Protocol

The content-encoding protocol is described below.

Header field cases from client:

If Accept-Encoding field is present in client request
    If there is a cached response already available, and it
    contains a Content-Encoding field with encodings that are a
    subset of what the client accepts
        Then forward response to client unchanged
    Else (no cached response with right content-encoding)
        If uncoded response isn't available
            Then forward client request to server/cache
            If server/cache response contains Content-Encoding field
                Then forward new response to client
            Else (server/cache response doesn't have Content-Encoding)
                Then encode the response
                Send encoded response to client
        Else (uncoded server response already available)
            Then encode uncoded response
            Send encoded response to client
Else (no Accept-Encoding in client request)
    If uncoded server response already available
        Forward unchanged to client
    Else if coded server response already available
        Then decode server response
        Send decoded response to client
    Else (no response available yet)
        Then forward request to server or cache, and behave unchanged
        with respect to this protocol.
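To make the case analysis above concrete, here is a minimal C++ sketch of the decision it describes. Everything in it is hypothetical (`Action`, `CacheState` and `chooseAction` are not Squid names); it assumes each cached coded copy carries a single coding, and it models a request without an Accept-Encoding header as an empty set of accepted codings.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical types: they only model the decision table above.
enum class Action {
    ForwardCachedUnchanged,  // a cached copy can be sent as-is
    EncodeCachedIdentity,    // uncoded copy cached: encode it for the client
    DecodeCachedCoded,       // no Accept-Encoding, only a coded copy cached
    ForwardRequestUpstream   // nothing usable cached: ask server/cache
};

struct CacheState {
    bool haveIdentity = false;          // uncoded response available
    std::set<std::string> codedCopies;  // codings of cached coded copies
};

// clientAccepts is empty when the request had no Accept-Encoding header.
Action chooseAction(const std::set<std::string>& clientAccepts,
                    const CacheState& cache) {
    if (!clientAccepts.empty()) {
        for (const auto& coding : cache.codedCopies)
            if (clientAccepts.count(coding))
                return Action::ForwardCachedUnchanged;
        if (cache.haveIdentity)
            return Action::EncodeCachedIdentity;
        // The upstream reply is later forwarded if already coded,
        // or encoded on its way to the client.
        return Action::ForwardRequestUpstream;
    }
    if (cache.haveIdentity)
        return Action::ForwardCachedUnchanged;
    if (!cache.codedCopies.empty())
        return Action::DecodeCachedCoded;
    return Action::ForwardRequestUpstream;
}
```

The function only picks a path; the actual encoding, decoding and upstream fetch would happen in the client stream nodes described in the implementation section.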

There will be no explicit links between objects that are different
codings of the same entity. Instead, StoreKeys of coded objects will be
chosen specifically as MD5(OriginalStoreKey, Content-Encoding). This
allows one to derive the StoreKeys of all possible encodings, including
the original, knowing only the original StoreKey and not the requested
URL.

Searching for an uncoded version of an object is done by generating an
uncoded StoreKey and looking for an object with that key.  It's needed
upon cache miss (see protocol above).

Upon original or encoded object update or PURGE, delete all the
possible encoding variants. As the encodings are applied locally, the
possible combinations are known and finite, so there is no problem
purging them all at once. If the number of encodings grows nontrivially,
we may need an additional mechanism to keep that check under control.

Original-update deletion will be triggered on swapout of a new
original object (when it gets a public key).
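A sketch of the StoreKey scheme against a toy in-memory store. The design calls for MD5; to keep the example free of a crypto library, 64-bit FNV-1a is swapped in as the digest, so these keys are not the ones Squid would compute. `kLocalCodings`, `variantKey`, `haveUncoded` and `deleteCodedCopies` are hypothetical names; the last one stands in for the storeDeleteCodedCopies() function proposed in this design.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in digest: the design specifies MD5; FNV-1a (64-bit) is used here
// only so the sketch needs no crypto library.
uint64_t digest(const std::string& data) {
    uint64_t h = 14695981039346656037ull;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

using StoreKey = uint64_t;
using Store = std::map<StoreKey, std::string>;  // toy store: key -> body

// Key of an encoded variant = digest(originalKey, coding);
// the identity variant keeps the original key.
StoreKey variantKey(StoreKey original, const std::string& coding) {
    if (coding.empty() || coding == "identity")
        return original;
    return digest(std::to_string(original) + '\0' + coding);
}

// Hypothetical list of codings Squid itself can apply.
const std::vector<std::string> kLocalCodings = {"gzip", "deflate"};

// On a miss, look for the uncoded copy by deriving its key.
bool haveUncoded(const Store& store, StoreKey original) {
    return store.count(variantKey(original, "identity")) != 0;
}

// On update or PURGE: every variant key is derivable from the original
// key alone, so all copies go without explicit links between them.
void deleteCodedCopies(Store& store, StoreKey original) {
    store.erase(original);
    for (const auto& coding : kLocalCodings)
        store.erase(variantKey(original, coding));
}
```

Because the set of local codings is fixed, deletion costs one lookup per coding, which is the "known and finite" property the text relies on.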

ETags: Encoded objects will be given unique new entity tags.
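One way to give each encoded variant its own entity tag, sketched under an assumed naming scheme: the coding name plus a hyphen is prefixed inside the quotes, so "page12345" becomes "gzip-page12345" for a gzip variant, and a weak tag stays weak. The scheme and the `variantETag` name are hypothetical; the design only requires that the new tag be unique.

```cpp
#include <cassert>
#include <string>

// Hypothetical scheme: prefix "<coding>-" inside the quotes.
// Keeps a leading W/ (weak) marker. Returns "" if the input is not a
// quoted entity tag.
std::string variantETag(const std::string& etag, const std::string& coding) {
    std::string prefix;
    std::string rest = etag;
    if (rest.compare(0, 2, "W/") == 0) {
        prefix = "W/";
        rest = rest.substr(2);
    }
    if (rest.size() < 2 || rest.front() != '"' || rest.back() != '"')
        return "";
    const std::string opaque = rest.substr(1, rest.size() - 2);
    return prefix + "\"" + coding + "-" + opaque + "\"";
}
```

Since the original tag is embedded unchanged, the origin's identity stays recoverable, while no downstream cache is told that the encoded and original bodies are equivalent.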

There will be a configuration option to turn off content-encoding.


Content-Encoding Implementation

New HttpHdrContCode module, which parses related HTTP headers and
arranges for encoding or decoding appropriately. Includes the
following functions:

  codeParseRequest(): Called from client_side:parseHttpRequest()
  after the clientStreamInit() call. Checks for and parses
  Accept-Encoding headers. Instantiates content_coding appropriately,
  and calls codeClientStreamInit().
  codeClientStreamInit(): Adds a new node to clientStream with
  codeStreamRead(), codeStreamCallback(), and codeStreamStatus() functions.
  codeStreamCallback(): sets up encoding/decoding state depending on the
  combination of Content-Encoding and Accept-Encoding fields seen.
  codeStreamRead(): calls HttpContentCoder transformation functions
  appropriately.
  codeStreamStatus(): reports status to the stream.
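A simplified sketch of the header parsing codeParseRequest() would need: split the Accept-Encoding value on commas, read an optional q-value, and answer whether a given coding is acceptable (directly or via "*"). It is not a complete HTTP/1.1 parser (it only looks at the first parameter and does not allow whitespace around "="), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Parses e.g. "gzip, deflate;q=0.5" into coding -> q-value (default 1.0).
// Codings listed with q=0 are kept, recording an explicit refusal.
std::map<std::string, double> parseAcceptEncoding(const std::string& value) {
    std::map<std::string, double> out;
    std::stringstream list(value);
    std::string item;
    while (std::getline(list, item, ',')) {
        std::string coding = item;
        double q = 1.0;
        const auto semi = item.find(';');
        if (semi != std::string::npos) {
            coding = item.substr(0, semi);
            const std::string param = lower(trim(item.substr(semi + 1)));
            if (param.compare(0, 2, "q=") == 0)
                q = std::strtod(param.c_str() + 2, nullptr);
        }
        coding = lower(trim(coding));
        if (!coding.empty())
            out[coding] = q;
    }
    return out;
}

// True if the coding has a non-zero q-value, by name or through "*".
bool accepts(const std::map<std::string, double>& ae,
             const std::string& coding) {
    const auto it = ae.find(coding);
    if (it != ae.end()) return it->second > 0.0;
    const auto star = ae.find("*");
    return star != ae.end() && star->second > 0.0;
}
```

Keeping q=0 entries instead of dropping them matters for the wildcard case: "*, gzip;q=0" must still refuse gzip.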


New HttpContentCoder abstract type, with functions:

  encodeStart()
  encodeEnd()
  encodeChunk()

  decodeStart()
  decodeEnd()
  decodeChunk()


New per-coded-object ContentCoderState, to handle coding state.  It'll
be referenced from the clientStream, and include fields:

  HttpContentCoder *coder
  off_t codedOffset
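A C++ sketch of the abstract coder and the per-object state. The six method names come from the design; their signatures are hypothetical (each call appends output to a std::string and returns false on error), since the design does not specify arguments. `IdentityCoder` is a hypothetical pass-through used only to exercise the interface; a GzipContentCoder would implement the same methods.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>  // off_t

// Hypothetical signatures for the abstract coder named in the design.
class HttpContentCoder {
public:
    virtual ~HttpContentCoder() = default;
    virtual bool encodeStart(std::string& out) = 0;
    virtual bool encodeChunk(const std::string& in, std::string& out) = 0;
    virtual bool encodeEnd(std::string& out) = 0;
    virtual bool decodeStart(std::string& out) = 0;
    virtual bool decodeChunk(const std::string& in, std::string& out) = 0;
    virtual bool decodeEnd(std::string& out) = 0;
};

// Trivial coder that passes bytes through unchanged.
class IdentityCoder : public HttpContentCoder {
public:
    bool encodeStart(std::string&) override { return true; }
    bool encodeChunk(const std::string& in, std::string& out) override {
        out += in;
        return true;
    }
    bool encodeEnd(std::string&) override { return true; }
    bool decodeStart(std::string&) override { return true; }
    bool decodeChunk(const std::string& in, std::string& out) override {
        out += in;
        return true;
    }
    bool decodeEnd(std::string&) override { return true; }
};

// Per-coded-object state with the two fields listed in the design.
struct ContentCoderState {
    HttpContentCoder* coder = nullptr;
    off_t codedOffset = 0;  // bytes of coded output produced so far

    // Runs one chunk through the coder and advances codedOffset.
    bool encode(const std::string& in, std::string& out) {
        std::string produced;
        if (!coder->encodeChunk(in, produced)) return false;
        codedOffset += static_cast<off_t>(produced.size());
        out += produced;
        return true;
    }
};
```

Tracking codedOffset separately from the input offset is what lets the coded StoreEntry be appended to independently of the original object.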


Objects will be stored both in unencoded and encoded formats.  An
object will stay in the format in which Squid receives it until
requested by a client requesting a different Content-Encoding which
Squid supports (this could be immediate).  Once this happens, the
object will be streamed coded into a different StoreEntry and on to
the client.


Other changes needed:

Add new content_coding field to HttpReply.

New httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.

A new configuration flag to turn content-encoding off, if desired.

A new object flag, encoded. Whenever an encoded or decoded object
is created, it's tagged as encoded. Thus, a locally re-decoded
object will be identifiable as such.

A new store.cc function, storeDeleteCodedCopies(), will do the
deletion of all (un)coded copies described above.


Gzip

A new GzipContentCoder module, which will be an instance of
HttpContentCoder.

Data encoding will be handled by the gzip.org zlib library
(http://www.gzip.org/zlib/).

Functions:
  gzEncodeStart: call 

Re: next version of content-encoding / gzip design doc

2004-03-08 Jon Kay
Henrik Nordstrom wrote:

 On Fri, 5 Mar 2004, Jon Kay wrote:

  If Accept-Encoding field is present in client request
 
  If server or cache response contains Content-Encoding field with
  encodings that are a subset of what the client accepts

 This must be relaxed to just "contains a Content-Encoding field",
 ignoring whether it is acceptable to the client. If not, you run into
 ugly corner cases if the server ignores what the client accepts.


OOPS.  I misstated this test.

It SHOULD be:

If Accept-Encoding field is present in client request
    If there is a cached response already available, and it
    contains a Content-Encoding field with encodings that are a
    subset of what the client accepts
        Then forward response to client unchanged
    Else (no cached response with right content-encoding)
        ...

otherwise the same.



Re: next version of content-encoding / gzip design doc

2004-03-04 Jon Kay
Henrik Nordstrom wrote:

 On Wed, 3 Mar 2004, Jon Kay wrote:

  Because current browser implementations treat Content-Encoding much as
  though it was Transfer-Encoding, we will implement Content-Encoding and
  Accept-Encoding as though they were actually the Transfer-Encoding and
  TE described in the HTTP specifications.

 This part I do not understand.

 Content-Encoding and Transfer-Encoding are fundamentally different in
 their operation, far beyond the hop-by-hop vs end-to-end difference. You
 cannot interchange one for the other.

 It is not safe to assume a client accepts gzip TE only because it
 accepts gzip content-encoding. For one thing, the message format is
 completely different.

  Etags of replies encoded by Squid will be modified to turn them into
  weak tags if they are not already so.

 Why do you oppose creating new unique ETags?

  There will be a configuration option to turn off content-encoding.

 Granted, and this will default off in the standard distribution, as any
 other option which violates the semantically transparent HTTP proxy
 requirements.

  Content-Encoding Implementation

 No comments there.

  Objects will be stored both in unencoded and encoded formats. An object
  will stay in the format in which Squid receives it until requested by a
  client requesting a different Content-Encoding which Squid supports
  (this could be immediate). Once this happens, the object will be
  streamed coded into a different StoreEntry and on to the client.

 Ok.

  A new store_dup module will be created to manage dup store_entries and
  make sure duplicate entries are invalidated when a new version of an
  object is read. It consists of a circular list of StoreEntry pointers
  named dupnext and dupprev. When a new duplicate encoding (or
  decoding) of an object is created, it's added to the list. When any
  StoreEntry is invalidated or updated, all dups are invalidated.

 Looks a little too complex to me.

 Wouldn't something simpler like the following work:

 Modify the store key to account for content encoding.

 Add an internal meta object listing the known content encodings of a
 given object. When a new encoding is added, rewrite this object to add
 the new encoding name.

 On cache hits, iterate over the known acceptable encodings until a match
 is found in the cache.

 In recoded objects include a meta header indicating the identity of the
 original object and disregard the recoded object on a cache hit if it no
 longer matches the original.

 From what I can tell the above would also work for adding server-driven
 Content-Encoding negotiation to the proxy to complement the use of Vary
 (which most mod_gzip servers do not support btw).

 Regards
 Henrik



Re: Content-Encoding and storage format

2004-03-04 Jon Kay

  I think our decision not to keep just encoded versions around
  immunizes us from that one; I don't see how a redecoding could arise,
  as encoded versions follow different paths to encoding-accepting
  clients than decoded versions to unaccepting, purist clients.

 I do not quite follow what you are saying here.

 The issue is not about what happens within a single Squid but what
 happens at the clients or in a cache mesh.

I was wrong.  Yes, indeed, recodings can happen.


 If you modify the ETag to include details on how the object has been
 recoded then you are immune as each variant then has a different identity.
 Also, if you use weak ETags you are mostly immune to your own actions,
 but there are secondary caching implications, where clients may get a
 different encoding than expected because the two are declared
 semantically equivalent.

So, are you suggesting that, for example, if we get an uncoded server
response with ETag: page12345, then we would tag a gzip-coded
version as ETag: gzippage12345?

   Jon




Re: next version of content-encoding / gzip design doc

2004-03-04 Jon Kay
 Content-Encoding and Transfer-Encoding are fundamentally different in
 their operation, far beyond the hop-by-hop vs end-to-end difference. You
 cannot interchange one for the other.

 It is not safe to assume a client accepts gzip TE only because it
 accepts gzip content-encoding. For one thing, the message format is
 completely different.

Yes. I'm going to try a different tack in explaining the
underpinnings.

Now I'm going to outline it by case analysis:

Protocol:

Header field cases from client:

If Accept-Encoding field is present in client request
    If server or cache response contains Content-Encoding field with
    encodings that are a subset of what the client accepts
        Then forward response to client unchanged
    Else (no helpful content-encoding field)
        If uncoded response isn't available
            Then forward client request to server/cache
            If server/cache response contains Content-Encoding field
                Then forward new response to client
                Add this response to duplicate list for the object
            Else (server/cache response doesn't have Content-Encoding)
                Then encode the response
                Add encoded response to duplicate list for the object
                Send encoded response to client
        Else (uncoded server response already available)
            Then encode uncoded response
            Add encoded response to duplicate list for the object
            Send encoded response to client
Else (no Accept-Encoding in client request)
    If uncoded server response already available
        Forward unchanged to client
    Else if coded server response already available
        Then decode server response
        Add decoded response to duplicate list for the object
        Send decoded response to client
    Else (no response available yet)
        Then forward request to server or cache, and behave unchanged
        with respect to this protocol.





Re: Content-Encoding and storage format

2004-03-02 Jon Kay
 Applying Content-Encoding in an accelerator makes sense, and can be done
 reasonably well. Applying Content-Encoding in a general purpose Internet
 proxy is a different beast and you then need to be very careful.

Yes, indeed.

Looking at the spec, I've decided to add a squid.conf flag to turn
content encoding off if desired.  That seems like a good idea anyway
for other reasons.

 A recoded object such as gzip can be regarded as semantically equivalent
 provided the user-agent knows how to decode gzip, but it is obviously not
 binary equivalent to the non-encoded entity. If you are 100% certain that
 all user-agents ever accessing content from this server accept gzip
 content-encoding, then you may use the same weak ETag for both original
 and encoded. But if there are ever cases where clients should get the
 original, then you must not, because doing so tells downstream caches
 that the gzip and original are equivalent regardless of what the client
 accepts.

I think our decision not to keep just encoded versions around
immunizes us from that one; I don't see how a redecoding could arise,
as encoded versions follow different paths to encoding-accepting
clients than decoded versions to unaccepting, purist clients.

Now, one troubling aspect to this is that different caches can
generate different valid encodings of the same object.  Can you guys think
of an action path by which that could produce corrupt results?


Jon




next version of content-encoding / gzip design doc

2004-03-02 Jon Kay
Here's a new version of the design document, that incorporates the
results of your suggestions.
I hope this is better...


Jon


Gzip Content-Encoding in Squid Design

Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding Protocol

Because current browser implementations treat Content-Encoding much as
though it were Transfer-Encoding, we will implement Content-Encoding and
Accept-Encoding as though they were actually the Transfer-Encoding and
TE described in the HTTP specifications.

ETags of replies encoded by Squid will be modified to turn them into
weak tags if they are not already so.

There will be a configuration option to turn off content-encoding.

Content-Encoding Implementation

New HttpHdrContCode module, which parses related HTTP headers and arranges
for encoding or decoding appropriately. Includes the following functions:

   * codeParseRequest(): Called from client_side:parseHttpRequest()
     after the clientStreamInit() call. Checks for and parses
     Accept-Encoding headers. Instantiates content_coding appropriately,
     and calls codeClientStreamInit().
   * codeClientStreamInit(): Adds a new node to clientStream with
     codeStreamRead(), codeStreamCallback(), and codeStreamStatus()
     functions.
   * codeStreamCallback(): sets up encoding/decoding state depending on
     the combination of Content-Encoding and Accept-Encoding fields seen.
   * codeStreamRead(): calls HttpContentCoder transformation functions
     appropriately.
   * codeStreamStatus(): reports status to the stream.
   * codeDupNode(): Allocs a new store_entry and inserts a new
     clientStream dup node (see below) to (v?)copy data to the
     store_entry as well as the reply.

New HttpContentCoder abstract type, with functions:

   * encodeStart()
   * encodeEnd()
   * encodeChunk()
   * decodeStart()
   * decodeEnd()
   * decodeChunk()

New per-coded-object ContentCoderState, to handle coding state. It'll be
referenced from the clientStream, and include fields:

   * HttpContentCoder *coder
   * off_t codedOffset

Objects will be stored both in unencoded and encoded formats. An object
will stay in the format in which Squid receives it until requested by a
client requesting a different Content-Encoding which Squid supports (this
could be immediate). Once this happens, the object will be streamed coded
into a different StoreEntry and on to the client.

A new store_dup module will be created to manage dup store_entries and
make sure duplicate entries are invalidated when a new version of an
object is read. It consists of a circular list of StoreEntry pointers
named dupnext and dupprev. When a new duplicate encoding (or decoding) of
an object is created, it's added to the list. When any StoreEntry is
invalidated or updated, all dups are invalidated. Functions:

   * storeNewDup(): called from codeDupNode(), above; creates a new
     node with the dup'ed node attached via the dup list.
   * storeDupClientStreamInit(): called from codeDupNode(); adds a new
     clientStreamNode to copy off encoded data to the new node as well
     as the reply.
   * storeDupClientStreamRead(): does the copying off.
   * storeDupClientStreamCallback(): null function
   * storeDupClientStreamStatus(): returns status
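A minimal sketch of the circular dup list, assuming a toy StoreEntry that carries only a validity flag and the dupnext/dupprev pointers named above. The signature given to storeNewDup() is hypothetical, and `storeInvalidateDups` is a hypothetical helper for the "invalidate all dups" step.

```cpp
#include <cassert>

// Toy StoreEntry with the two ring pointers named in the design.
struct StoreEntry {
    bool valid = true;
    StoreEntry* dupnext = this;  // a lone entry is a ring of one
    StoreEntry* dupprev = this;
};

// Link `dup` into the ring that already contains `orig`.
void storeNewDup(StoreEntry* orig, StoreEntry* dup) {
    dup->dupnext = orig->dupnext;
    dup->dupprev = orig;
    orig->dupnext->dupprev = dup;
    orig->dupnext = dup;
}

// Invalidate every entry in the ring containing `e`, and reset each one
// to a ring of one so no dangling links remain.
void storeInvalidateDups(StoreEntry* e) {
    StoreEntry* p = e;
    do {
        StoreEntry* next = p->dupnext;  // saved before p is unlinked
        p->valid = false;
        p->dupnext = p->dupprev = p;
        p = next;
    } while (p != e);
}
```

Starting the walk from any member reaches all of them, which is why invalidating one copy (original or coded) invalidates the whole family.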

Other changes needed:

   * Add new content_coding field to HttpReply.
   * New httpHeaderGetContentEncoding(HttpReply *) function in
     HttpHeader.cc.
   * HttpReply:httpReplySetHeaders will weaken the ETag if appropriate.
   * A new configuration flag to turn content-encoding off, if desired.

Gzip

A new GzipContentCoder module, which will be an instance of
HttpContentCoder.

Data encoding will be handled by the gzip.org zlib library.

Functions:

   * gzEncodeStart: call deflateInit2(), write header
   * gzEncodeEnd: write trailer
   * gzEncodeChunk: call deflate()
   * gzDecodeStart: call inflateInit2(), read and verify header
   * gzDecodeEnd: verify trailer
   * gzDecodeChunk: call inflate()
   * gzDoSaveEncoded(): true
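zlib's deflate() produces the compressed body, but the gzip member around it (RFC 1952) is simple enough to sketch directly: a 10-byte header and an 8-byte trailer holding the CRC-32 and the length (mod 2^32) of the uncompressed data, both little-endian. This is a hedged sketch with hypothetical function names; the header check accepts only FLG == 0 (no optional fields), and OS is set to 255 (unknown). As an alternative, zlib writes and checks this wrapper itself when deflateInit2()/inflateInit2() are given a windowBits value of 16 + 15.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// CRC-32 as used by gzip (reflected, polynomial 0xEDB88320), bitwise for
// brevity. Chainable: crc32Update(crc32Update(0, a), b) == crc of a+b.
uint32_t crc32Update(uint32_t crc, const std::string& data) {
    crc = ~crc;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int i = 0; i < 8; ++i)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static void putLE32(std::string& out, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out += static_cast<char>((v >> (8 * i)) & 0xFF);
}

// 10-byte member header: ID1 ID2 CM FLG MTIME(4) XFL OS.
std::string gzipHeader() {
    return std::string("\x1f\x8b\x08\x00" "\x00\x00\x00\x00" "\x00\xff", 10);
}

// 8-byte trailer: CRC-32 of the uncompressed data, then its size mod 2^32.
std::string gzipTrailer(uint32_t crc, uint32_t isize) {
    std::string out;
    putLE32(out, crc);
    putLE32(out, isize);
    return out;
}

// Minimal header check: magic bytes, deflate method, no optional fields.
bool gzipHeaderOk(const std::string& h) {
    return h.size() >= 10 &&
           static_cast<unsigned char>(h[0]) == 0x1f &&
           static_cast<unsigned char>(h[1]) == 0x8b &&
           h[2] == 0x08 && h[3] == 0x00;
}
```

Because the CRC is chainable, gzEncodeChunk could update it per chunk and gzEncodeEnd would only need to emit the trailer.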

Test Strategy

Must pass the test suite.

Must add appropriate tests, including sending gzipped content to oneself
successfully.

Will also test against Apache mod_gzip implementation, and maybe even
gunzip.





Re: resend of content-encoding / gzip design

2004-02-29 Jon Kay
Henrik Nordstrom wrote:

  See the programmer's guide chapters on Client Streams. I think you will
  find this fits quite nicely for Content-Encoding requirements. Exactly
  where to add the logic on when to stack the content-encoding client
  stream pipe onto the reply path is another question.

This certainly looks like it has possibilities. I'm trying to rework the
design to incorporate this.

There's one thing I don't understand. Where does the composition (the
passing along of control in reads from node to node) happen?



Jon




resend of content-encoding / gzip design

2004-02-26 Jon Kay
I tried to send this last night, but it got filtered because it's
written in html.

Here's a text translation:



Gzip Content-Encoding in Squid Design


Version Choice

The goal will be to get these changes into Squid3 HEAD.


Content-Encoding

New HttpHdrContCode module, which parses related HTTP headers and arranges
for encoding or decoding appropriately. To be called from
clientProcessRequest, cacheHit, and processReplyHeader.

New HttpContentCoder abstract type, with functions:

 encodeStart(): called from HttpHdrContCode
 encodeEnd(): called from comm_close handler
 encodeChunk(): called from storeClientCopy handler

 decodeStart(): called from HttpStateData::processReplyHeader
 decodeEnd(): called from comm_close handler
 decodeChunk(): called from comm_read handler

 doSaveEncoded()

New per-coded-object ContentCoderState, to handle coding state. It will
include fields:

 HttpContentCoder *coder
 off_t codedOffset

The HttpStateData class will have a usually null reference to a
ContentEncoder. It will only be non-null for objects which are being
encoded or decoded.

Other changes needed:
 * Add new content_coding field to HttpReply.
 * New httpHeaderGetContentEncoding(HttpReply *) function in
   HttpHeader.cc.


Gzip

A new GzipContentCoder module, which will be an instance of
HttpContentCoder.

Data encoding will be handled by the gzip.org zlib library. The gzip
card drivers are expected to include a binary-compatible zlib library.

Functions:

 gzEncodeStart: call deflateInit2(), write header
 gzEncodeEnd: write trailer
 gzEncodeChunk: call deflate()

 gzDecodeStart: call inflateInit2(), read and verify header
 gzDecodeEnd: verify trailer
 gzDecodeChunk: call inflate()

 gzDoSaveEncoded(): true


Test Strategy

Must pass the test suite.

Must add appropriate tests, including sending gzipped content to oneself
successfully.

Will also test against Apache mod_gzip implementation, and maybe even
gunzip.




generic content encoding and gzip support

2004-02-25 Jon Kay
Hi, there!  Been a while since my last email here.  I'm back to doing
Squid consulting with an emphasis on push, and hope you guys are doing
well.

Joe Cooper is paying me to work on designing and implementing the
addition to Squid of generic Content-Encoding support and a gzip
Content-Encoding as a particular supported coding.

Right now, I'm in the design phase.  I have a first-draft design I
hope people will be able to examine and criticise.

One thing missing from the draft is storage issues; I'll talk
about that in a separate email.

Joe would like me to merge this stuff with squid3 HEAD when it's
working right.  Please let me know if you guys see any problem with
that.

   Jon