Re: forced caching of volatile data

Miles Elam Thu, 14 Aug 2003 12:01:57 -0700

Gianugo Rabellino wrote:

> Miles Elam wrote:
>
>> It would be possible to add in some code that if the resource has no validity objects and the pipeline has an expiry, it creates a dummy validity object that, if asked, always returns "invalid" but has the expires timestamp set to maintain the entry.
>
> This was the only missing piece yet, and I wanted to tackle it when I'll be back from vacation. I like your approach, so if you feel like coding it, please go ahead. :-)
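
For reference, a minimal sketch of the dummy validity idea quoted above. It assumes a simplified, hypothetical `Validity` interface rather than Cocoon's real validity API, and the class and method names are invented for illustration only.

```java
// Hypothetical stand-in for the cache's validity contract; NOT Cocoon's real interface.
interface Validity {
    /** Returns true if the cached response may be served without regenerating it. */
    boolean isStillValid(long nowMillis);
}

/**
 * Dummy validity for a pipeline that has an expiry but no component validities.
 * It never vouches for the cached content, but it carries the expiry timestamp
 * so the cache can keep the entry until then.
 */
final class ExpiryOnlyValidity implements Validity {
    private final long expiresAtMillis;

    ExpiryOnlyValidity(long expiresAtMillis) {
        this.expiresAtMillis = expiresAtMillis;
    }

    @Override
    public boolean isStillValid(long nowMillis) {
        return false; // if asked, always "invalid"
    }

    boolean hasExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}

/** Assumed cache-side check: the validity is only consulted once the expiry has passed. */
final class ExpiringEntryCheck {
    static boolean canServeFromCache(ExpiryOnlyValidity validity, long nowMillis) {
        if (!validity.hasExpired(nowMillis)) {
            return true; // inside the expiry window: serve without asking
        }
        return validity.isStillValid(nowMillis); // after expiry: always false, so regenerate
    }
}
```

Under these assumptions an entry is served until its expiry and regenerated on the first request after it.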

I'm running into a snag and was wondering if anyone had some wisdom to share. The current behavior of caching pipelines is to aggregate the keys of all cacheable pipeline components and use the result as the cache lookup key. For a pipeline that has uncacheable components but also has an expiry, this scheme doesn't work: the key is incomplete or nonexistent. My first thought was to use the request URI, but therein lies the snag: the result of any action or selector that fundamentally alters how the request is handled would be cached erroneously. In 90% of cases I don't see this as a problem. But consider a case where different formats are sent to different clients for the same URI (e.g. XML with an XSLT processing instruction for the newest browsers and HTML for older clients):

  <map:pipeline type="caching">
    <map:parameter name="expires" value="access plus 10 minutes"/>

    <map:match pattern="">
      <map:generate src="index.xml"/>
      <map:transform type="hypothetical_uncacheable"/>
      <map:select type="browser">
        <map:when test="ie">
          <map:transform src="proc_inst.xslt">
            <map:parameter name="stylesheet" value="index2html.xslt"/>
          </map:transform>
          <map:serialize type="xml"/>
        </map:when>
        <map:otherwise>
          <map:transform src="index2html.xslt"/>
          <map:serialize type="html"/>
        </map:otherwise>
      </map:select>
    </map:match>
  </map:pipeline>

If the cache key is simply the URI, older clients may end up with raw and, in their case, useless XML.
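
The collision can be reproduced with a small stand-alone sketch. Everything here is hypothetical (the `UriKeyedCache` class, the `render` helper and the placeholder response strings); it only mimics the browser selector from the sitemap above and is not Cocoon code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy cache keyed by request URI only, as in the naive fallback described above. */
final class UriKeyedCache {
    private final Map<String, String> responses = new HashMap<>();

    /** Mimics the browser selector: "ie" gets XML with a stylesheet PI, everyone else gets HTML. */
    static String render(String browser) {
        return "ie".equals(browser) ? "xml+stylesheet-pi" : "html";
    }

    /** Serves from the cache when the URI is known; otherwise renders and stores the result. */
    String serve(String uri, String browser) {
        return responses.computeIfAbsent(uri, ignored -> render(browser));
    }
}
```

If an "ie" client requests a URI first, every later client asking for that URI receives the cached XML response until the entry expires, whatever its own browser is.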

This can of course be avoided with some minor reorganization of the pipelines:

  <map:pipeline type="caching">

    <map:match pattern="">
      <map:select type="browser">
        <map:when test="ie">
          <map:generate src="cocoon:/index.xml"/>
          <map:serialize type="xml"/>
        </map:when>
        <map:otherwise>
          <map:generate src="cocoon:/index.html"/>
          <map:serialize type="html"/>
        </map:otherwise>
      </map:select>
    </map:match>
  </map:pipeline>

  <map:pipeline type="caching">
    <map:parameter name="expires" value="access plus 10 minutes"/>

    <map:match pattern="index.xml">
      <map:generate src="index.xml"/>
      <map:transform type="hypothetical_uncacheable"/>
      <map:transform src="proc_inst.xslt">
        <map:parameter name="stylesheet" value="index2html.xslt"/>
      </map:transform>
      <map:serialize type="xml"/>
    </map:match>

    <map:match pattern="index.html">
      <map:generate src="index.xml"/>
      <map:transform type="hypothetical_uncacheable"/>
      <map:transform src="index2html.xslt"/>
      <map:serialize type="html"/>
    </map:match>
  </map:pipeline>

This is actually pretty close to how my site organizes things. With this layout, URIs would work as cache keys, but I can easily see the single-pipeline layout generating quite a few requests for help on the user list.

I think it should be done because quite a few things are not cacheable but also don't need to be up to the second. A view of an online discussion need not be immediate (and commonly is not), but a database lookup for every reader of that discussion would be a formidable load. Having a centralized expiry would let folks avoid building what amounts to futile caching into each component (e.g. a database transformer where it is not clear whether the cache interval should be time-based or info-based). An administrator can simply say, "This updates every hour so that my PII-300 server doesn't fall over if I get Slashdotted."
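
To make the centralized expiry concrete, here is a minimal sketch of turning an expires value such as "access plus 10 minutes" into an absolute timestamp. The grammar accepted here ("access plus <number> <unit>") is an assumption based only on the value used in the sitemaps above, and `ExpiresExpression` is a hypothetical helper, not an existing Cocoon class.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for expires values of the form "access plus <number> <unit>". */
final class ExpiresExpression {
    // Assumed grammar: units are second(s), minute(s), hour(s) or day(s).
    private static final Pattern SYNTAX =
            Pattern.compile("access plus (\\d+) (second|minute|hour|day)s?");

    static Duration parse(String expression) {
        Matcher m = SYNTAX.matcher(expression.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported expires expression: " + expression);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "second": return Duration.ofSeconds(amount);
            case "minute": return Duration.ofMinutes(amount);
            case "hour":   return Duration.ofHours(amount);
            default:       return Duration.ofDays(amount); // only "day" remains
        }
    }

    /** Absolute expiry time for an entry accessed at accessMillis. */
    static long expiresAtMillis(long accessMillis, String expression) {
        return accessMillis + parse(expression).toMillis();
    }
}
```

The administrator's "updates every hour" case would then be written as "access plus 1 hour".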

------

So my question is this: Should I

a) use full URIs (in which case somehow the URI needs to be made available to the pipeline code...which does not seem to be the case currently)

b) use some other mechanism which currently eludes me

Either way, any ideas on how to implement it? The key is my only problem; caching the response is the easy part.
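
As a rough sketch of option (a): use the aggregated component keys when every component is cacheable, and fall back to the request URI only when the pipeline has an expiry. `PipelineKeyBuilder`, its parameters and the key format are all hypothetical, and treating a single uncacheable component as making the whole key unusable is a simplifying assumption, not a statement about how Cocoon builds its keys.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical key builder; an empty Optional in the list marks an uncacheable component. */
final class PipelineKeyBuilder {
    static Optional<String> buildKey(List<Optional<String>> componentKeys,
                                     boolean hasExpiry,
                                     String requestUri) {
        StringBuilder key = new StringBuilder();
        boolean complete = !componentKeys.isEmpty();
        for (Optional<String> componentKey : componentKeys) {
            if (!componentKey.isPresent()) {
                complete = false; // one uncacheable component leaves the key incomplete
                break;
            }
            key.append(componentKey.get()).append('|');
        }
        if (complete) {
            return Optional.of(key.toString()); // aggregated-key behavior
        }
        if (hasExpiry) {
            return Optional.of("uri:" + requestUri); // option (a): fall back to the URI
        }
        return Optional.empty(); // nothing to key on: do not cache
    }
}
```

The URI fallback still folds the browser-selector case into a single entry, so it only suits pipelines whose output depends on nothing but the URI.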

- Miles Elam
