Christoph Gaffga wrote:
Berin Loritsch wrote:

As to the good enough vs. perfect issue, caching partial pipelines (i.e.
the results of a generator, each transformer, and the final result) will
prove to be an inadequate way to improve system performance.


I think caching parts of a pipeline is a very smart way of optimizing the
cache. One example:
  - Complex generator (TTL 12h)
  - Transformation (expensive)
  - CIncludeTransformer (cheap in terms of CPU usage,
      includes perhaps something like a static header and
      the time of day). One of the included sources is dynamic
      (the time of day) and has a time to live of one minute.
  - Serializer

So the complete pipeline has a TTL of one minute, but it makes more sense to
cache the generation and transformation for 12h instead of caching the
complete pipeline for one minute.
And I think, as I understand Stefano's ideas, his cache would adapt to such a
situation: knowing that the CPU time saved could be maximized by also caching
the first part of the pipeline (if the cache agent observes that the
component is accessed more than once in 12h).
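To make the example concrete, here is a minimal sketch of the idea (all
names are hypothetical; generate(), transform(), cinclude() and serialize()
stand in for the pipeline stages, and this is not Cocoon's actual cache
API). The expensive prefix of the pipeline is cached under its own key with
a 12h TTL, so only the cheap tail is re-run when the one-minute entry
expires:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.Callable;

    /** Caches any pipeline prefix under its own key and TTL. */
    public class PrefixCache {
        private static final class Entry {
            final Object value;
            final long expiresAt;
            Entry(Object value, long ttlMillis) {
                this.value = value;
                this.expiresAt = System.currentTimeMillis() + ttlMillis;
            }
        }

        private final Map<String, Entry> store = new HashMap<String, Entry>();

        public synchronized Object get(String key, long ttlMillis,
                                       Callable<Object> producer) throws Exception {
            Entry e = store.get(key);
            if (e != null && System.currentTimeMillis() < e.expiresAt) {
                return e.value;  // still fresh, skip the expensive stages
            }
            Object value = producer.call();
            store.put(key, new Entry(value, ttlMillis));
            return value;
        }
    }

    // Generation + expensive transformation cached for 12h,
    // the fully assembled page only for one minute:
    final Object doc = cache.get("doc", 12L * 3600 * 1000,
            () -> transform(generate()));
    byte[] page = (byte[]) cache.get("page", 60L * 1000,
            () -> serialize(cinclude(doc)));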

Again, I respectfully disagree.  Partial pipeline caching has, IMO, proven
my point.  Consider the same scenario above.  Transformation, while
computationally expensive, still costs less than serialization due to the
blocking nature of the serializer.  Generators are the part of the pipeline
most likely to alter the contents of a resource, which means the entire
pipeline will have to be re-evaluated anyway.

In the instance of the CIncludeTransformer, including the dynamic time of
day is a bad example.  What if I give the illusion of dynamics by using
JavaScript for the same purpose?  I have the same dynamics, yet I don't have
the overhead of invalidating my cache.

However, that is an implementation detail that not all developers think of.
Even if we regenerate the entire pipeline every minute, that can be done by
an asynchronous process, with the cache serving up the old content while the
new content is being generated.  This is how we update the sitemap.
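That refresh strategy might be sketched like this (hypothetical names, not
the actual sitemap-update code): an expired entry keeps being served while
a background thread rebuilds it, so no request ever blocks on regeneration.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /** Serves stale content while a background thread regenerates it. */
    public class RefreshingCache {
        private static final class Entry {
            final byte[] content;
            final long expiresAt;
            Entry(byte[] content, long ttlMillis) {
                this.content = content;
                this.expiresAt = System.currentTimeMillis() + ttlMillis;
            }
        }

        private final ConcurrentHashMap<String, Entry> entries =
                new ConcurrentHashMap<String, Entry>();
        private final ExecutorService refresher =
                Executors.newSingleThreadExecutor();

        public byte[] get(final String key, final long ttlMillis,
                          final Callable<byte[]> generator) throws Exception {
            Entry e = entries.get(key);
            if (e == null) {
                // Only the very first request generates synchronously.
                byte[] fresh = generator.call();
                entries.put(key, new Entry(fresh, ttlMillis));
                return fresh;
            }
            if (System.currentTimeMillis() > e.expiresAt) {
                // Expired: hand the rebuild to the background thread
                // and keep serving the old content in the meantime.
                refresher.submit(() -> {
                    entries.put(key, new Entry(generator.call(), ttlMillis));
                    return null;
                });
            }
            return e.content;
        }
    }

A production version would also have to ensure that only one refresh per
key is in flight at a time; the single-threaded executor here merely
serializes them.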

Practically speaking, most pipelines in my applications take less than 100ms
to generate including database access and serialization.  In fact, most take
between 20-60 ms depending on the complexity of the pipeline.  If your
generation times are taking much longer than that, then you really need to
look at it.  That figure incorporates a complex generator and up to five
transformers.  I'd say that is impressive.  Add a cache and the results are
returned in 0-20 ms depending on the load of the machine.  As the machine
gets heavily loaded, generation may take longer than a second, but cached
resources are still served within 10% of the time a full generation of the
resource takes.




For this reason, providing a generic cache that works on whole resources is
a much more efficient use of time.


Doesn't it make more sense, then, to just run Squid in front of Cocoon?

Does Squid allow you to cache user objects? No. By a generic cache I mean
one that is available to cache the data objects used in your site as well
as your generated pages.
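To illustrate the distinction, a hypothetical interface for such a cache
(not an existing Cocoon API): the application itself gets put/get access,
so the same store can hold a user object loaded from the database right
next to a fully rendered page, whereas Squid only ever sees complete HTTP
responses.

    /** What Squid cannot offer: the application caches its own objects. */
    public interface GenericCache {
        void put(String key, Object value, long ttlMillis);
        Object get(String key);        // null when absent or expired
        void invalidate(String key);   // e.g. when the user object changes
    }

    // Usage: data objects cached alongside generated pages.
    // cache.put("user:42", loadUserFromDatabase(42), 30L * 60 * 1000);
    // cache.put("page:/news", renderedNewsPage, 60L * 1000);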


* We do not have accurate enough tools to determine the cost of any
particular component in the pipeline.


I think measuring the time for any component/pipeline is quite difficult.
It is always affected by the system load.

That's fine. System load should be accounted for. What I am speaking of is the inability to correctly determine the cost of any one particular stage.

In your example above, the profiler as it is written now will return a set
of results of what it has measured.  The problem is that the total processing
time does not match all those results added together.  It isn't even close.

Until you have the ability to correctly measure each stage, you have no way
to correctly determine its cost, and your adaptive cache will start making
the wrong decisions.
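One plausible reason the figures cannot add up, shown as a sketch (a
hypothetical wrapper, not the actual profiler code): in a SAX pipeline
every event call blocks until all downstream stages have handled the
event, so the wall-clock time a naive per-stage timer measures silently
includes the work of every stage after it. The per-stage numbers therefore
overlap instead of summing to the total.

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLFilterImpl;

    /** Naive per-stage timer: the measured time includes downstream work. */
    public class TimingFilter extends XMLFilterImpl {
        private long accumulatedMillis;

        public TimingFilter(XMLReader parent) {
            super(parent);
        }

        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
                throws SAXException {
            long start = System.currentTimeMillis();
            // This call does not return until every stage further down
            // the pipeline has processed the event, so their cost is
            // charged to this stage as well.
            super.startElement(uri, localName, qName, atts);
            accumulatedMillis += System.currentTimeMillis() - start;
        }

        public long getAccumulatedMillis() {
            return accumulatedMillis;
        }
    }

(The same wrapping would be needed for characters(), endElement(), and the
other ContentHandler events; startElement() alone is shown for brevity.)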


--


"They that give up essential liberty to obtain a little temporary safety
 deserve neither liberty nor safety."
                - Benjamin Franklin


