On Friday, Jul 18, 2003, at 13:03 America/Guayaquil, Hunsberger, Peter wrote:
Let me try this a different way: one of the design decisions driving
the use of SAX over DOM is that SAX is more memory efficient. However,
if you're caching SAX event streams this is no longer true (assuming the
SAX data structures and DOM data structures are more or less equivalent in
size). Thus, caching calls into question the whole way in which parsing
and transformation work: if you're going to cache, why not cache
something which is directly useful to the parsing and transformation
stage instead of the output? It's a bit of a radical thought because in
a way you're no longer assembling pipelines. Rather, you're pushing
data into a database and pulling data out of the database. The database
just happens to work as a cache at the same time as it works for storing
parser output. Since it's a database, the contents can be normalized
(saving space), and since it feeds transformers directly, it saves parsing
overhead (saving CPU). (Recall the discussion we had on push vs. pull
parsing and lazy evaluation.)
Ok, I think I got it: instead of saving the output, you want to save the 'internal state' of the producer that is generating that output.
The only way to implement this would be to force the pipeline components to output two different things: the output (which goes into the pipeline) and a serializable form of their internal state (which goes into the cache)... and also to provide a method for the cache to 'push' a cached internal state back into them, so that they can recreate their operation from it.
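To make "serializable form of their internal state" concrete, I mean something like the following contract (a rough sketch only; the names are invented and nothing like this exists in the current codebase):

    import java.io.Serializable;

    // Hypothetical contract: a pipeline component that, besides producing
    // its normal SAX output, can hand the cache a serializable snapshot of
    // its internal state and later be re-primed from such a snapshot.
    public interface StatefulProducer {

        // normal operation: generate SAX events into the pipeline
        void generate() throws Exception;

        // snapshot of the producer's internal state, stored by the cache
        Serializable saveState();

        // re-prime the producer from a previously cached snapshot
        void restoreState(Serializable state);
    }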
If I understood correctly, I really don't see the advantage.
1) Since you aren't tracking a history of events, there is no relationship to Fourier transforms and sampling periods; they're not relevant.
? I don't see why.
My point, put in simple down-to-earth terms, is this: if the behavior of the system changes faster than we can measure it, we'll never converge.
This is the basis of all feedback systems and the reason why B-2 pilots can't fly that airplane without computer control: their reaction time is too slow. Piloting, there, means "suggesting the general tendency of directionality"; the rest is done by the computer.
One way to explain this is with Fourier analysis, which I tend to prefer over the Z-transform used in discrete domains, or Laplace analysis, but that's just personal taste.
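To put the sampling argument in one line: if the cost of producing a resource oscillates with some characteristic frequency f_change, and the cache can only re-measure that cost when the resource is actually re-produced, at some effective rate f_sample, then the usual sampling condition

    f_sample >= 2 * f_change

has to hold (roughly) for the estimate to track the real cost at all; below that, the cache is always chasing a value that has already moved on, which is what I mean by "never converge".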
Only if you mapped a specific load to a particular cost would
a period apply (and then it would be in relationship to loading and not
time!). Creating a map of load to cost for each fragment producer would
be possible, but how do you measure "load" in a meaningful way that can
be extrapolated to individual producer behavior over global load? I don't
think you can without consuming a lot of resources...
You don't need to measure load. All I said is that the algorithm doesn't work if the system behavior changes faster than the cache can adapt to it. From some *very* lazy calculations on observable behaviors, I think that the adaptive cache converges for resources that are requested once every 3 hours or more frequently.
2) I do think you may want some randomness. However, at this point I
don't think it applies. Consider a simple system with say 5 resources.
The resources are each initially found to have a cost of production of
.1, .2, .3, etc., respectively. At this point the cache is full (must
be Cocoon running on a Palm Pilot) so we trash the lowest cost item.
Now the system starts to get busy and the cost of recreating it starts
to rise monotonically until it reaches .2. Now we use some randomness
(assuming all other factors equal) to decide which guy to toss. Upon
re-evaluating the .2 item it is found to have exponentially rising cost
of production and now costs .4 to recreate. The system continues to
load up and eventually the .2 guy costs .3 to recreate. Again some
randomness occurs and we choose the original .3 guy to retry and it
still costs .3 to recreate and it eventually becomes the lowest cost
item to recreate under load (the original .1 guy bumps up to .301 or
whatever).
At this point we have found the cost of producing the items under load.
If the system load now starts to decrease we have to decide whether the
cost under load is still useful or we need to re-evaluate the cost under
reduced load. The only way we can re-evaluate it is if we arbitrarily
force it out of cache. At first it would seem that since the system is
under reduced load there are resources available to process the item that
has the lowest cost at load even if it is not the lowest cost with
reduced load. However, this does not guarantee the best global response
times under reduced load. One can argue (and, I believe, prove) that if
you pick your cost functions properly you will arrive at the best global
response time. This means the question is how to pick the cost
function.
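Let me try to restate the policy you describe as code, just to check that I'm reading you correctly (this is only my paraphrase in rough Java; none of it is actual Cocoon code, and the names are made up, with the costs taken from your .1/.2/.3 example):

    import java.util.*;

    // My reading of the policy: evict the entry that is cheapest to
    // recreate, break ties at random, and let the next request re-measure
    // the cost of whatever was evicted under the load of that moment.
    class CostCache {

        static class Entry {
            final String key;
            double cost;   // last measured cost of (re)producing this resource
            Entry(String key, double cost) { this.key = key; this.cost = cost; }
        }

        private final Map<String, Entry> entries = new HashMap<String, Entry>();
        private final int capacity;
        private final Random random = new Random();

        CostCache(int capacity) { this.capacity = capacity; }

        // store (or refresh) a resource together with its freshly measured cost
        void put(String key, double measuredCost) {
            if (!entries.containsKey(key) && entries.size() >= capacity) {
                evict();
            }
            entries.put(key, new Entry(key, measuredCost));
        }

        // evict the cheapest-to-recreate entry; randomness only breaks ties
        private void evict() {
            if (entries.isEmpty()) {
                return;
            }
            double min = Double.MAX_VALUE;
            for (Entry e : entries.values()) {
                min = Math.min(min, e.cost);
            }
            List<Entry> cheapest = new ArrayList<Entry>();
            for (Entry e : entries.values()) {
                if (e.cost <= min) {
                    cheapest.add(e);
                }
            }
            Entry victim = cheapest.get(random.nextInt(cheapest.size()));
            entries.remove(victim.key);
            // the evicted resource is re-produced on its next request, and the
            // new measurement (taken under the current load) replaces the stale one
        }
    }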
I really don't see how you can apply probabilistic behavior without randomness.
And if you don't want probabilistic behavior, I don't see how you can create a stable system.
First examine the case where we have a less than optimal cost function;
if it has enough of an impact it will push the load up and force a
reevaluation of other items. Eventually one will either settle into a
steady state or be thrashing (A little background: I helped write memory
paging algorithms for the OS in my distant youth.) If you are thrashing
one of two things is possible: you have so few resources for the given
load that nothing you do will help OR alternately, your cost evaluation
function is really poor. So, we see everything comes back to picking
the right cost algorithm. This is the NP-hard packing problem (in
another guise), which is why it makes some sense to allow for the
possibility of introducing randomness at this point. We can use
Monte Carlo-like simulation to make up for picking less than optimal
cost functions.
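If I translate "Monte Carlo-like simulation" into code, I read it as something like this (again my invention, not anything that exists; 'noise' is an arbitrary parameter):

    import java.util.Random;

    // Randomness moved into the cost *evaluation* rather than the victim
    // choice: perturb the last measured cost by a small random factor, so
    // that even an entry whose estimate is stale, or produced by a poor
    // cost function, occasionally looks cheap enough to be evicted and
    // re-measured under the current load.
    class NoisyCostEstimator {

        private final Random random = new Random();
        private final double noise;   // relative perturbation, e.g. 0.1 = +/- 10%

        NoisyCostEstimator(double noise) { this.noise = noise; }

        double estimate(double measuredCost) {
            double factor = 1.0 + (2.0 * random.nextDouble() - 1.0) * noise;
            return measuredCost * factor;
        }
    }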
I really don't see why moving randomness from the choice stage to the cost evaluation stage can improve the convergence of the system (rather the opposite, I would say).
Two important results:
1) It only makes sense to fall back to introducing randomness under conditions of less than full load or if we are thrashing.
??? I saw no proof of this at all.
Both of these can be hard to determine; thrashing cycles can be long and involved.
??? like "what entry is the one with the smaller efficiency?" c'mon.
Under full load or near full load re-evaluation will be forced in any case (if resources are impacted enough for it to matter).
I don't get it.
2) Using randomness is one way of evaluating cost. If you have a function that produces good cost results, then adding the randomness doesn't help. In other words, Monte Carlo-like behavior can in itself be a cost evaluation function.
I don't get it either.
I think I'm missing your point entirely. Sorry for being so slow, but I'm really trying hard!
-- Stefano.
