Stefano Mazzocchi wrote:


What you left as an exercise to the reader was the Cost function.


See my email to Marc for more insights.

And my response to that. I think I am starting to "get it". Having an adaptive cost function with an adaptive cache can efficiently protect against absolute constraints and promote the primary concern. In the end we might find that memory/disk space consumption is our most precious resource, especially if we have absolute limits imposed on us from outside. Time would always expand to fill the remaining proportion.


The
rules-based approach is one way of achieving that cost function declaratively.


True. But I prefer a purely statistical approach because I'm lazy and I don't want to write those rules: I want the system to find them for me. ;-)

I am most concerned with rules imposed from outside. For example, some hosting environments will have sys admins on your back if you have runtime requirements that exceed their perceptions of optimal resource use. In those cases we don't have the luxury of a purely adaptive cache.

Explicit rules are easier to debug, understand, and predict.

Here I disagree.

Currently we have a very simple caching rule:


If memory consumption goes higher than a particular threshold, start purging
the cache.  This in turn increases the cache miss ratio, but the purging
is smart enough to get rid of old entries and not recently accessed items first.
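
In code, the current rule amounts to something like the sketch below.  It is
a throwaway illustration of my own, not the actual Cocoon store; the
threshold value and the LRU bookkeeping are just stand-ins.

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Illustrative only: purge least-recently-used entries once free heap
    // drops below a threshold.  Not the actual Cocoon store implementation.
    public class ThresholdCache {

        // access-ordered map: iteration runs from least to most recently used
        private final Map<Object, Object> cache =
                new LinkedHashMap<Object, Object>(16, 0.75f, true);
        private final long minFreeMemory;  // free heap (bytes) below which we purge

        public ThresholdCache(long minFreeMemory) {
            this.minFreeMemory = minFreeMemory;
        }

        public synchronized void put(Object key, Object value) {
            cache.put(key, value);
            purgeIfNeeded();
        }

        public synchronized Object get(Object key) {
            return cache.get(key);  // a miss returns null; the caller regenerates
        }

        private void purgeIfNeeded() {
            // drop oldest / least recently accessed entries until pressure eases
            Iterator<Object> oldestFirst = cache.keySet().iterator();
            while (Runtime.getRuntime().freeMemory() < minFreeMemory
                    && oldestFirst.hasNext()) {
                oldestFirst.next();
                oldestFirst.remove();
            }
        }
    }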

Anyhoo, this is neither here nor there.  I am beginning to see how we can
impose close to absolute limits with a simple weighted cost function.


Truth be told, your search algorithm might be as efficient as they come.
It might not.

True. It has to be demonstrated. But the incredible ability of bogofilter gives me hope that it is the case even for caching.

As long as we have a consistent interface for the cache controller, we can swap out search algorithms and still use our same cost function. That way we can find the optimal search algorithm for the cache. Anyhoo, this is another discussion entirely.
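
Something like the following is all I mean by a consistent interface.  Every
name here is hypothetical rather than an existing Cocoon contract; the point
is only that the cost function sits behind a stable controller while the
search/eviction strategy stays pluggable.

    // Purely hypothetical names, not existing Cocoon interfaces.
    public interface CacheSearchStrategy {
        /** Pick the next entry to evict, given the current cost estimates. */
        Object selectVictim(java.util.Map<Object, Double> costByKey);
    }

    interface CacheController {
        Object get(Object key);
        void put(Object key, Object value);

        /** Swap the search algorithm without touching the cost function. */
        void setSearchStrategy(CacheSearchStrategy strategy);
    }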


But do keep in mind the poor administrator/installer.  The guy who is
managing the install is more interested in the efficiency of the cache
and how Cocoon uses resources than the developer is.  The user in many cases
won't know the difference because their network connection is throttling
the response, or their browser is the bottleneck.  The efficiency of the
cache will likely have the most impact on the scalability of the server.
That is my OPINION, and should not be taken as gospel.  Please do not
shoot me for holding that opinion.


Not at all. When implemented, the dynamic adaptability will be configurable: you will be able to turn it off or plug in your own cache if you dislike it.

Still, keeping in mind the poor administrator/installer, I don't see how a rule engine can be easier to configure than a system that doesn't need anything at all: it learns, adapts, and optimizes resources as it works, by itself and with no need for supervision.

The big thing is that we need a way for the administrator/installer to set hard limits and/or the order of precedence for the cost function parameters. That way they don't have to experiment with the actual weighting values, which will remain just as adaptive as the search algorithm.
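
For example, the knobs exposed to the administrator could be as small as the
sketch below.  The names, numbers, and precedence order are purely
illustrative assumptions on my part; the weights behind them stay adaptive.

    // Hypothetical illustration: the administrator only states hard limits
    // and a precedence order; the numeric weights themselves remain adaptive.
    public class CostFunctionLimits {

        // precedence: earlier resources are protected first as limits approach
        public static final String[] PRECEDENCE = { "memory", "disk", "time" };

        // absolute ceilings the adaptive weights must never let the cache exceed
        public static final long MAX_MEMORY_BYTES = 64L * 1024 * 1024;
        public static final long MAX_DISK_BYTES   = 512L * 1024 * 1024;

        /** True if a proposed cache footprint stays inside the hard limits. */
        public static boolean withinLimits(long memoryBytes, long diskBytes) {
            return memoryBytes <= MAX_MEMORY_BYTES && diskBytes <= MAX_DISK_BYTES;
        }
    }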

I took the 18 pages home and started to play with some code. I appreciate
the journey to get to the search algorithm you proposed. The functions
as declared in the beginning were not that efficient: on my machine,
as soon as there were more than ~410 samples for the invalid/valid cost
evaluations, the function took ~10ms (the granularity of my clock) to
evaluate. The evaluation used constants for all the cost figures because
that would be the computationally cheapest way to evaluate the efficiency
of those functions.


After this exercise, to my dismay, I found the better way of evaluating a
continuous cost value.  That way, instead of having a sample for each
request at a particular time, we only had to work with three samples:
a vastly improved search algorithm in terms of being computationally
cheap.

Hmmm, I'm very interested in this, can you elaborate more?

I started by implementing the first algorithms for valid(r) and invalid(r), as well as eff(r) and the evaluation of whether it was efficient to cache or not. That meant I had to create the functions for Pt(r) and Ct(r).

The functions as defined by your initial algorithm used the sum notation,
which means we needed to maintain a list of samples for the resource.  I.e.
the cost functions identified were a function of the resource and time of
request.  The more samples maintained, the less efficient the cache evaluation
became.  Now, considering I have a Windows-based box, the JVM reports time
with a granularity of 10ms, which is not enough for truly evaluating the
performance of the algorithm.
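
For the record, my first cut followed that sum-based shape: one list of
samples per resource, walked in full on every evaluation, which is exactly
what made the ~410-sample case hit the 10ms mark.  The names below are mine;
the real definitions of valid(r), invalid(r), eff(r), Pt(r) and Ct(r) are in
the whitepaper, this only shows the bookkeeping pattern.

    import java.util.ArrayList;
    import java.util.List;

    // Rough shape of the first, sum-based evaluation: every cost figure is
    // kept as a per-request sample, so each evaluation walks the whole list.
    class SampledCost {

        private final List<Double> samples = new ArrayList<Double>();

        void record(double cost) {
            samples.add(Double.valueOf(cost));
        }

        // average cost over every recorded sample: O(n) per evaluation
        double average() {
            if (samples.isEmpty()) {
                return 0.0;
            }
            double sum = 0.0;
            for (int i = 0; i < samples.size(); i++) {
                sum += samples.get(i).doubleValue();
            }
            return sum / samples.size();
        }
    }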

The better example that you had in your whitepaper had three continuous cost
values (C1, C2, and C3).  C1 was the cost for direct production without caching,
C2 was the cost of using the cached value, and C3 was the cost for production
and storage using the cache.  I.e. we had P, Pc, and l continuously evaluated.
These three samples were being adaptively applied to the overall eff evaluation.
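
If I read it correctly, that reduces the bookkeeping to three running figures
and a constant amount of work per request.  Assuming the continuous values
are kept as exponential moving averages (my assumption; the whitepaper may
define them differently), it looks roughly like this:

    // Sketch only.  C1 = cost of direct production, C2 = cost of serving
    // from the cache, C3 = cost of production plus storage through the
    // cache.  The moving-average smoothing is an assumption of mine.
    class ContinuousCost {

        private double c1, c2, c3;                // running cost estimates
        private static final double ALPHA = 0.1;  // smoothing factor (assumed)

        void sampleDirectProduction(double cost)   { c1 = smooth(c1, cost); }
        void sampleCacheHit(double cost)           { c2 = smooth(c2, cost); }
        void sampleProductionAndStore(double cost) { c3 = smooth(c3, cost); }

        // Caching looks worthwhile when the expected cost per request through
        // the cache (hits cost C2, misses cost C3) is below always paying C1.
        boolean efficient(double hitRatio) {
            double withCache = hitRatio * c2 + (1.0 - hitRatio) * c3;
            return withCache < c1;
        }

        private double smooth(double current, double sample) {
            return current == 0.0
                    ? sample
                    : (1.0 - ALPHA) * current + ALPHA * sample;
        }
    }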

I will have to play with the improved pseudo-code to provide a better idea.


I think it's the case, and today's CPUs don't make it slower to use floating-point math. Still, the hyperbolic tangent function was just one solution: there is at least another family of simple polynomial functions that exhibits the same overall properties.

:) It depends on your processor architecture really. Most 32-bit based architectures like Pentium and PowerPC still lean more strongly in the direction of integer performance. Most 64-bit based architectures like Itanium, Alpha, MIPS, etc. lean more strongly in the direction of floating point performance. In the end, I would favor something simple like tanh() as opposed to the complexity of a polynomial function. It is simply easier to read.
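
To make the comparison concrete, here is tanh() next to one cheap algebraic
stand-in with the same overall shape.  I'm not claiming this is the
polynomial family you have in mind; it is only meant to show the trade-off
between readability and avoiding a transcendental call.

    // Two squashing functions with the same overall shape: smooth, monotonic,
    // and bounded in (-1, 1).  The rational form is just one stand-in.
    final class Squash {

        static double viaTanh(double x) {
            return Math.tanh(x);             // simple to read, one library call
        }

        static double viaRational(double x) {
            return x / (1.0 + Math.abs(x));  // same limits, no transcendentals
        }

        public static void main(String[] args) {
            for (int i = -3; i <= 3; i++) {
                double x = i;
                System.out.println(x + "\t" + viaTanh(x) + "\t" + viaRational(x));
            }
        }
    }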

Perhaps seeing what you want in code would be the best thing. It would
help solidify things for those of us who don't do the math, or who are
overwhelmed by a very long whitepaper and are trying to derive from it how it
will practically make our lives better.  It would also help determine the
cost/benefit ratio
of actually developing this thing. As it stands, any idea that takes 18
pages to explain gives the impression of a very high cost. Whether this
bears out in practice or not remains to be seen.

I can't tell you when, but I'll do this.


In the end, it might all be an academic exercise that doesn't work IRL, I don't know. But in case it does, it would be a very different system that would provide Cocoon with even more advantages over other solutions (or, at least, that's my goal).

I believe I am starting to see the light.


--

"They that give up essential liberty to obtain a little temporary safety
 deserve neither liberty nor safety."
                - Benjamin Franklin


