Very good analysis, Suresh. I agree with you that we should keep this
simple and not introduce a lot of overhead to determine the constants.
--
Øystein
Suresh Thalamati wrote:
My two cents:
If the goal is to auto-tune the checkpoint interval, I think the amount
of log generated is a good indication of how fast the
system is. By finding out how fast the log is being generated,
one can predict, within reasonable error, how long recovery
will take on that system.
It might be worth finding a simple solution to start with, rather than
trying to get it perfect. How about a simple scheme to tune the
checkpoint interval like the following:
Say on a particular system:
1) We can generate N amount of log in X amount of time on a running system.
2) R is the fraction of time it takes to recover log generated in X amount
of time. One can pre-calculate this factor by doing some tests instead
of trying to find it on each boot.
3) Y is the worst-case recovery time Derby users are likely to see by
default.
4) C is how much log should be generated before a checkpoint is scheduled.
The ideal checkpoint-interval log size should be something like:
C = (N / X) * (Y / R)
For example :
X = 5 min, N = 50 MB, Y = 1 min, R = 0.5
C = (50 / 5) * (1 / 0.5) = 20 MB
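The formula above can be sketched in a few lines of Java. This is just an illustration of the arithmetic, not actual Derby code; the class and method names (CheckpointTuner, nextCheckpointLogSize) are hypothetical.

```java
// Hypothetical sketch of the checkpoint-interval formula; names are
// illustrative and do not correspond to actual Derby code.
public class CheckpointTuner {

    /**
     * Computes C, the amount of log (MB) after which the next
     * checkpoint should be scheduled.
     *
     * @param n log generated (MB) over the observation window
     * @param x observation window length (minutes)
     * @param y target worst-case recovery time (minutes)
     * @param r recovery factor: fraction of generation time that
     *          recovering the same log takes (pre-calculated offline)
     */
    static double nextCheckpointLogSize(double n, double x,
                                        double y, double r) {
        double logRate = n / x;     // MB of log generated per minute
        return logRate * (y / r);   // MB of log before next checkpoint
    }

    public static void main(String[] args) {
        // The example from the mail: X = 5 min, N = 50 MB, Y = 1 min, R = 0.5
        System.out.println(nextCheckpointLogSize(50, 5, 1, 0.5) + " MB");
        // prints "20.0 MB"
    }
}
```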
One could apply a formula like the above at each checkpoint to tune when
the next checkpoint should occur. I understand that the first time it will be
completely off, but the interval will stabilize after a few checkpoints.
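A toy simulation of that stabilization, assuming a steady workload and a simple exponential smoothing of the observed log rate (the smoothing is my assumption; the mail does not say how the rate estimate would be maintained, and none of these names are Derby code):

```java
// Toy simulation: recompute C at each checkpoint from a smoothed
// estimate of the log rate. The smoothing scheme is an assumption of
// this sketch, not something specified in the mail.
public class CheckpointSimulation {

    static double simulate(double initialRateGuess, double trueRate,
                           double y, double r, int checkpoints) {
        double estRate = initialRateGuess;       // MB/min, initially wrong
        double c = estRate * (y / r);            // first interval: way off
        for (int i = 0; i < checkpoints; i++) {
            // At each checkpoint, blend the old estimate with the rate
            // actually observed since the previous checkpoint.
            estRate = 0.5 * estRate + 0.5 * trueRate;
            c = estRate * (y / r);
        }
        return c;
    }

    public static void main(String[] args) {
        // First guess is 2 MB/min against a true rate of 10 MB/min,
        // yet C approaches the ideal 20 MB within a few checkpoints.
        System.out.println(simulate(2.0, 10.0, 1.0, 0.5, 5) + " MB");
        // prints "19.5 MB"
    }
}
```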
I think that irrespective of whatever approach is finally implemented, we
have to watch the overhead introduced to get it exactly right; I don't
think users will really care if recovery takes a few seconds
more or less.
Aside from tuning the checkpoint interval, I think we should find ways
to minimize the effect of checkpoints on system throughput.
I guess that is a different topic.
Thanks
-suresht