I have seen a number of issues at client sites related to cavalier adjustments of clocks. Up to now, my response has been to simply say "don't do that", but lately it has been bugging me and it seems like there should be a better solution.
The problem scenario involves a step-wise time change on a ZK server node, either forward or backward. The issues are:

- A step backward causes all of the timeouts to be extended by the amount of the step. Thus, if you set all clocks back by an hour, no session will time out for the next hour of real time. This is bad.
- A step forward of sufficient size will cause all live sessions to time out immediately.

To investigate solutions, I played around a bit with nanoTime and currentTimeMillis. My experiments verified that on Linux, nanoTime is, indeed, a timer and currentTimeMillis is a reference to the absolute system clock. In my test program, I sample both while the system time is being modified, and I see stable behavior from nanoTime and the predictably goofy behavior from currentTimeMillis.

My test code is at https://github.com/tdunning/timeSkew

From these tests, it seems that using nanoTime would be substantially better than using currentTimeMillis in ZK. I think that Camille brought this up a while ago, but I don't remember this going forward.

Right now, ZK is very delicate in the face of clock changes, and it seems that it could be very robust. Moreover, many naive admins and some experienced admins seem to have no clue about how to keep their clocks well behaved, so this delicate nature causes lots of problems.

Should I try to prepare a patch?

One other thing that I see is that I can't find any way to cause a Java process to sleep for an elapsed time. All timer-related sleeps that I can find work relative to absolute time rather than intervals. The only work-around I have found is to use Thread.yield() in a polling loop, which is clearly only one half step above hideous. Relative to ZK, my question is whether there is any critical need anywhere in ZK for a timed sleep.
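
To make the nanoTime idea concrete, here is a minimal sketch of a session timeout check that measures elapsed time on a timer rather than against the wall clock. This is not ZK code: the class name MonotonicDeadline, its methods, and the injectable clock are all made up for illustration. The clock is passed in as a LongSupplier (System::nanoTime in real use) so that the demo can drive it by hand without sleeping.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of ZK): a session timeout tracked on a timer. */
final class MonotonicDeadline {
    private final LongSupplier nanoClock; // System::nanoTime in real use
    private final long timeoutNanos;
    private long lastTouchNanos;

    MonotonicDeadline(long timeoutMillis, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.lastTouchNanos = nanoClock.getAsLong();
    }

    MonotonicDeadline(long timeoutMillis) {
        this(timeoutMillis, System::nanoTime);
    }

    /** Record activity on the session, restarting the timeout interval. */
    void touch() {
        lastTouchNanos = nanoClock.getAsLong();
    }

    /** True once at least timeoutMillis of elapsed time has passed since the last touch. */
    boolean isExpired() {
        // Subtract first, then compare, so the check survives nanoTime wrap-around.
        return nanoClock.getAsLong() - lastTouchNanos >= timeoutNanos;
    }

    public static void main(String[] args) {
        long[] fakeNanos = {0L}; // hand-driven clock, for the demo only
        MonotonicDeadline session = new MonotonicDeadline(50, () -> fakeNanos[0]);

        System.out.println(session.isExpired()); // false: no time has elapsed
        fakeNanos[0] = TimeUnit.MILLISECONDS.toNanos(50);
        System.out.println(session.isExpired()); // true: 50 ms have elapsed
        session.touch();
        System.out.println(session.isExpired()); // false: interval restarted
    }
}
```

Because the check never reads currentTimeMillis, stepping the system clock forward or backward should not change the result, at least on platforms where nanoTime behaves as a timer, as it did in my Linux tests.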
