Stefan Egli created OAK-3238:
--------------------------------

             Summary: fine tune clock-sync check vs lease-check settings
                 Key: OAK-3238
                 URL: https://issues.apache.org/jira/browse/OAK-3238
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.3.4
            Reporter: Stefan Egli


There are now two components that try to ensure that 'discovery-lite' (OAK-2844)
reports a coherent cluster view to the upper layers:
* OAK-2682: time difference detection. By default, startup fails if the clock
is off by more than 2 seconds. Because each instance may still deviate by up to
2 seconds in either direction, two instances in a document cluster can be up to
4 seconds apart.
* OAK-2739: lease checking. Every instance checks whether its local lease is
still valid on every document access. This check is done against the actual
'leaseEndTime', which by default is updated every 30 seconds and extended to be
valid for another 60 seconds.
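The two safeguards can be modeled roughly as follows. This is a minimal sketch using the defaults quoted above. The class and method names (`LeaseChecks`, `clockWithinTolerance`, `renewLease`, `leaseValid`) are hypothetical and do not reflect Oak's actual implementation.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the two safeguards; not Oak's real classes.
 */
public class LeaseChecks {
    // Defaults as stated in the issue text.
    static final long MAX_CLOCK_DIFF_MS = TimeUnit.SECONDS.toMillis(2);
    static final long LEASE_UPDATE_INTERVAL_MS = TimeUnit.SECONDS.toMillis(30);
    static final long LEASE_DURATION_MS = TimeUnit.SECONDS.toMillis(60);

    /** OAK-2682-style startup check: start only if the clock deviation is at most 2 s. */
    static boolean clockWithinTolerance(long localNowMs, long referenceNowMs) {
        return Math.abs(localNowMs - referenceNowMs) <= MAX_CLOCK_DIFF_MS;
    }

    /** Lease update (assumed to run every LEASE_UPDATE_INTERVAL_MS): returns the new leaseEndTime. */
    static long renewLease(long nowMs) {
        return nowMs + LEASE_DURATION_MS;
    }

    /** OAK-2739-style check on document access: valid while now < leaseEndTime. */
    static boolean leaseValid(long nowMs, long leaseEndTimeMs) {
        return nowMs < leaseEndTimeMs;
    }

    public static void main(String[] args) {
        long t0 = 1_000_000L;
        long leaseEnd = renewLease(t0);
        System.out.println(clockWithinTolerance(t0 + 1_500, t0)); // true
        System.out.println(clockWithinTolerance(t0 + 2_500, t0)); // false
        System.out.println(leaseValid(t0 + 59_000, leaseEnd));    // true
        System.out.println(leaseValid(t0 + 60_000, leaseEnd));    // false
    }
}
```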

Even with both mechanisms combined, the worst case still leaves that 4-second
window: the local instance fails to update its lease (e.g. the lease thread
dies) but still considers itself the owner of a valid lease, while a remote
instance whose clock is up to 4 seconds ahead already considers the lease timed
out.
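The worst case can be reproduced with a little arithmetic. This sketch assumes each instance's clock equals real time plus a fixed offset, with the local clock 2 s behind and the remote clock 2 s ahead. Both instances compare their own clock against the same 'leaseEndTime'. The names `WorstCaseWindow` and `inconsistencyWindowMs` are hypothetical.

```java
/**
 * Hypothetical model of the worst case: both instances compare their own
 * (skewed) clock against the same leaseEndTime value.
 */
public class WorstCaseWindow {
    static final long MAX_CLOCK_DIFF_MS = 2_000;  // per-instance tolerance at startup
    static final long LEASE_DURATION_MS = 60_000; // current default

    /** Real time at which a clock with the given offset reaches the value t (clock = real + offset). */
    static long realTimeClockReaches(long t, long clockOffsetMs) {
        return t - clockOffsetMs;
    }

    /** Time during which the remote sees the lease expired but the local instance still treats it as valid. */
    static long inconsistencyWindowMs(long localOffsetMs, long remoteOffsetMs, long leaseEndTimeMs) {
        long localExpiry = realTimeClockReaches(leaseEndTimeMs, localOffsetMs);
        long remoteExpiry = realTimeClockReaches(leaseEndTimeMs, remoteOffsetMs);
        return Math.max(0, localExpiry - remoteExpiry);
    }

    public static void main(String[] args) {
        long lastUpdateLocalClock = 0;
        long leaseEnd = lastUpdateLocalClock + LEASE_DURATION_MS;
        // Local clock 2 s behind, remote clock 2 s ahead.
        long window = inconsistencyWindowMs(-MAX_CLOCK_DIFF_MS, MAX_CLOCK_DIFF_MS, leaseEnd);
        System.out.println("worst-case window: " + window + " ms"); // worst-case window: 4000 ms
    }
}
```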

So overall, the three factors 'lease duration', 'lease update frequency' and
'maximum allowed clock difference' must be tuned better to achieve a stable
mechanism.

Suggestion:
* increase the 'lease duration' to 3x 'lease update frequency', i.e. a 90 sec
lease duration
* reduce the lease check failure limit from 'lease duration' to 2x 'lease
update frequency', assuming that one 'lease update interval' is much larger
than the 'maximum allowed clock difference'
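Under the suggested settings, the local instance gives up its lease well before any remote instance can consider it expired. This is a minimal sketch under the same clock-offset assumptions as above. `ProposedLeaseSettings`, `localLeaseValid` and `worstCaseSafetyMarginMs` are hypothetical names, and measuring the local failure limit from the last successful update is an interpretation of the suggestion rather than a confirmed design.

```java
/**
 * Hypothetical model of the suggested settings: 90 s lease duration,
 * local failure after 2x the 30 s update interval.
 */
public class ProposedLeaseSettings {
    static final long LEASE_UPDATE_INTERVAL_MS = 30_000;
    static final long LEASE_DURATION_MS = 3 * LEASE_UPDATE_INTERVAL_MS;      // 90 s
    static final long LOCAL_FAILURE_LIMIT_MS = 2 * LEASE_UPDATE_INTERVAL_MS; // 60 s
    static final long MAX_CLOCK_DIFF_MS = 2_000;

    /** Local check: lease treated as failed once 2x the update interval has passed since the last update. */
    static boolean localLeaseValid(long localNowMs, long lastUpdateLocalMs) {
        return localNowMs < lastUpdateLocalMs + LOCAL_FAILURE_LIMIT_MS;
    }

    /** Real-time gap between local failure and remote expiry, clocks skewed in opposite directions. */
    static long worstCaseSafetyMarginMs() {
        long localOffset = -MAX_CLOCK_DIFF_MS; // local clock behind real time
        long remoteOffset = MAX_CLOCK_DIFF_MS; // remote clock ahead of real time
        long lastUpdateLocal = 0;              // local clock value at the last lease update
        long localFailsAtReal = lastUpdateLocal + LOCAL_FAILURE_LIMIT_MS - localOffset;
        long remoteExpiresAtReal = lastUpdateLocal + LEASE_DURATION_MS - remoteOffset;
        return remoteExpiresAtReal - localFailsAtReal;
    }

    public static void main(String[] args) {
        // Positive margin: the local instance stops before any remote sees expiry.
        System.out.println("worst-case safety margin: " + worstCaseSafetyMarginMs() + " ms"); // 26000 ms
    }
}
```

The resulting margin equals one update interval minus twice the maximum clock difference (30 s - 4 s), which stays positive as long as the update interval is much larger than the allowed clock difference.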



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
