Stefan Egli created OAK-3238:
--------------------------------

Summary: fine tune clock-sync check vs lease-check settings
Key: OAK-3238
URL: https://issues.apache.org/jira/browse/OAK-3238
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core
Affects Versions: 1.3.4
Reporter: Stefan Egli

There are now two components that try to ensure that 'discovery-lite' (OAK-2844) reports a coherent cluster view to the upper layers:

* OAK-2682: time difference detection: by default, startup fails if the clock is off by more than 2 seconds. That results in a 4 sec max margin in a document-cluster, since two instances can each be off by up to 2 seconds in opposite directions.
* OAK-2739: lease-checking: every instance checks if the local lease is valid upon any document access. This check is done against the actual 'leaseEndTime', which is updated every (by default) 30 seconds to be valid for (by default) another 60 seconds.

With these two factors combined, in the worst case you could still end up with that 4 second time window where the local instance fails to update the lease (e.g. lease-thread dies) but still considers itself the owner of a valid lease, while a remote instance might be those 4 seconds off and consider the lease timed out.

So overall: the 3 factors 'lease duration', 'lease update frequency' and 'maximum allowed clock difference' must be better tuned to end up with a stable mechanism.

Suggestion:

* increase the 'lease duration' to be 3 x 'lease update frequency', i.e. 90 sec lease duration
* reduce the lease check failure limit from 'lease duration' to 2 x 'lease update frequency', assuming that one 'lease update interval' is much larger than the 'maximum allowed clock difference'
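
The timing argument above can be checked with a little arithmetic. The following is a minimal sketch under a simplified model, not Oak code: the class name, method name and parameters are hypothetical. It assumes the lease was last renewed at some time t0, the local instance trusts its lease until t0 + failure limit, and a remote instance whose clock runs ahead by the worst-case skew treats the lease as expired at t0 + lease duration - skew.

```java
// Hypothetical illustration of the lease/clock-skew arithmetic; not part of Oak.
final class LeaseTimingSketch {

    /**
     * Time between the local instance giving up on its lease and the earliest
     * moment a remote instance may consider that lease timed out.
     * A negative result means there is a window in which both views disagree.
     *
     * @param leaseDurationMs        how long a renewed lease is valid
     * @param localFailureLimitMs    how long after the last renewal the local
     *                               lease check still passes
     * @param maxClockDiffPerNodeMs  allowed clock offset per instance; two
     *                               instances can differ by twice this value
     */
    static long safetyMarginMillis(long leaseDurationMs,
                                   long localFailureLimitMs,
                                   long maxClockDiffPerNodeMs) {
        long worstCaseSkewMs = 2 * maxClockDiffPerNodeMs;
        return leaseDurationMs - worstCaseSkewMs - localFailureLimitMs;
    }

    public static void main(String[] args) {
        long updateIntervalMs = 30_000L;
        long maxClockDiffMs = 2_000L;

        // Defaults described in the issue: 60 s lease, checked against the full lease duration.
        long current = safetyMarginMillis(60_000L, 60_000L, maxClockDiffMs);

        // Suggested settings: lease = 3 x update interval, failure limit = 2 x update interval.
        long suggested = safetyMarginMillis(3 * updateIntervalMs, 2 * updateIntervalMs, maxClockDiffMs);

        System.out.println("current margin (ms):   " + current);
        System.out.println("suggested margin (ms): " + suggested);
    }
}
```

Under these assumptions the current defaults give a margin of -4000 ms, which is the 4 second window described above, while the suggested settings give 26000 ms, i.e. one update interval minus the worst-case clock difference.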