[ https://issues.apache.org/jira/browse/OAK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745133#comment-14745133 ]
Stefan Egli commented on OAK-3398:
----------------------------------

third point done too:
* only update the lease when {{state}} and {{leaseEnd}} of the {{clusterNodes}} entry have remained unchanged since the last lease update (of the very same, local instance). If that is not the case, it is a lease failure, as someone else in the cluster decided to treat this instance as timed-out: http://svn.apache.org/r1703129

> make lease update more robust
> -----------------------------
>
>                 Key: OAK-3398
>                 URL: https://issues.apache.org/jira/browse/OAK-3398
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.3.6
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: 1.3.7
>
>
> With the lease check introduced in OAK-2739 (and refined to do an oak-core
> stop in OAK-3397) it becomes more critical that the lease is always properly
> updated (to avoid an unnecessary oak-core stop). The following issues exist
> at the moment:
> * currently the lease is valid for 60sec by default and updated every 20sec;
> the lease check fails with a margin of 20sec *before* it times out. This
> means that if the lease update thread is not operating for 20sec it will
> cause a stop. That's quite a low figure probably
> ** the suggestion is to increase the lease timeout from 60sec to 120sec,
> update it as soon as 10sec has been eaten off it, and leave the 20sec safety
> margin at the end. This would result in 90sec 'idle equals faulty'
> * on a machine with heavy load it seems likely that the lease-update-thread
> doesn't get scheduled timely enough, as it races for cpu against all the
> other busy threads
> ** the suggestion is to increase the thread priority of the lease update
> thread, so if the VM supports thread priorities, that would help reduce
> lease failure 'just because the cpu is too busy'
> * the ClusterNodeInfo, when renewing the lease, doesn't check if the lease
> has been marked as timed-out/recovering by another instance. It just
> overwrites whatever is there.
> ** It should, however, only update the lease when it has not yet been marked
> as timed out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
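
For illustration only, the conditional lease update described in the comment could be sketched roughly as below. This is not the code from r1703129: {{LeaseSketch}}, {{Store}}, {{Entry}}, {{replaceIfUnchanged}} and the state names "ACTIVE"/"RECOVERING" are hypothetical, and an in-memory object stands in for the shared {{clusterNodes}} entry. The timing constants are the values suggested in the issue description.

```java
// Hypothetical sketch; not Oak's ClusterNodeInfo implementation.
final class LeaseSketch {

    /** The two fields of a clusterNodes entry that the check compares. */
    static final class Entry {
        final String state;   // placeholder values such as "ACTIVE" are assumptions
        final long leaseEnd;  // lease end time in milliseconds

        Entry(String state, long leaseEnd) {
            this.state = state;
            this.leaseEnd = leaseEnd;
        }
    }

    /** In-memory stand-in for the shared document that all cluster nodes can write. */
    static final class Store {
        private Entry current;

        Store(Entry initial) { current = initial; }

        synchronized Entry read() { return current; }

        /** Unconditional write, as another instance marking this one timed-out would do. */
        synchronized void overwrite(Entry e) { current = e; }

        /** Writes next only if state and leaseEnd still equal the expected values. */
        synchronized boolean replaceIfUnchanged(Entry expected, Entry next) {
            if (!current.state.equals(expected.state)
                    || current.leaseEnd != expected.leaseEnd) {
                return false;
            }
            current = next;
            return true;
        }
    }

    // Timings suggested in the issue description.
    static final long LEASE_MS = 120_000;        // lease timeout
    static final long UPDATE_AFTER_MS = 10_000;  // renew once this much is used up
    static final long MARGIN_MS = 20_000;        // safety margin before timeout

    /** Longest the update thread may stall before the lease check fails. */
    static long maxIdleMs() {
        return LEASE_MS - UPDATE_AFTER_MS - MARGIN_MS;
    }

    private final Store store;
    private Entry lastWritten; // what this instance saw or wrote last

    LeaseSketch(Store store) {
        this.store = store;
        this.lastWritten = store.read();
    }

    /** Renews the lease; returns false on lease failure (entry changed by someone else). */
    boolean renew(long nowMs) {
        Entry next = new Entry(lastWritten.state, nowMs + LEASE_MS);
        if (!store.replaceIfUnchanged(lastWritten, next)) {
            return false; // do not overwrite what the other instance wrote
        }
        lastWritten = next;
        return true;
    }
}
```

With these constants {{maxIdleMs()}} works out to 120 - 10 - 20 = 90sec, the 'idle equals faulty' figure from the description. Once another writer has changed {{state}} or {{leaseEnd}}, {{renew}} returns false and leaves the stored entry untouched.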