[ 
https://issues.apache.org/jira/browse/OAK-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-2844:
-----------------------------
    Attachment: OAK-2844.WIP-02.patch

Attaching another work-in-progress patch ([^OAK-2844.WIP-02.patch]). It is not 
finished yet, but it should show:
 * {{org.apache.jackrabbit.oak.plugins.discoverylite}}: package containing 
interfaces for upper layers that want to be informed when the clusterView 
changes or when an instance becomes inactive but still has a backlog - or 
becomes inactive without any more backlog.
 * {{org.apache.jackrabbit.oak.plugins.discoverylite.document}} containing the 
implementation of the above api based on the {{DocumentNodeStore}} (no more 
mongo dependency)
 ** {{DocumentDiscoveryLiteService}}: the main class, which serves the 
{{discoverylite}} listeners mentioned in the first point. Unlike earlier, 
however, it now reads the {{DocumentNodeStore}}'s clusterNodes collection, 
derives a {{ClusterViewDocument}} from it, and stores that document in the 
settings collection (so no new collection is needed anymore, just a new entry 
in settings).
 ** {{DiscoveryLiteListener}}: a small internal listener between 
{{DocumentNodeStore}} and {{DocumentDiscoveryLiteService}}, through which the 
latter is informed when a change in the clusterNodes collection is witnessed 
and when the background reads are finished. Details on why that is needed are 
in the {{DocumentDiscoveryLiteService}} javadoc.
 * notice the integration points into {{DocumentNodeStore}}:
 ** the mentioned {{DiscoveryLiteListener}} which is informed about 
end-of-background-read and change-in-clusterNodes
 ** {{ClusterNodeInfo}}, which now has a new property, {{lastWrittenRootRev}}: 
the last root update that each node did (maybe that can be derived from 
elsewhere, in which case this might not be needed). The basic idea is that the 
other instances need to know what to wait for when an instance crashes and is 
recovered, i.e. what to compare the {{lastKnownRevision}} with. The suggestion 
is that this 'what' is {{lastWrittenRootRev}}: once they 'see' beyond 
{{lastWrittenRootRev}}, they consider the node as no longer having any 
backlog.
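
To illustrate the backlog idea above, here is a minimal sketch. It is not taken from the patch: {{SimpleRevision}} and {{BacklogCheck}} are hypothetical names, and the revision is reduced to a timestamp plus counter for the sake of the example. It also assumes that having seen exactly {{lastWrittenRootRev}} already counts as 'no more backlog'; the patch may draw that line differently.

```java
/**
 * Hypothetical, simplified stand-in for a revision (not Oak's own class):
 * ordered by timestamp first, then by counter.
 */
final class SimpleRevision implements Comparable<SimpleRevision> {
    final long timestamp;
    final int counter;

    SimpleRevision(long timestamp, int counter) {
        this.timestamp = timestamp;
        this.counter = counter;
    }

    @Override
    public int compareTo(SimpleRevision other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        return byTime != 0 ? byTime : Integer.compare(counter, other.counter);
    }
}

final class BacklogCheck {
    private BacklogCheck() {
    }

    /**
     * Returns true while the local instance has not yet seen the last root
     * update written by an instance that became inactive.
     *
     * @param lastKnownRevision  what the local instance has seen from that instance so far
     * @param lastWrittenRootRev the last root update that instance wrote
     */
    static boolean hasBacklog(SimpleRevision lastKnownRevision,
                              SimpleRevision lastWrittenRootRev) {
        if (lastWrittenRootRev == null) {
            // Assumption: an instance that never wrote a root update has no backlog.
            return false;
        }
        if (lastKnownRevision == null) {
            // Nothing seen yet, so everything that instance wrote is still pending.
            return true;
        }
        return lastKnownRevision.compareTo(lastWrittenRootRev) < 0;
    }
}
```

An upper layer would keep treating the inactive instance as 'inactive with backlog' for as long as {{hasBacklog}} returns true, and switch to 'inactive without backlog' afterwards.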

The state of this patch:
 * manually tested: clusterView document in settings collection is properly 
updated, events are properly sent.
 * actual thorough unit tests yet to be done - hence expecting low quality atm.

Just putting this in here for anyone interested to review - I'll be off the 
next couple days so won't be able to react soon on reviews/comments/questions 
I'm afraid.

> Introducing a simple mongo-based discovery-light service (to circumvent 
> mongoMk's eventual consistency delays)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-2844
>                 URL: https://issues.apache.org/jira/browse/OAK-2844
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: mongomk
>            Reporter: Stefan Egli
>             Fix For: 1.4
>
>         Attachments: InstanceStateChangeListener.java, OAK-2844.WIP-02.patch, 
> OAK-2844.patch
>
>
> When running discovery.impl on a mongoMk-backed jcr repository, there is a 
> risk of hitting problems such as those described in "SLING-3432 
> pseudo-network-partitioning": when a jcr-level heartbeat does not reach 
> peers within the configured heartbeat timeout, the peers treat the affected 
> instance as dead, remove it from the topology, and continue with the 
> remaining instances, potentially electing a new leader and running the risk 
> of duplicate leaders. This happens when delays in mongoMk grow larger than 
> the (configured) heartbeat timeout. These problems are ultimately due to the 
> 'eventual consistency' nature not only of mongoDB but even more so of mongoMk. 
> The only alternative so far is to increase the heartbeat timeout to match the 
> expected or measured delays that mongoMk can produce (under, say, given 
> load/performance scenarios).
> Assuming that mongoMk will always carry a risk of certain delays, and that a 
> reasonable maximum (reasonable for the discovery.impl timeout, that is) cannot 
> be guaranteed, a better solution is to provide discovery with more 
> 'real-time'-like information and/or privileged access to mongoDb.
> Here's a summary of alternatives that have so far been floating around as a 
> solution to circumvent eventual consistency:
>  # expose existing (jmx) information about active 'clusterIds' - this has 
> been proposed in SLING-4603. The pros: reuse of existing functionality. The 
> cons: going via jmx, binding of exposed functionality as 'to be maintained 
> API'
>  # expose a plain mongo db/collection (via osgi injection) such that a higher 
> (sling) level discovery could directly write heartbeats there. The pros: 
> heartbeat latency would be minimal (assuming the collection is not sharded). 
> The cons: exposes a mongo db/collection potentially also to anyone else, with 
> the risk of opening up to unwanted possibilities
>  # introduce a simple 'discovery-light' API to oak which solely provides 
> information about which instances are active in a cluster. The implementation 
> of this is not exposed. The pros: no need to expose a mongoDb/collection, 
> allows any other jmx-functionality to remain unchanged. The cons: a new API 
> that must be maintained
> This ticket is about the 3rd option, about a new mongo-based discovery-light 
> service that is introduced to oak. The functionality in short:
>  * it defines a 'local instance id' that is non-persisted, i.e. it can change 
> at each bundle activation.
>  * it defines a 'view id' that uniquely identifies a particular incarnation 
> of a 'cluster view/state' (which is: a list of active instance ids)
>  * and it defines a list of active instance ids
>  * the above attributes are passed to interested components via a listener 
> that can be registered. That listener is called whenever the discovery-light 
> service notices that the cluster view has changed.
> While the actual implementation could in fact be based on the existing 
> {{getActiveClusterNodes()}} and {{getClusterId()}} of the 
> {{DocumentNodeStoreMBean}}, the suggestion is to not fiddle with that part, 
> as it has dependencies on other logic. Instead, the suggestion is to create 
> a separate, dedicated collection ('discovery') where heartbeats as well as 
> the currentView are stored.
> Will attach a suggestion for an initial version of this for review.
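
As a rough illustration of the functionality listed in the description above, here is a minimal, single-threaded sketch of what such a listener contract could look like. All names ({{ClusterView}}, {{ClusterViewListener}}, {{DiscoveryLightSketch}}) are hypothetical and not taken from the attached patches. Using random UUIDs for the local instance id and the view id is likewise only an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical immutable snapshot of one incarnation of the cluster view. */
final class ClusterView {
    final String viewId;
    final String localInstanceId;
    final Set<String> activeInstanceIds;

    ClusterView(String viewId, String localInstanceId, Set<String> activeInstanceIds) {
        this.viewId = viewId;
        this.localInstanceId = localInstanceId;
        this.activeInstanceIds = activeInstanceIds;
    }
}

/** Hypothetical listener that interested components would register. */
interface ClusterViewListener {
    void viewChanged(ClusterView view);
}

final class DiscoveryLightSketch {
    // Non-persisted: a fresh id is generated each time this object is created.
    private final String localInstanceId = UUID.randomUUID().toString();
    private final List<ClusterViewListener> listeners = new ArrayList<>();
    private Set<String> lastActive; // null until the first view is known

    String getLocalInstanceId() {
        return localInstanceId;
    }

    void addListener(ClusterViewListener listener) {
        listeners.add(listener);
    }

    /**
     * Feeds in the currently active instance ids. Listeners are called only
     * if the set differs from the previous one; each change gets a new view id.
     *
     * @return true if the view changed and listeners were notified
     */
    boolean update(Set<String> activeInstanceIds) {
        Set<String> snapshot =
                Collections.unmodifiableSet(new TreeSet<>(activeInstanceIds));
        if (snapshot.equals(lastActive)) {
            return false;
        }
        lastActive = snapshot;
        ClusterView view =
                new ClusterView(UUID.randomUUID().toString(), localInstanceId, snapshot);
        for (ClusterViewListener listener : listeners) {
            listener.viewChanged(view);
        }
        return true;
    }
}
```

How the set of active instance ids is obtained (heartbeats in a dedicated collection, or otherwise) is deliberately left out of the sketch; it only shows the three attributes and the change notification.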



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
