[ https://issues.apache.org/jira/browse/OAK-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomek Rękawek updated OAK-3865:
-------------------------------
    Attachment: diagram.png

> New strategy to optimize secondary reads
> ----------------------------------------
>
>                 Key: OAK-3865
>                 URL: https://issues.apache.org/jira/browse/OAK-3865
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
>
>         Attachments: diagram.png
>
>
> *Introduction*
> In the current trunk we only read document _D_ from the secondary instance
> if:
> (1) we have the parent _P_ of document _D_ cached and
> (2) the parent hasn't been modified in 6 hours.
> OAK-2106 tried to optimise (2) by estimating the lag from MongoDB replica
> stats. That was unreliable, so the second approach was to read the last
> revisions directly from each Mongo instance. If the modification date of _P_
> is before the last revisions on all secondary Mongos, then a secondary can be
> used.
> The main problem with this approach is that we still need _P_ to be in the
> cache. I think we need another way to optimise secondary reads, as right now
> only about 3% of requests go to the secondary, which is bad especially for
> the global-clustering case (Mongo and Oak instances across the globe). The
> optimisation provided in OAK-2106 doesn't make things much better and may
> introduce some consistency issues.
> *Proposal*
> I had the following constraints in mind when preparing this:
> 1. Let's assume we have a sequence of commits with revisions _R1_, _R2_ and
> _R3_ modifying nodes _N1_, _N2_ and _N3_. If we have already read _N1_ at
> revision _R2_, then reading from a secondary shouldn't return an older
> revision (e.g. _R1_).
> 2. If an Oak instance modifies a document, then reading from a secondary
> shouldn't return the old version (from before the modification).
> So, let's have two maps:
> * _M1_: the most recent document revision read from Mongo, for each cluster
> id,
> * _M2_: the oldest lastRev value of the root document, for each cluster id,
> read from all the secondary instances.
> Maintaining _M1_:
> For every read from Mongo we check whether the lastRev for some cluster id
> is newer than the _M1_ entry. If so, we update _M1_. For every write we add
> the saved revision id under the current cluster id to _M1_.
> Maintaining _M2_:
> It should be updated periodically. Such a mechanism is already prepared in
> the OAK-2106 patch.
> The method deciding whether we can read from the secondary instance should
> compare the two maps. If every entry in _M2_ is newer than the corresponding
> entry in _M1_, the secondary instances contain a repository state at least
> as new as the one we have already accessed, and it is therefore safe to read
> from a secondary.
> The attached image diagram.png presents the idea.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
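
The two-map comparison described in the proposal could be sketched roughly as follows. This is only an illustration, not code from the OAK-2106 patch or from Oak itself: the class name `SecondaryReadTracker` and all of its methods are hypothetical, and revisions are modelled as plain `long` values rather than Oak revision objects. It also assumes that an equal revision counts as safe ("at least as new") and that a cluster id missing from _M2_ means the secondary cannot be used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the M1/M2 bookkeeping; not part of Oak. */
class SecondaryReadTracker {
    // M1: newest revision this instance has read or written, per cluster id.
    private final Map<Integer, Long> newestSeen = new ConcurrentHashMap<>();

    // M2: oldest root lastRev found across all secondaries, per cluster id.
    private volatile Map<Integer, Long> oldestOnSecondaries = Collections.emptyMap();

    /** Called after each read from Mongo with the lastRev values of the document. */
    void onRead(Map<Integer, Long> lastRevs) {
        for (Map.Entry<Integer, Long> e : lastRevs.entrySet()) {
            // Keep the newer of the stored and the freshly read revision.
            newestSeen.merge(e.getKey(), e.getValue(), Long::max);
        }
    }

    /** Called after each write with the local cluster id and the saved revision. */
    void onWrite(int clusterId, long revision) {
        newestSeen.merge(clusterId, revision, Long::max);
    }

    /** Called periodically with the root lastRevs gathered from the secondaries. */
    void updateSecondaries(Map<Integer, Long> rootLastRevs) {
        oldestOnSecondaries = new HashMap<>(rootLastRevs);
    }

    /** True if the secondaries are at least as new as everything seen so far. */
    boolean canReadFromSecondary() {
        Map<Integer, Long> m2 = oldestOnSecondaries;
        for (Map.Entry<Integer, Long> e : newestSeen.entrySet()) {
            Long onSecondary = m2.get(e.getKey());
            // Assumption: an unknown cluster id or an older revision is unsafe.
            if (onSecondary == null || onSecondary < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a local write at revision 100 for cluster id 1 blocks secondary reads until the periodic update reports a root lastRev of at least 100 for that cluster id, which corresponds to constraint 2 of the proposal.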