[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008725#comment-15008725 ]

Tomek Rękawek commented on OAK-2106:

[~mreutegg], could you take a look at the patch?

> Optimize reads from secondaries
> ---
>
> Key: OAK-2106
> URL: https://issues.apache.org/jira/browse/OAK-2106
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Marcel Reutegger
> Assignee: Marcel Reutegger
> Labels: performance, scalability
>
> OAK-1645 introduced support for reads from secondaries under certain
> conditions. The current implementation checks the _lastRev on a potentially
> cached parent document and reads from a secondary if it has not been
> modified in the last 6 hours. This timespan is somewhat arbitrary but
> reflects the assumption that the replication lag of a secondary shouldn't
> be more than 6 hours.
> This logic should be optimized to take the actual replication lag into
> account. MongoDB provides information about the replication lag with
> the command rs.status().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989192#comment-14989192 ]

Tomek Rękawek commented on OAK-2106:

After discussing with Marcel I tried a different approach, in which the Oak instance connects directly to all secondary instances and reads their _lastRev for the root. It can then check whether the secondary instance contains an up-to-date parent and, therefore, whether the requested document can be read from the nearest Mongo instance.

Pull request: https://github.com/apache/jackrabbit-oak/pull/41
Patch file: https://github.com/apache/jackrabbit-oak/pull/41.diff
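Editor's sketch of the check described in this comment, not code from the pull request. It assumes the root _lastRev of each secondary has already been read and reduced to a timestamp in milliseconds, and that a read may go to the nearest instance only when every secondary has reached the parent's last revision. The class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical helper; not part of Oak or of the patch. */
final class SecondaryFreshness {

    /**
     * Returns true if every secondary has replicated at least up to the
     * parent's last revision, so a "nearest" read cannot hit a stale member.
     */
    static boolean canReadFromNearest(List<Long> secondaryRootLastRevMillis,
                                      long parentLastRevMillis) {
        if (secondaryRootLastRevMillis.isEmpty()) {
            return false; // nothing known about the secondaries: stay on the primary
        }
        long oldest = Long.MAX_VALUE;
        for (long ts : secondaryRootLastRevMillis) {
            oldest = Math.min(oldest, ts);
        }
        return oldest >= parentLastRevMillis;
    }
}
```

Taking the minimum over all secondaries is the conservative choice: the {{nearest}} read preference is not guaranteed to pick the same member on every read.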
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976097#comment-14976097 ]

Tomek Rękawek commented on OAK-2106:

{quote}Let's say the estimator measures a lag of 2 seconds at time T. That is, secondaries have synced up to T-2s. At T+5s the secondaries still lag behind at T-2s.{quote}

Let S be the secondary optime, P the primary optime and T the current time. The lag is measured as P-S, not T-S. This should avoid the case in which the lag is large, but we happen to measure it right after some operation has been applied. If we want to make it more reliable, we can keep e.g. the last 10 values and return the largest one.

{quote}I'm also a bit concerned about introducing a dependency from MongoDocumentStore to classes like UnmergedBranches and UnsavedModifications. I would rather like to see a solution where the client of the DocumentStore can express how fresh the document needs to be when it reads from the store.{quote}

It concerns me as well (as this is a kind of circular dependency), but I wasn't able to find anything better. Access to the unmerged branches is necessary so that we won't ask the secondary about a path belonging to a branch. This doesn't depend on time, as a user may modify many nodes (which results in creating a branch) and keep the changes unmerged for a very long time. The situation is a bit different with the UnsavedModifications, as they are saved on a regular basis ({{asyncDelay}}): we can add this value to the estimated lag to be sure that the background update thread has run and the changes have been replicated.

{quote}I would rather like to see a solution where the client of the DocumentStore can express how fresh the document needs to be when it reads from the store.
I think this also means the decision whether a read can be directed to a secondary must not depend on the lag as a duration, but should rather calculate a time when it is safe to read from a secondary.{quote}

We can take the {{find(maxCacheAge)}} parameter into consideration in {{getMongoReadPreference}}; however, it doesn't solve the issue with the unmerged branches.

{quote}The tricky part here is how to handle time differences on the machines where the Oak cluster nodes are running and the MongoDB replica set. Each change on a document is associated with a revision, where the timestamp of the revision is tied to the local clock where the revision was created. The oplog timestamp on the other hand is derived from the primary replica set member clock, I assume.{quote}

The replica set status is taken from the primary. For each secondary member we have 3 times available:
* optime - secondary time of the last operation applied,
* lastHeartbeat - secondary time of the last heartbeat sent,
* lastHeartbeatRecv - primary time of the last heartbeat received.

The primary member provides:
* optime,
* current timestamp.

As stated above, I estimate the lag by subtracting the secondary optime from the primary optime. These two times come from different machines and therefore clock differences will make it less accurate.

The other way of measuring the lag would be to compare lastHeartbeatRecv with the current timestamp. These two times come from the same machine (the primary). This tells us how often the secondary asks for changes, but not how long it takes to apply them. Maybe the former is more important - if so, I can change the estimation method.
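Editor's sketch of the "keep the last 10 values and return the largest one" idea from this comment. It is a minimal illustration, not the patch itself: the class name, the window size parameter and the fallback argument are assumptions, and optimes are taken to be plain millisecond timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; the names are not taken from the patch. */
final class MaxLagWindow {
    private final int capacity;
    private final Deque<Long> samples = new ArrayDeque<>();

    MaxLagWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one measurement: primary optime minus secondary optime. */
    void record(long primaryOptimeMillis, long secondaryOptimeMillis) {
        if (samples.size() == capacity) {
            samples.removeFirst(); // drop the oldest sample
        }
        // Clamp at zero so clock skew cannot produce a negative lag.
        samples.addLast(Math.max(0L, primaryOptimeMillis - secondaryOptimeMillis));
    }

    /** Largest lag in the window, or the fallback if nothing was measured yet. */
    long estimate(long fallbackMillis) {
        long max = -1L;
        for (long sample : samples) {
            max = Math.max(max, sample);
        }
        return max < 0 ? fallbackMillis : max;
    }
}
```

Returning the maximum rather than the average errs on the side of treating the secondaries as further behind than they may be, which only costs extra reads from the primary.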
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976040#comment-14976040 ]

Marcel Reutegger commented on OAK-2106:
---

Thanks for the patch.

I think even with a single measurement the read can be directed to a secondary when it actually should go to a primary. Let's say the estimator measures a lag of 2 seconds at time T. That is, secondaries have synced up to T-2s. At T+5s the secondaries still lag behind at T-2s. Now, IIUC when a read comes in at T+5s with a request for a document with maxCacheAge of 5 seconds, the implementation would read it from a secondary, because replication lag is 2 seconds and the requested document can be 5 seconds old.

I'm also a bit concerned about introducing a dependency from MongoDocumentStore to classes like UnmergedBranches and UnsavedModifications. I would rather like to see a solution where the client of the DocumentStore can express how fresh the document needs to be when it reads from the store.

I think this also means the decision whether a read can be directed to a secondary must not depend on the lag as a duration, but should rather calculate a time when it is safe to read from a secondary. The tricky part here is how to handle time differences on the machines where the Oak cluster nodes are running and the MongoDB replica set. Each change on a document is associated with a revision, where the timestamp of the revision is tied to the local clock where the revision was created. The oplog timestamp on the other hand is derived from the primary replica set member clock, I assume.
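Editor's sketch that replays the T / T+5s example from this comment. The two methods are my reading of the two rules under discussion (lag compared as a duration versus a point in time up to which the secondaries are known to be synced); neither is taken from the Oak code base, all names are hypothetical, and clock differences between machines are ignored.

```java
/** Hypothetical sketch; both rules are simplified interpretations of the comment. */
final class SafeTimeCheck {

    /** Duration-based rule: compares the last measured lag with maxCacheAge. */
    static boolean byDuration(long measuredLagMillis, long maxCacheAgeMillis) {
        return measuredLagMillis <= maxCacheAgeMillis;
    }

    /** Time-based rule: the secondary must already contain everything up to (now - maxCacheAge). */
    static boolean bySafeTime(long syncedUpToMillis, long nowMillis, long maxCacheAgeMillis) {
        return syncedUpToMillis >= nowMillis - maxCacheAgeMillis;
    }

    public static void main(String[] args) {
        long t = 100_000L;             // time T of the measurement
        long syncedUpTo = t - 2_000L;  // secondaries have synced up to T-2s
        long now = t + 5_000L;         // the read arrives at T+5s
        long maxCacheAge = 5_000L;     // the document may be 5 seconds old

        System.out.println(byDuration(2_000L, maxCacheAge));          // true: read goes to a secondary
        System.out.println(bySafeTime(syncedUpTo, now, maxCacheAge)); // false: read stays on the primary
    }
}
```

The time-based rule rejects the read because the caller needs the state as of T, while the secondaries are only known to hold the state as of T-2s.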
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974157#comment-14974157 ]

Tomek Rękawek commented on OAK-2106:

I guess you are right, the average may not be the best statistic, as it gives equal weight to e.g. the lag from 8 measurements ago and the most recent one. There is probably something better, but for now we can stick to the latest value. I updated the code.
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974132#comment-14974132 ]

Vikas Saurabh commented on OAK-2106:

bq. Then it computes a separate average for each SECONDARY

[~tomek.rekawek], I think it's not a good idea to average out the last 10 lag times - this can potentially lead to wrong results when the current lag is greater than the average (the algorithm would then think that the secondaries are more up to date than they really are).
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974024#comment-14974024 ]

Tomek Rękawek commented on OAK-2106:

The implementation is ready for review at [github|https://github.com/apache/jackrabbit-oak/pull/41]. The patch file can be downloaded from [here|https://github.com/apache/jackrabbit-oak/pull/41.diff].

*ReplicationLagEstimator*

This new class reads the last 10 differences between the optime on PRIMARY and on each SECONDARY. Then it computes a separate average for each SECONDARY and returns the maximum lag value. In case there's something odd about the statistics (e.g. the last heartbeat for the SECONDARY is before the optime on PRIMARY), the configured value of {{maxReplicationLagMillis}} will be returned (by default = 6h). There's also a new test for this.

*Branch*, *Commit*

I added a BloomFilter containing the list of modified paths to each *Branch* object. These paths have to be known in order to use PRIMARY for the branch reads.

*MongoDocumentStore*, *UnmergedBranchesAware*, *UnsavedModificationsAware*

I created two marker interfaces, so the document node store can pass UnmergedBranches and UnsavedModifications objects back to the MongoDocumentStore.

I extended the ReadPreferenceIT with the two cases described in the two previous comments (unsaved modifications and unmerged branches).
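Editor's sketch of the estimator as it is described in this comment: per-SECONDARY average over the last 10 differences, maximum across secondaries, and a fallback to the configured value when a sample looks odd. This is an illustration written from the description only. The class name, method signatures and the way a suspicious sample is tracked are assumptions and may differ from the actual ReplicationLagEstimator in the pull request.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the class from the pull request. */
final class AveragingLagEstimator {
    private static final int WINDOW = 10;

    private final long maxReplicationLagMillis;
    private final Map<String, Deque<Long>> lags = new HashMap<>();
    private final Set<String> unreliable = new HashSet<>();

    AveragingLagEstimator(long maxReplicationLagMillis) {
        this.maxReplicationLagMillis = maxReplicationLagMillis;
    }

    /** Records one status sample for the given secondary. */
    void record(String secondary, long primaryOptime, long secondaryOptime, long lastHeartbeat) {
        if (lastHeartbeat < primaryOptime) {
            unreliable.add(secondary); // odd statistics: do not trust this member for now
            return;
        }
        unreliable.remove(secondary);
        Deque<Long> window = lags.computeIfAbsent(secondary, k -> new ArrayDeque<>());
        if (window.size() == WINDOW) {
            window.removeFirst();
        }
        window.addLast(Math.max(0L, primaryOptime - secondaryOptime));
    }

    /** Maximum of the per-secondary averages, or the configured fallback. */
    long estimate() {
        if (lags.isEmpty() || !unreliable.isEmpty()) {
            return maxReplicationLagMillis;
        }
        long max = 0L;
        for (Deque<Long> window : lags.values()) {
            long sum = 0L;
            for (long lag : window) {
                sum += lag;
            }
            max = Math.max(max, sum / window.size());
        }
        return max;
    }
}
```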
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968997#comment-14968997 ]

Tomek Rękawek commented on OAK-2106:

There's also a second problem with the local changes, pointed out by [~catholicon] (thanks!):

1. The last change on /a/b is done at 12:00.
2. The safe time for secondaries is 12:05.
3. /a/b/c is updated by the local node at 12:10.
4. The background update, which updates lastRev on /a/b and its ancestors, is delayed and runs at 12:15.
5. At 12:12 we want to get /a/b/c. The lastRev on /a/b is still 12:00 (<12:05 safe time), so we use a secondary instance and get an old version.

In the case of a remote modification this problem doesn't exist, as the background read uses lastRev at / to check if something has changed (so it'll pull the changes after the background update finishes its work).
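Editor's sketch that replays the timeline from this comment. Times are minutes since midnight. The guard shown here (remembering locally changed paths until the background update has run) is one possible way to avoid the stale read and is purely an assumption for illustration. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; names are illustrative and not taken from Oak. */
final class PendingLocalChanges {
    // Paths changed locally whose ancestor _lastRev updates have not been written yet.
    private final Set<String> pending = new HashSet<>();

    void localChange(String path) {
        pending.add(path);
    }

    /** Called once the background update has pushed _lastRev to the ancestors. */
    void backgroundUpdateDone() {
        pending.clear();
    }

    /** Rule from the scenario: only the parent's _lastRev is compared with the safe time. */
    static boolean naiveUseSecondary(long parentLastRev, long safeTime) {
        return parentLastRev < safeTime;
    }

    /** Same rule, but never for a path with a pending local change. */
    boolean guardedUseSecondary(String path, long parentLastRev, long safeTime) {
        return !pending.contains(path) && naiveUseSecondary(parentLastRev, safeTime);
    }

    public static void main(String[] args) {
        long parentLastRev = 12 * 60;  // 12:00, lastRev on /a/b
        long safeTime = 12 * 60 + 5;   // 12:05
        PendingLocalChanges changes = new PendingLocalChanges();
        changes.localChange("/a/b/c"); // 12:10, background update still pending

        // Read of /a/b/c at 12:12:
        System.out.println(naiveUseSecondary(parentLastRev, safeTime));                     // true (stale read)
        System.out.println(changes.guardedUseSecondary("/a/b/c", parentLastRev, safeTime)); // false
    }
}
```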
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968927#comment-14968927 ]

Tomek Rękawek commented on OAK-2106:

I'll continue work on this. The current state can be found on [my github|https://github.com/trekawek/jackrabbit-oak/tree/OAK-2106]. I've finished a [draft of the replication lag estimator|https://github.com/trekawek/jackrabbit-oak/blob/OAK-2106/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/ReplicationLagEstimator.java]; now we should decide in which cases it can be used.

I asked [~catholicon] about the "trickier" case when the change is done locally. He meant the following situation:

1. There's a branch.
2. We want to read the document xyz, belonging to this branch, modified at 12:10.
3. xyz can be found in the cache, with the modification date 12:00 (as the cache doesn't reflect branch changes).
4. The safe time for secondaries is 12:05 (>12:00).
5. We read the document from a secondary and get an old version.

So, basically we shouldn't ask the secondary instance for a document belonging to a branch.
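Editor's sketch of the conclusion of this comment: a path that belongs to an unmerged branch is always routed to the primary, whatever the cached modification date says. It uses a plain HashSet of paths for clarity. Class, enum and method names are hypothetical, and times are minutes since midnight.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not taken from Oak. */
final class BranchAwareRouting {
    enum Target { PRIMARY, NEAREST }

    // Paths modified in unmerged branches.
    private final Set<String> branchPaths = new HashSet<>();

    void branchCommit(String path) {
        branchPaths.add(path);
    }

    Target route(String path, long cachedModified, long safeTime) {
        if (branchPaths.contains(path)) {
            return Target.PRIMARY; // the cache does not reflect branch changes
        }
        return cachedModified < safeTime ? Target.NEAREST : Target.PRIMARY;
    }

    public static void main(String[] args) {
        BranchAwareRouting routing = new BranchAwareRouting();
        long cachedModified = 12 * 60; // 12:00, as seen in the cache
        long safeTime = 12 * 60 + 5;   // 12:05
        routing.branchCommit("/xyz");  // modified in a branch at 12:10
        System.out.println(routing.route("/xyz", cachedModified, safeTime));   // PRIMARY
        System.out.println(routing.route("/other", cachedModified, safeTime)); // NEAREST
    }
}
```

An exact set grows with the number of modified paths. A probabilistic structure such as a Bloom filter bounds the memory and can only produce false positives, which here would mean an unnecessary read from the primary rather than a stale read.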
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614399#comment-14614399 ]

Vikas Saurabh commented on OAK-2106:

[~mreutegg], I might be missing something, but

bq. It might be better to only make those external changes visible when they appear on a nearer secondary. Subsequent reads could then be performed on that secondary as well.

seems to imply that the {{nearest}} read preference would always pick the same secondary, which I don't think is true. So, I think simply reading the external rev from the {{nearest}} secondary might not cut it.

> Optimize reads from secondaries
> ---
>
> Key: OAK-2106
> URL: https://issues.apache.org/jira/browse/OAK-2106
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Marcel Reutegger
> Assignee: Marcel Reutegger
> Labels: performance, scalability
> Fix For: 1.3.5
>
> OAK-1645 introduced support for reads from secondaries under certain
> conditions. The current implementation checks the _lastRev on a potentially
> cached parent document and reads from a secondary if it has not been
> modified in the last 6 hours. This timespan is somewhat arbitrary but
> reflects the assumption that the replication lag of a secondary shouldn't
> be more than 6 hours.
> This logic should be optimized to take the actual replication lag into
> account. MongoDB provides information about the replication lag with
> the command rs.status().
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614396#comment-14614396 ]

Vikas Saurabh commented on OAK-2106:

I'm working on a patch for this issue. There was a bit of discussion around this here \[0], which implied that we'd poll to read the replica set status. Let me try to explain what I am planning to do for calculating safeReplicatedTime (defined as: each member in the replica set is known to have data up to this time):
# Poll PRIMARY for the replica set status
# Iterate over the optime of each member with SECONDARY status and take the minimum timestamp
# If any of the members has an error status (6, >=8) \[1], then break out of the loop and mark that it's unsafe to reach out to a secondary
#* This probably requires a bit of explanation: since we are polling, we might calculate a safeTime (T1) at which time a replica (S1) was DOWN. Now, by the time we poll next, S1 can come back up and start syncing itself from PRIMARY. BUT, its applied optime might remain less than T1. So, it would break the premise that all replicas have updates till T1.
#* Assumption: If the current arbiter (state=7) joins the replica set as a secondary, then by the time it reaches SECONDARY status, it'd have data at least up to this point.

This part keeps happening asynchronously.

Now, the logic to read remains fairly the same as it is today:
# Get the parent doc from cache (this doc is guaranteed to be up to date till the last backgroundRead). There are 2 cases when the parent is available in cache:
#* Parent was last updated by me (same node) after background read - this is fairly safe, as the doc in cache would also have been updated
#* Parent was last updated by some other node after background read - since our visibility of other nodes is pinned by the root._lastRev read during background read, this cache state is also valid for any revision that we'd need to use
# If the cached parent doc has not been modified after safeReplicatedTime, then it should be safe to reach out to the nearest replica to pull the child document.

Assumptions:
* I'm assuming though that the PRIMARY of a replica set is determined by majority voting. So, PRIMARY-ness of a replica is a universal state despite any topology partitioning.
* Also, I'm assuming that _modified of the parent (complete hierarchy) is updated in sync with any change in a document - but there's a comment in the current code:
{code}
// FIXME: this is not quite accurate, because ancestors
// are updated in a background thread (_lastRev). We
// will need to revise this for low maxReplicationLagMillis
// values
{code}
I'm not completely sure of that flow - but, in that case, we can then calculate {{reallySafeReplicatedTime = min(lastBackgroundReadTime, safeReplicatedTime)}} before step (2) of the document reading logic. BUT, I'm not sure how to make lastBackgroundReadTime available to MongoDocStore.

I'm sure there'd be some gaps/holes here. So, let's try to break it :).

(cc [~mreutegg], [~chetanm], [~nleite])

\[0]: http://markmail.org/thread/5zjccjrg4fwz32qb
\[1]: http://docs.mongodb.org/manual/reference/replica-states/
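Editor's sketch of the safeReplicatedTime calculation outlined in this comment. It works on an already parsed list of replica set members instead of calling MongoDB, so the Member type and all method names are assumptions. The state numbers follow the MongoDB replica states reference cited as \[1] (2 = SECONDARY, 7 = ARBITER, 8 = DOWN).

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch; not taken from any patch for this issue. */
final class SafeReplicatedTime {

    /** Minimal view of one entry of the replica set status "members" array. */
    static final class Member {
        final int state;         // numeric replica state
        final long optimeMillis; // last applied operation time

        Member(int state, long optimeMillis) {
            this.state = state;
            this.optimeMillis = optimeMillis;
        }
    }

    static final int SECONDARY = 2;

    /** Error states as listed in the comment: 6 and everything from 8 upwards. */
    static boolean isErrorState(int state) {
        return state == 6 || state >= 8;
    }

    /** Minimum optime over all secondaries, or empty if reading from a secondary is unsafe. */
    static OptionalLong compute(List<Member> members) {
        long min = Long.MAX_VALUE;
        boolean seenSecondary = false;
        for (Member m : members) {
            if (isErrorState(m.state)) {
                return OptionalLong.empty(); // a member in an error state may come back with older data
            }
            if (m.state == SECONDARY) {
                seenSecondary = true;
                min = Math.min(min, m.optimeMillis);
            }
        }
        return seenSecondary ? OptionalLong.of(min) : OptionalLong.empty();
    }

    /** reallySafeReplicatedTime = min(lastBackgroundReadTime, safeReplicatedTime). */
    static long reallySafe(long lastBackgroundReadTime, long safeReplicatedTime) {
        return Math.min(lastBackgroundReadTime, safeReplicatedTime);
    }
}
```

A caller would treat an empty result as "read from the primary" and otherwise compare the cached parent's modification time with the returned value.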
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160152#comment-14160152 ]

Marcel Reutegger commented on OAK-2106:
---

It would also be good to better align reads from the other cluster nodes with what is available on a nearby secondary. Right now, we background read the root document from the primary to find out what other cluster nodes had written. This way, further reads may need to be directed to the primary as well because those changes have not yet been replicated to the secondaries.

It might be better to only make those external changes visible when they appear on a nearer secondary. Subsequent reads could then be performed on that secondary as well.

> Optimize reads from secondaries
> ---
>
> Key: OAK-2106
> URL: https://issues.apache.org/jira/browse/OAK-2106
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Marcel Reutegger
>
> OAK-1645 introduced support for reads from secondaries under certain
> conditions. The current implementation checks the _lastRev on a potentially
> cached parent document and reads from a secondary if it has not been
> modified in the last 24 hours. This timespan is somewhat arbitrary but
> reflects the assumption that the replication lag of a secondary shouldn't
> be more than 24 hours.
> This logic should be optimized to take the actual replication lag into
> account. MongoDB provides information about the replication lag with
> the command rs.status().
[jira] [Commented] (OAK-2106) Optimize reads from secondaries
[ https://issues.apache.org/jira/browse/OAK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138662#comment-14138662 ]

Chetan Mehrotra commented on OAK-2106:
--

OAK-1750 is somewhat related to this issue and proposes an approach we can take to determine the replication lag more accurately.