[ https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nuno Santos resolved OAK-10778.
-------------------------------
    Fix Version/s: 1.64.0
       Resolution: Done

> Indexing job: support parallel download from MongoDB with two connections in
> Pipelined strategy
> -----------------------------------------------------------------------------------------------
>
>                 Key: OAK-10778
>                 URL: https://issues.apache.org/jira/browse/OAK-10778
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Nuno Santos
>            Priority: Major
>             Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single
> connection/thread to download from MongoDB. We can further increase the
> download speed by using an additional MongoDB connection. A Mongo deployment
> has 1 primary and 2 secondaries, so in principle we could have 1 connection
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
> - Connections should go to different secondaries. If both connections go to
> the same secondary, there's a high chance that they will be limited by what a
> single replica can provide and that they will overload that replica. So each
> secondary should have one and only one connection.
> - The range of documents to download has to be partitioned between the two
> threads. We are already downloading from Mongo in order of
> {{(_modified, _id)}}. A simple and effective partition strategy for 2
> connections is for one to download in ascending and the other in descending
> order.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
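
The ascending/descending partition described in the issue can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code committed for OAK-10778: an in-memory sorted list stands in for the two MongoDB cursors, keys are `(_modified, _id)` tuples (unique because `_id` is unique), and the names `parallel_download`, `worker` and `crossed` are hypothetical. Each thread records the last key it accepted and stops as soon as its next key reaches the range already covered by the other thread.

```python
import threading


def parallel_download(docs):
    """Split one (_modified, _id)-ordered scan between two threads.

    `docs` is a list of unique (modified, id) tuples. Returns the keys
    taken by the ascending thread and by the descending thread.
    """
    ordered = sorted(docs)  # stand-in for a Mongo scan sorted by (_modified, _id)
    lock = threading.Lock()
    last = {"asc": None, "desc": None}  # last key accepted by each thread
    got = {"asc": [], "desc": []}

    def crossed(direction, key):
        # True once this thread has reached the range covered by the other one.
        other = last["desc" if direction == "asc" else "asc"]
        if other is None:
            return False
        return key >= other if direction == "asc" else key <= other

    def worker(direction, cursor):
        for key in cursor:
            # Check and accept under one lock so no key is taken twice.
            with lock:
                if crossed(direction, key):
                    return
                last[direction] = key
                got[direction].append(key)

    threads = [
        threading.Thread(target=worker, args=("asc", iter(ordered))),
        threading.Thread(target=worker, args=("desc", reversed(ordered))),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got["asc"], got["desc"]


if __name__ == "__main__":
    sample = [(i // 3, "id%04d" % i) for i in range(1000)]
    asc, desc = parallel_download(sample)
    print(len(asc), len(desc), len(asc) + len(desc) == len(sample))
```

Where the two threads meet depends on scheduling, so the split between them varies from run to run, but together they cover every key exactly once. In a real deployment each cursor would additionally have to be directed to a different secondary, which this simulation does not model.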