[ https://issues.apache.org/jira/browse/OAK-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388524#comment-17388524 ]
Amrit Verma commented on OAK-9434:
----------------------------------

Configurations added
- *Sort strategy type* - [https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileNodeStoreBuilder.java#L53] | Example test - [https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/test/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileStoreTest.java#L102]
- *Thread pool size for parallel download* - [https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/MultithreadedTraverseWithSortStrategy.java#L326]
- *Existing data dump dir (to resume from where the previous download stopped)* - [https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/IndexOptions.java#L106-L108]
-- This option, if specified, should point to the flat file store directory in the indexing work dir.
-- See example test case - [https://github.com/apache/jackrabbit-oak/blob/1621b9d56434ee4a6f2cd19863f94d963d68ac91/oak-run-commons/src/test/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileStoreTest.java#L175]

> MongoDB indexing: implement parallel chunk download
> ---------------------------------------------------
>
> Key: OAK-9434
> URL: https://issues.apache.org/jira/browse/OAK-9434
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing
> Affects Versions: 1.38.0
> Reporter: Amrit Verma
> Assignee: Amrit Verma
> Priority: Major
>
> In case of large indexes, indexing takes a long time. With the MongoDB
> document store, it is currently a two-step process: download the data from
> MongoDB, then create the index based on that data.
> If something fails during this process, indexing has to be restarted from
> the beginning of the download step. We should make the indexing process
> resumable from the point where it stopped.
> Since downloading the data from MongoDB appears to take more time than the
> indexing itself, we focus on the download part first.
> This Jira issue is for implementing resumable/parallel download.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
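The three settings above (thread pool size, existing data dump dir, sort strategy) combine into one idea: download the data in independent chunks on a thread pool, and let a rerun pointed at the same dump directory skip chunks that already finished. The sketch below illustrates that idea under stated assumptions; it is not Oak's `MultithreadedTraverseWithSortStrategy`, and it does not talk to MongoDB. The class name, `fetchRange` (a stand-in for a MongoDB range query), the `chunk-<start>.dump` file naming, and the `chunkSize`/`threadPoolSize` parameters are all hypothetical, and sort strategy selection is not modelled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a resumable, parallel chunked download (not Oak code). */
public class ResumableChunkDownload {

    /** Hypothetical stand-in for a MongoDB range query over sorted document ids. */
    static List<String> fetchRange(int fromInclusive, int toExclusive) {
        List<String> docs = new ArrayList<>();
        for (int id = fromInclusive; id < toExclusive; id++) {
            docs.add("doc-" + id);
        }
        return docs;
    }

    /**
     * Downloads every chunk that has no finished dump file in dumpDir yet.
     * Returns the number of chunks downloaded in this run.
     */
    public static int download(Path dumpDir, int totalDocs, int chunkSize, int threadPoolSize)
            throws IOException, InterruptedException, ExecutionException {
        Files.createDirectories(dumpDir);
        AtomicInteger downloaded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int from = 0; from < totalDocs; from += chunkSize) {
                final int start = from;
                final int end = Math.min(from + chunkSize, totalDocs);
                final Path done = dumpDir.resolve("chunk-" + start + ".dump");
                if (Files.exists(done)) {
                    continue; // finished in an earlier run, so resume by skipping it
                }
                futures.add(pool.submit(() -> {
                    Path tmp = dumpDir.resolve("chunk-" + start + ".tmp");
                    Files.write(tmp, fetchRange(start, end), StandardCharsets.UTF_8);
                    // Rename only after the chunk is fully written, so an interrupted
                    // run never leaves a partial .dump file behind.
                    Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
                    downloaded.incrementAndGet();
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any chunk failure, wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return downloaded.get();
    }

    public static void main(String[] args) throws Exception {
        Path dumpDir = Files.createTempDirectory("ffs-dump");
        System.out.println(download(dumpDir, 1000, 100, 4)); // prints 10
        Files.delete(dumpDir.resolve("chunk-300.dump"));      // simulate an unfinished chunk
        System.out.println(download(dumpDir, 1000, 100, 4)); // prints 1
    }
}
```

Writing each chunk to a temporary file and renaming it only on success is what makes "the dump file exists" a safe completion marker; a crash mid-chunk leaves just a `.tmp` file, which the next run overwrites.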