[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952844#comment-16952844 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

Let's wait with the backport until we have analyzed the issue and looked at alternatives.

> Lazy loading of Lucene index files startup
> ------------------------------------------
>
> Key: OAK-7947
> URL: https://issues.apache.org/jira/browse/OAK-7947
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, query
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 1.12.0
>
> Attachments: OAK-7947.patch, OAK-7947_v2.patch, OAK-7947_v3.patch, OAK-7947_v4.patch, OAK-7947_v5.patch, lucene-index-open-access.zip
>
> Right now, all Lucene index binaries are loaded on startup (I think when the
> first query is run, to do cost calculation). This is a performance problem if
> the index files are large, and need to be downloaded from the data store.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952589#comment-16952589 ]

Julian Reschke commented on OAK-7947:
-------------------------------------

If we wanted to backport this to 1.10, we could:

1) back out the changes for OAK-8437 and OAK-8046
2) revert r1851052
3) merge r1851022 and r1852007
4) backport OAK-8046 and OAK-8437 again
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830210#comment-16830210 ]

Julian Reschke commented on OAK-7947:
-------------------------------------

trunk: (1.12.0) [r1852007|http://svn.apache.org/r1852007] [r1851022|http://svn.apache.org/r1851022] [r1850826|http://svn.apache.org/r1850826] [r1850231|http://svn.apache.org/r1850231] [r1850229|http://svn.apache.org/r1850229] [r1850163|http://svn.apache.org/r1850163] [r1849465|http://svn.apache.org/r1849465]

1.10: (1.10.0) [r1851052|http://svn.apache.org/r1851052] (1.10.0) [r1850826|http://svn.apache.org/r1850826] [r1850231|http://svn.apache.org/r1850231] [r1850229|http://svn.apache.org/r1850229] [r1850163|http://svn.apache.org/r1850163] [r1849465|http://svn.apache.org/r1849465]
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760701#comment-16760701 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

> Would it be feasible to add a way for users to trigger downloading of indexes?

[~mduerig] yes, we can add something like this. One idea is to load all indexes except those marked as deprecated, or we can add a new flag (e.g. "lazyLoad").

I suggest we wait with this until lazy loading has been shown to work as expected. (If we add such a feature very early on, there is a risk that lazy loading isn't well tested on real problems, because it would rarely be exercised. I'm not suggesting we don't add unit tests, but here it's a bit hard to come up with unit tests that match reality.)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760709#comment-16760709 ]

Michael Dürig commented on OAK-7947:
------------------------------------

{quote}I suggest we wait with this{quote}

Ack. I'll bring it up again if and when required.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750952#comment-16750952 ]

Michael Dürig commented on OAK-7947:
------------------------------------

Would it be feasible to add a way for users to trigger downloading of indexes? This could be used to e.g. start downloading indexes in the background or for pre-warming instances before switching them live.

Arguably this is a topic for a separate issue; let's follow up in one if this is feasible at all. If not, let's forget it.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750947#comment-16750947 ]

Tommaso Teofili commented on OAK-7947:
--------------------------------------

+1, thanks Thomas, I think it sounds like the most reasonable compromise for the current situation.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750944#comment-16750944 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

http://svn.apache.org/r1852007 (trunk) includes the LuceneIndexMBeanImpl patch above (so the index update doesn't download the indexes to get stats, unless the system property is set).
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750890#comment-16750890 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

The following addition doesn't download the indexes (it only updates the stats for the indexes that are already downloaded, that is, only for those that are shown in the JMX bean table).

Maybe we could have some "middle ground", that is, by default download the indexes during the index update cycle, but only those that aren't deprecated. That way, the index update alone doesn't cause large deprecated indexes to be downloaded. For non-deprecated indexes, I think it's actually good to download them quite early on, and the index update mechanism sounds like a good mechanism for that.

{noformat}
--- src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexMBeanImpl.java (revision 1851902)
+++ src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexMBeanImpl.java (working copy)
@@ -93,6 +93,8 @@
 public class LuceneIndexMBeanImpl extends AnnotatedStandardMBean implements LuceneIndexMBean {
 
+    private static final boolean LOAD_INDEX_FOR_STATS = Boolean.parseBoolean(System.getProperty("oak.lucene.LoadIndexForStats", "false"));
+
     private final Logger log = LoggerFactory.getLogger(getClass());
     private final IndexTracker indexTracker;
     private final NodeStore nodeStore;
@@ -381,11 +383,21 @@
     @Override
     public String getSize(String indexPath) throws IOException {
+        if (!LOAD_INDEX_FOR_STATS) {
+            if (!indexTracker.getIndexNodePaths().contains(indexPath)) {
+                return "-1";
+            }
+        }
         return String.valueOf(getIndexStats(indexPath).indexSize);
     }
 
     @Override
     public String getDocCount(String indexPath) throws IOException {
+        if (!LOAD_INDEX_FOR_STATS) {
+            if (!indexTracker.getIndexNodePaths().contains(indexPath)) {
+                return "-1";
+            }
+        }
         return String.valueOf(getIndexStats(indexPath).numDocs);
     }
{noformat}
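The "middle ground" mentioned in this comment (download early, but skip deprecated indexes) could be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexDef` and `PreloadPolicy` are hypothetical names and not part of Oak, and a real implementation would read the deprecated marker from the index definition node rather than from a plain field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical index descriptor (assumption): not an Oak class. */
class IndexDef {
    final String path;
    final boolean deprecated;

    IndexDef(String path, boolean deprecated) {
        this.path = path;
        this.deprecated = deprecated;
    }
}

/** Hypothetical policy (assumption): which indexes the update cycle may download early. */
class PreloadPolicy {

    /** Returns the paths of all indexes that are not marked as deprecated. */
    static List<String> pathsToPreload(List<IndexDef> defs) {
        List<String> result = new ArrayList<>();
        for (IndexDef def : defs) {
            if (!def.deprecated) {
                result.add(def.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<IndexDef> defs = Arrays.asList(
                new IndexDef("/oak:index/lucene", false),
                new IndexDef("/oak:index/oldLucene", true));
        // Only the non-deprecated index is selected for early download.
        System.out.println(pathsToPreload(defs)); // prints [/oak:index/lucene]
    }
}
```

Deprecated indexes would then still be opened lazily on first use, so a large deprecated index is never downloaded just because an update cycle ran.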
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750871#comment-16750871 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

[~teofili] [~catholicon] maybe it's alright if the async index update downloads all the index files (even if the index wasn't updated or used so far), what do you think? What about adding a system property so that this behavior (basically OAK-7893) can be disabled? If I do that and set the system property, then at startup only the indexes that I would expect are downloaded.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750849#comment-16750849 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

Index copying takes place here:

{noformat}
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory.<init>(CopyOnReadDirectory.java:83) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier.wrapForRead(IndexCopier.java:124) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createReader(DefaultIndexReaderFactory.java:97) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createReader(DefaultIndexReaderFactory.java:85) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createMountedReaders(DefaultIndexReaderFactory.java:67) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createReaders(DefaultIndexReaderFactory.java:60) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexNodeManager.open(LuceneIndexNodeManager.java:72) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.findIndexNode(IndexTracker.java:243) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:212) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexMBeanImpl.getIndexStats(LuceneIndexMBeanImpl.java:143) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexMBeanImpl.getDocCount(LuceneIndexMBeanImpl.java:389) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexStatsUpdateCallback.done(LuceneIndexStatsUpdateCallback.java:64) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.search.CompositePropertyUpdateCallback.done(CompositePropertyUpdateCallback.java:53) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:157) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:397) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:59) [org.apache.jackrabbit.oak-store-spi:1.9.10.R1845889]
at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:55) [org.apache.jackrabbit.oak-store-spi:1.9.10.R1845889]
at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:728) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:573) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:432) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:347) [org.apache.sling.commons.scheduler:2.7.2]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [org.apache.sling.commons.scheduler:2.7.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

It looks like this is relatively new code, due to OAK-7893. Calculating index statistics right now causes indexes to be downloaded. This happens for every Lucene index, at every index update (whether or not a specific index was changed).
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750816#comment-16750816 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

New patch OAK-7947_v5.patch passes the tests... But unfortunately it doesn't seem to lazily load the indexes I would expect. Need to further analyze.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742827#comment-16742827 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

Reverted by [~teofili] on Friday, 2019-01-11, in
http://svn.apache.org/r1851022 (trunk)
http://svn.apache.org/r1851052 (1.10 branch)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740296#comment-16740296 ]

Julian Reschke commented on OAK-7947:
-------------------------------------

trunk: [r1851022|http://svn.apache.org/r1851022] [r1850826|http://svn.apache.org/r1850826] [r1850231|http://svn.apache.org/r1850231] [r1850229|http://svn.apache.org/r1850229] [r1850163|http://svn.apache.org/r1850163] [r1849465|http://svn.apache.org/r1849465]

1.10: [r1850826|http://svn.apache.org/r1850826] [r1850231|http://svn.apache.org/r1850231] [r1850229|http://svn.apache.org/r1850229] [r1850163|http://svn.apache.org/r1850163] [r1849465|http://svn.apache.org/r1849465]
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737972#comment-16737972 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

http://svn.apache.org/r1850826 (bugfix)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733008#comment-16733008 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

http://svn.apache.org/r1850231 (trunk; related changes)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732988#comment-16732988 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

http://svn.apache.org/r1850229 (trunk)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732950#comment-16732950 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

[~catholicon] could you review OAK-7947_v4.patch please? I added a feature flag to disable lazy loading. If you don't have time to review right now, not a problem (I think it's fine to commit it before the review).
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732926#comment-16732926 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

It looks like tracker.acquireIndexNode not only acquires the index node, but also puts it into the index tracker "indices" map.

To make the LucenePropertyIndexTest.reindexWithCOWWithoutIndexPath test pass, there are two options:
* either change the query so the nodetype index can't be used, for example to "select * from [mix:title] where [jcr:title] = 'x'", or
* don't use LazyLuceneIndexNode (or call getIndexNode() in its constructor)

To make the SynchronousPropertyIndexTest tests pass:
* either (in the tests) call runAsyncIndex() after creating the index definitions, or
* don't use LazyLuceneIndexNode (or call getIndexNode() in its constructor)

So the patch does change behavior: the index is only available once the indexing cycle has run. I think that's acceptable, so changing the tests is fine.
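The difference between the lazy wrapper and the "call getIndexNode() in its constructor" option discussed in this comment can be illustrated with a small stand-alone sketch. It is not the LazyLuceneIndexNode from the patch: `OpenedIndex`, `LazyIndexHolder`, and the `eager` constructor flag are hypothetical names used only to show when the expensive open (in Oak, potentially a download from the data store) would happen.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for an opened index (assumption): not an Oak class. */
class OpenedIndex {
    final String path;

    OpenedIndex(String path) {
        this.path = path;
    }
}

/** Hypothetical holder (assumption): defers the expensive open until first use. */
class LazyIndexHolder {
    private final Supplier<OpenedIndex> opener;
    private OpenedIndex opened; // stays null until the index is first needed

    LazyIndexHolder(Supplier<OpenedIndex> opener, boolean eager) {
        this.opener = opener;
        if (eager) {
            get(); // eager variant: open right away, as if lazy loading were disabled
        }
    }

    /** Opens the index on the first call and returns the same instance afterwards. */
    synchronized OpenedIndex get() {
        if (opened == null) {
            opened = opener.get();
        }
        return opened;
    }

    synchronized boolean isOpened() {
        return opened != null;
    }
}

class LazyIndexDemo {
    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        Supplier<OpenedIndex> opener = () -> {
            opens.incrementAndGet(); // counts how often the expensive open runs
            return new OpenedIndex("/oak:index/foo");
        };

        LazyIndexHolder lazy = new LazyIndexHolder(opener, false);
        System.out.println("opened after construction: " + lazy.isOpened()); // prints false
        lazy.get();
        lazy.get();
        System.out.println("opens: " + opens.get()); // prints 1
    }
}
```

With `eager` set to true the holder behaves like the non-lazy variant, which is why that option makes the existing tests pass without changes.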
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732870#comment-16732870 ]

Thomas Mueller commented on OAK-7947:
-------------------------------------

Thanks [~reschke]! And I thought I ran the tests...
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731936#comment-16731936 ]

Julian Reschke commented on OAK-7947:
-------------------------------------

trunk: [r1850163|http://svn.apache.org/r1850163] [r1849465|http://svn.apache.org/r1849465]
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731912#comment-16731912 ] Julian Reschke commented on OAK-7947: - Test failure: {noformat} [ERROR] Failures: [ERROR] LucenePropertyIndexTest.reindexWithCOWWithoutIndexPath:2497 [ERROR] SynchronousPropertyIndexTest.nodeTypeIndexing:445 Expected: a string containing "/oak:index/foo" but: was "[oak:TestSuperType] as [oak:TestSuperType] /* nodeType Filter(query=explain select * from [oak:TestSuperType], path=*) */" [ERROR] SynchronousPropertyIndexTest.nodeType_mixins:465 Expected: a string containing "/oak:index/foo" but: was "[oak:TestMixA] as [oak:TestMixA] /* nodeType Filter(query=explain select * from [oak:TestMixA], path=*) */" [ERROR] SynchronousPropertyIndexTest.nonRootIndex:369->AbstractQueryTest.assertQuery:288->AbstractQueryTest.assertQuery:310->AbstractQueryTest.assertQuery:316->AbstractQueryTest.assertResult:323 Expected path /content/a not found, got [] [ERROR] SynchronousPropertyIndexTest.nonUniqueIndex:271->AbstractQueryTest.assertQuery:288->AbstractQueryTest.assertQuery:310->AbstractQueryTest.assertQuery:316->AbstractQueryTest.assertResult:323 Expected path /a not found, got [] [ERROR] SynchronousPropertyIndexTest.queryPlan:330 Expected: a string containing "sync:(foo[jcr:content/foo] bar)" but: was "[nt:base] as [nt:base] /* no-index where [nt:base].[jcr:content/foo] = 'bar' */" [ERROR] SynchronousPropertyIndexTest.relativePropertyTransform:349->AbstractQueryTest.assertQuery:288->AbstractQueryTest.assertQuery:310->AbstractQueryTest.assertQuery:316->AbstractQueryTest.assertResult:323 Expected path /a not found, got [] [INFO] [ERROR] Tests run: 858, Failures: 7, Errors: 0, Skipped: 19 {noformat} Reverting change for now. 
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726702#comment-16726702 ] Thomas Mueller commented on OAK-7947:

The minimal set of changes (just IndexTracker and LucenePropertyIndex) is committed: http://svn.apache.org/r1849465 so those changes should make it into Oak 1.9.14. Other changes to follow next year.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725500#comment-16725500 ] Vikas Saurabh commented on OAK-7947:

[~tmueller], v3 looks fine to me. Adding a few comments, though:
* I think we should also pick up points 3 and 4 from v2 (log the directory name in the Oak streaming index file, and avoid getNumDocs in case there's an entry count set)
* {quote}// already released{quote} Should we add a warn here? I don't think multiple release calls are expected.
* {quote}// ...I don't think this is ever called concurrently{quote} I agree that methods on this would not be called concurrently. So, can we possibly simplify locking and simply add {{synchronized}} to the method itself?
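The last two review points (warn on a repeated release, and use a plain {{synchronized}} method instead of finer-grained locking) could look roughly like the sketch below. This is only an illustration, not code from the patch: the class {{ReleasableHolder}}, its counter, and the log message are hypothetical names chosen for the example.

```java
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not the Oak class under review): a holder whose
 * release() is guarded by method-level synchronization and which warns
 * when it is released more than once.
 */
final class ReleasableHolder {
    private static final Logger LOG =
            Logger.getLogger(ReleasableHolder.class.getName());

    private boolean released;
    private int redundantReleases;

    /**
     * Marks the holder as released.
     *
     * @return true on the first call, false if it was already released
     */
    synchronized boolean release() {
        if (released) {
            // Multiple release calls are not expected, so make them visible.
            redundantReleases++;
            LOG.warning("release() called on an already released holder");
            return false;
        }
        released = true;
        return true;
    }

    /** Number of release() calls that arrived after the first one. */
    synchronized int redundantReleases() {
        return redundantReleases;
    }
}
```

Method-level {{synchronized}} is the simplest option when contention is not expected; it costs little in that case and keeps the already-released check and the state change atomic.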
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725114#comment-16725114 ] Thomas Mueller commented on OAK-7947: - [OAK-7947_v3.patch|https://issues.apache.org/jira/secure/attachment/12952368/OAK-7947_v3.patch] contains only the really required changes. [~catholicon] could you please review it? I will then commit it.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725071#comment-16725071 ] Thomas Mueller commented on OAK-7947: - I will not fix the TODOs in the patch, mainly add synchronization, and will then verify that code coverage is fine. Not sure if it's easy to add a unit test; an integration test is probably simpler.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725067#comment-16725067 ] Thomas Mueller commented on OAK-7947:

Thanks [~catholicon]! I attached a new patch, [OAK-7947_v2.patch|https://issues.apache.org/jira/secure/attachment/12952362/OAK-7947_v2.patch], that contains the changes that are needed and make sense:
* IndexTracker.java: now checks that there is a child node ":index-definition". So for a new index, it should now not return the definition.
* LucenePropertyIndex: returns LazyLuceneIndexNode instead of LuceneIndexNode. This is needed, otherwise acquireIndexNode() is called even if getIndexNode isn't called. (And acquireIndexNode downloads the index binaries.)
* OakStreamingIndexFile: A simple change to log the directory name as well. (Not strictly needed, but very useful.)
* FulltextIndexPlanner: Only call getNumDocs() if the index definition doesn't contain a property "entryCount". (Not strictly needed, but should reduce reads.)
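The FulltextIndexPlanner point above amounts to consulting the cheap, configured value first and touching the index only as a fallback. A minimal sketch of that ordering follows, assuming the configured "entryCount" is passed in as an optional value. {{EntryCountEstimator}} and its parameters are hypothetical names for illustration, and the real planner may combine the two values differently.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: prefer the "entryCount" configured in the index
 * definition, and only fall back to the expensive document count
 * (which requires opening the index) when it is absent.
 */
final class EntryCountEstimator {
    private EntryCountEstimator() {
    }

    /**
     * @param configuredEntryCount value of the "entryCount" property, if set
     * @param numDocs stand-in for getNumDocs(); invoked only when needed
     * @return the entry count to use for cost estimation
     */
    static long estimate(OptionalLong configuredEntryCount, LongSupplier numDocs) {
        if (configuredEntryCount.isPresent()) {
            // No index access at all on this path.
            return configuredEntryCount.getAsLong();
        }
        return numDocs.getAsLong();
    }
}
```

Passing the document count as a supplier rather than as a value is what keeps the index closed: the supplier is never invoked when the definition already carries an entry count.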
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724965#comment-16724965 ] Vikas Saurabh commented on OAK-7947:

Attached a zip - [^lucene-index-open-access.zip] - which contains:
* logging-directory.patch - a patch that adds a logging {{Directory}} implementation
* open-close-dir-calls.txt - all calls that the patch logged for an 11G damAssetLucene index (the same one I listed above)
* open-dir-calls.txt - all calls to simply open the index
* close-dir-calls.txt - calls to close the index

A few things that were quite interesting:
* *All* index files were read, although mostly only a few reads were incurred
* seeks were only incurred on {{.tim}}, {{.tip}} and {{.cfs}} files - {{.cfs}} files tended to be in the 100MB range
* seeks in {{.tim}} and {{.cfs}} went backwards too - so they could require opening the input stream multiple times
* only a few reads occur even after a seek

(there could be other useful patterns to find as well)
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716524#comment-16716524 ] Thomas Mueller commented on OAK-7947:

> The changes in ... getIndexDefinition ... not from stored index definition

Yes, I know, this is a bug in the patch. I will fix that.

> the patch you had attached seems quite risky to me

Yes. I didn't plan to apply the patch; it's just the starting point. There are bugs, TODOs, and some parts are probably not needed. Next, I will try to find out which parts are not needed.

> let index open happen as it happens today but copy required files right away (synchronously) and schedule rest of the files for later.

I'm afraid I would need some help for this. I tried disabling copy-on-read, but then the files are opened from the datastore, which has some additional problems: files are opened multiple times. So I came to the conclusion that it's best not to open the files until they are really needed to run queries, or to do detailed cost estimation (if the index might be used).

So there are 3 stages (AFAIK):
* Stage 1: just the index definition is needed to see if the properties are indexed.
* Stage 2: numDocs are needed to do cost estimation.
* Stage 3: index is used for a query.

Obviously, for stage 3, the index files are needed. For stage 1, right now the index files are opened. I think it's sufficient to delay opening the files there, and just use the index definition. For stage 2, I think (not sure yet) that this is actually rare enough and it's OK to open all index files.

If it turns out this is _not_ that rare, then we can store the numDocs in the index definition from time to time (in theory we could do that for every index update), together with the time of the numDocs update. When the numDocs are needed, they are either read from the index definition (let's say if they are younger than 1 hour or so), or else the index files are opened.
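The stage 2 fallback described above (use a stored numDocs if it is recent enough, otherwise open the index files) can be sketched as follows. This is an illustration of the idea only: {{StoredNumDocs}} is a hypothetical in-memory stand-in for the values that would live in the index definition, and the one-hour limit is simply the example figure from the comment.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: a numDocs value stored together with the time it
 * was recorded. A sufficiently recent value is returned directly; an old
 * or missing value falls back to counting from the index files.
 */
final class StoredNumDocs {
    private final Duration maxAge;
    private long numDocs;
    private Instant updatedAt; // null until the first store()

    StoredNumDocs(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /** Called on index update: remember the count and when it was taken. */
    synchronized void store(long numDocs, Instant now) {
        this.numDocs = numDocs;
        this.updatedAt = now;
    }

    /**
     * @param now current time (passed in to keep the sketch testable)
     * @param countFromIndexFiles expensive fallback that opens the index
     */
    synchronized long get(Instant now, LongSupplier countFromIndexFiles) {
        boolean fresh = updatedAt != null
                && Duration.between(updatedAt, now).compareTo(maxAge) < 0;
        return fresh ? numDocs : countFromIndexFiles.getAsLong();
    }
}
```

With `new StoredNumDocs(Duration.ofHours(1))`, a value stored 30 minutes ago is served without touching the index, while a value stored two hours ago triggers the fallback.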
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714647#comment-16714647 ] Vikas Saurabh commented on OAK-7947:

[~tmueller], the patch you had attached seems quite risky to me (as it touches quite a lot of places), and what it solves is "avoid opening index as long as possible wrt index definitions". If an index definition can potentially answer the query and we want to open the index to, say, get num docs or num docs per field, then we would still copy in all index files. I have at least one comment on the patch, which I'll note at the end.

Maybe we could try a different approach - let index open happen as it happens today, but copy the required files right away (synchronously) and schedule the rest of the files for later. Here's a snip of the size-sorted list of files from an 11G {{damAssetLucene}} index that [~chibulcu] had provided me from an AEM instance:

{noformat}
$ ls -lsSh
total 11G
4.5G -rw-r--r-- 1 vsaurabh vsaurabh 4.5G Nov 23 12:20 _101z.fdt
4.5G -rw-r--r-- 1 vsaurabh vsaurabh 4.5G Nov 23 13:43 _1zt4.fdt
580M -rw-r--r-- 1 vsaurabh vsaurabh 580M Nov 23 12:20 _101z.pos
579M -rw-r--r-- 1 vsaurabh vsaurabh 579M Nov 23 13:43 _1zt4.pos
177M -rw-r--r-- 1 vsaurabh vsaurabh 177M Nov 23 13:44 _20z0.cfs
106M -rw-r--r-- 1 vsaurabh vsaurabh 106M Nov 23 12:20 _1x4o.cfs
 65M -rw-r--r-- 1 vsaurabh vsaurabh  65M Nov 23 13:44 _20bb.cfs
 29M -rw-r--r-- 1 vsaurabh vsaurabh  29M Nov 23 13:44 _217z.cfs
 16M -rw-r--r-- 1 vsaurabh vsaurabh  16M Nov 23 12:10 _101z.doc
 16M -rw-r--r-- 1 vsaurabh vsaurabh  16M Nov 23 12:20 _1zt4.doc
6.7M -rw-r--r-- 1 vsaurabh vsaurabh 6.7M Nov 23 13:44 _21ef.cfs
6.5M -rw-r--r-- 1 vsaurabh vsaurabh 6.5M Nov 23 13:44 _216f.cfs
6.3M -rw-r--r-- 1 vsaurabh vsaurabh 6.3M Nov 23 12:20 _101z.tim
5.9M -rw-r--r-- 1 vsaurabh vsaurabh 5.9M Nov 23 13:43 _1zt4.tim
5.9M -rw-r--r-- 1 vsaurabh vsaurabh 5.9M Nov 23 13:44 _21cy.cfs
4.4M -rw-r--r-- 1 vsaurabh vsaurabh 4.4M Nov 23 13:44 _21ab.cfs
3.8M -rw-r--r-- 1 vsaurabh vsaurabh 3.8M Nov 23 13:44 _21e4.cfs
3.7M -rw-r--r-- 1 vsaurabh vsaurabh 3.7M Nov 23 13:44 _21du.cfs
3.0M -rw-r--r-- 1 vsaurabh vsaurabh 3.0M Nov 23 13:44 _21dk.cfs
2.6M -rw-r--r-- 1 vsaurabh vsaurabh 2.6M Nov 23 13:44 _21f1.cfs
648K -rw-r--r-- 1 vsaurabh vsaurabh 647K Nov 23 12:10 _101z.dvd
424K -rw-r--r-- 1 vsaurabh vsaurabh 421K Nov 23 12:20 _1zt4.dvd
380K -rw-r--r-- 1 vsaurabh vsaurabh 378K Nov 23 12:20 _101z.fdx
372K -rw-r--r-- 1 vsaurabh vsaurabh 369K Nov 23 13:43 _1zt4.fdx
120K -rw-r--r-- 1 vsaurabh vsaurabh 120K Nov 23 13:44 _21f7.cfs
120K -rw-r--r-- 1 vsaurabh vsaurabh 120K Nov 23 13:44 _21f4.cfs
{noformat}

Looking at https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/codecs/lucene46/package-summary.html, {{fdt}} files are stored field data and {{pos}} is positional data for indexed terms. Neither of these should get loaded only for cost evaluation afaict (we should probably try to confirm this btw). These 2 form the biggest chunk of the files - so maybe just avoiding copying these over merely to open an index would save us a lot of time on the first index open. Additionally, I think this approach is much less risky.

_patch review_

The change in
{noformat}
public LuceneIndexDefinition getIndexDefinition(String indexPath){
{noformat}
for the case where the index isn't in the index map returns the definition that is visible in the tree, not the stored index definition. This would change the behavior of the planner, which would start to use not-yet-indexed index definitions as well.

Afaics, the other changes are essentially doing lazy init and won't affect behavior - but they do make it a little brittle to guarantee that the index isn't opened (an unrelated part of the code might start calling something that would in turn happily open the index).
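The alternative proposed above (copy what is needed to open the index synchronously, defer the rest) could start from a simple split by file extension. The sketch below assumes, as the comment does and without confirmation, that {{fdt}} and {{pos}} files are not needed for cost evaluation. {{IndexFileCopyPlan}} is a hypothetical name, and no actual copying or scheduling is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: split index file names into those copied right
 * away and those scheduled for later, based on the file extension.
 */
final class IndexFileCopyPlan {
    // Assumption from the discussion (still to be confirmed): stored field
    // data (.fdt) and positions (.pos) are not read for cost evaluation.
    private static final Set<String> DEFERRED_EXTENSIONS =
            new HashSet<>(Arrays.asList("fdt", "pos"));

    final List<String> copyNow = new ArrayList<>();
    final List<String> copyLater = new ArrayList<>();

    /** Builds a plan that keeps the input order within each group. */
    static IndexFileCopyPlan of(List<String> fileNames) {
        IndexFileCopyPlan plan = new IndexFileCopyPlan();
        for (String name : fileNames) {
            if (DEFERRED_EXTENSIONS.contains(extension(name))) {
                plan.copyLater.add(name);
            } else {
                plan.copyNow.add(name);
            }
        }
        return plan;
    }

    /** Returns the text after the last dot, or "" if there is no dot. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }
}
```

For the listing above, the four {{fdt}} and {{pos}} files would land in {{copyLater}}; together they account for roughly 10G of the 11G total, which is where the expected saving comes from.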
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712595#comment-16712595 ] Chetan Mehrotra commented on OAK-7947:

[~tmueller] One reason for doing eager loading was to avoid contention in queries hitting at the very start. To make it lazy, what we can do is store the data points required for index planning in the index data node itself in the repository. So stuff like numDocs, field count etc. can be recorded in the repo upon index close. Then, at least for the index planning phase, we need not open the IndexWriter at all.
[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup
[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711507#comment-16711507 ] Thomas Mueller commented on OAK-7947:

The attached patch solves the issue. It contains various changes; possibly some of them are not needed, and some might be incorrect / problematic. This is work in progress. Still, it would be nice to get some feedback from those who are more familiar with this code, for example [~catholicon] [~teofili] [~chetanm].

Changes I did:
* IndexTracker.getIndexDefinition constructs the node and returns it if the index isn't in the indices map yet. I don't know why it returned null before; it seems wrong to me.
* LuceneIndexNodeManager always opened the index, I don't know why. SearcherHolder now doesn't always do that. I basically make SearcherHolder open the index lazily.
* LucenePropertyIndex: acquireIndexNode is called when planning, and that method opens the index files. I don't know why. I created a class LazyLuceneIndexNode that wraps LuceneIndexNode and creates it lazily.
* OakStreamingIndexFile now logs the directory name as well, not just the file name.
* DefaultIndexReader now opens the directory (DirectoryReader.open) lazily, only when calling getReader.
* FulltextIndexPlanner.estimatedEntryCount now only calls getNumDocs when really needed (that is, only if "entryCount" isn't set in the index definition). That should avoid having to open the index if we know the entryCount is high.
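Several of the changes listed above (SearcherHolder, LazyLuceneIndexNode, DefaultIndexReader) share one pattern: wrap the expensive open step and run it on first use only. A generic sketch of that pattern follows. {{LazyResource}} is a hypothetical class for illustration and is not taken from the patch; the real classes also have to handle acquire/release and error cases, which are omitted here.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of lazy opening: the expensive step (in Oak, for
 * example acquiring the index node, which downloads the index binaries)
 * runs on the first get(), not when the wrapper is created.
 */
final class LazyResource<T> {
    private final Supplier<T> opener;
    private T value;        // unset until the first get()
    private boolean opened;

    LazyResource(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Opens the resource on the first call; later calls reuse the result. */
    synchronized T get() {
        if (!opened) {
            value = opener.get();
            opened = true;
        }
        return value;
    }

    /** Lets callers (and tests) check whether the open step has run. */
    synchronized boolean isOpened() {
        return opened;
    }
}
```

Code that only needs the index definition never calls get(), so nothing is opened. The flip side, raised later in the discussion, is that any unrelated caller invoking get() will trigger the open.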