[ https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716524#comment-16716524 ]
Thomas Mueller commented on OAK-7947:
-------------------------------------

> The changes in ... getIndexDefinition ... not from stored index definition

Yes, I know, this is a bug in the patch. I will fix that.

> the patch you had attached seems quite risky to me

Yes. I didn't plan to apply the patch; it's just the starting point. There are bugs, todos, and some parts are probably not needed. Next, I will try to find out which parts are not needed.

> let index open happen as it happens today but copy required files right away
> (synchronously) and schedule rest of the files for later.

I'm afraid I would need some help with this. I tried disabling copy-on-read, but then the files are opened from the datastore, which has additional problems: files are opened multiple times. So I came to the conclusion that it's best not to open the files until they are really needed, either to run queries or to do detailed cost estimation (if the index might be used). So there are 3 stages (AFAIK):

* Stage 1: only the index definition is needed, to see if the properties are indexed.
* Stage 2: numDocs is needed to do cost estimation.
* Stage 3: the index is used for a query.

Obviously, for stage 3, the index files are needed. For stage 1, the index files are currently opened. I think it's sufficient to delay opening the files there and just use the index definition. For stage 2, I think (not sure yet) that this is actually rare enough that it's OK to open all index files. If it turns out this is _not_ that rare, then we can store numDocs in the index definition from time to time (in theory we could do that on every index update), together with the time of the numDocs update. When numDocs is needed, it is either read from the index definition (say, if it is younger than about 1 hour), or else the index files are opened.
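The three stages and the stored-numDocs fallback could look roughly like the Java sketch below. It is illustrative only: `LazyIndexSketch`, `IndexDefinition`, `OpenedIndex`, and `MAX_NUM_DOCS_AGE_MILLIS` are hypothetical names, not Oak's actual classes, and the one-hour limit is just the "1 hour or so" from the comment. Stage 1 reads only the definition. Stage 2 returns the stored numDocs while it is fresh and otherwise opens the files. Stage 3 always opens them. The expensive open runs at most once.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch, not Oak's classes: open index files only when stage 2 or 3 needs them. */
final class LazyIndexSketch {

    /** Hypothetical stand-in for an opened index (e.g. a wrapper around a Lucene IndexReader). */
    interface OpenedIndex {
        int numDocs();
    }

    /** Hypothetical stand-in for the stored index definition. */
    static final class IndexDefinition {
        final Set<String> indexedProperties;
        final Integer storedNumDocs;        // null if numDocs was never stored
        final long storedNumDocsTimeMillis; // when storedNumDocs was written

        IndexDefinition(Set<String> indexedProperties, Integer storedNumDocs, long storedNumDocsTimeMillis) {
            this.indexedProperties = indexedProperties;
            this.storedNumDocs = storedNumDocs;
            this.storedNumDocsTimeMillis = storedNumDocsTimeMillis;
        }
    }

    /** Assumed freshness limit for the stored numDocs ("1 hour or so"). */
    static final long MAX_NUM_DOCS_AGE_MILLIS = 60L * 60 * 1000;

    private final IndexDefinition definition;
    private final Supplier<OpenedIndex> opener; // expensive: may download files from the data store
    private OpenedIndex opened;                 // null until the files are first needed
    private int openCount;                      // how often the opener ran (for illustration)

    LazyIndexSketch(IndexDefinition definition, Supplier<OpenedIndex> opener) {
        this.definition = definition;
        this.opener = opener;
    }

    /** Stage 1: decide from the definition alone whether a property is indexed. */
    boolean isIndexed(String property) {
        return definition.indexedProperties.contains(property);
    }

    /** Stage 2: use the stored numDocs if it is fresh enough, otherwise open the files. */
    int numDocs(long nowMillis) {
        Integer stored = definition.storedNumDocs;
        if (stored != null && nowMillis - definition.storedNumDocsTimeMillis < MAX_NUM_DOCS_AGE_MILLIS) {
            return stored;
        }
        return open().numDocs();
    }

    /** Stage 3: the index is used for a query, so the files must be open. */
    OpenedIndex forQuery() {
        return open();
    }

    synchronized int openCount() {
        return openCount;
    }

    /** Opens the index files at most once, on first use. */
    private synchronized OpenedIndex open() {
        if (opened == null) {
            opened = opener.get();
            openCount++;
        }
        return opened;
    }
}
```

With this shape, a cost-estimation pass that only touches stage 1 (and stage 2 while the stored value is fresh) never triggers the download. Only a stale or missing numDocs, or an actual query, pays the cost of opening the files.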
> Lazy loading of Lucene index files startup
> ------------------------------------------
>
>                 Key: OAK-7947
>                 URL: https://issues.apache.org/jira/browse/OAK-7947
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>         Attachments: OAK-7947.patch
>
>
> Right now, all Lucene index binaries are loaded on startup (I think when the
> first query is run, to do cost calculation). This is a performance problem if
> the index files are large, and need to be downloaded from the data store.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)