[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

Thomas Mueller (JIRA) Tue, 11 Dec 2018 00:24:36 -0800


    [ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716524#comment-16716524
 ]


Thomas Mueller commented on OAK-7947:
-------------------------------------

> The changes in ... getIndexDefinition ... not from stored index definition

Yes, I know, this is a bug in the patch. I will fix that.

> the patch you had attached seems quite risky to me

Yes. I didn't plan to apply the patch, it's just the starting point. There are 
bugs, todos, and some parts are probably not needed.

Next, I will try to find out which parts are not needed.

> let index open happen as it happens today but copy required files right away 
> (synchronously) and schedule rest of the files for later.

I'm afraid I would need some help with this. I tried disabling copy-on-read, but 
then the files are opened from the datastore, which has an additional 
problem: files are opened multiple times. So I came to the conclusion that it's 
best not to open the files until they are really needed to run queries, or 
to do detailed cost estimation (if the index might be used). So there 
are 3 stages (AFAIK):

* Stage 1: just the index definition is needed to see if the properties are 
indexed.
* Stage 2: numDocs are needed to do cost estimation.
* Stage 3: index is used for a query.

Obviously, for stage 3, the index files are needed. For stage 1, right now the 
index files are opened. I think it's sufficient to delay opening the files 
there, and just use the index definition. For stage 2, I think (not sure yet) 
that this is actually rare enough that it's OK to open all index files. If it 
turns out this is _not_ that rare, then we can store the numDocs in the index 
definition from time to time (in theory we could do that for every index 
update), together with the time of the numDocs update. Then, when the numDocs 
are needed, they are either read from the index definition (let's say if they 
are younger than 1 hour or so), or else the index files are opened.
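
To make the three stages concrete, here is a rough sketch in Java. It is not 
taken from the patch and does not use Oak APIs: LazyIndexNode, OpenedIndex, 
recordNumDocs and the other names are hypothetical, the "index definition" is 
reduced to a set of property names plus a stored numDocs with its update 
time, and the 1 hour limit is just the value floated above.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the three stages; none of these names are Oak APIs. */
final class LazyIndexNode {

    /** Stand-in for an opened Lucene index; opening is the expensive step. */
    interface OpenedIndex {
        int numDocs();
    }

    private final Set<String> indexedProperties;  // from the index definition
    private final Supplier<OpenedIndex> opener;   // copies/opens the index files
    private final LongSupplier clockMillis;       // injectable clock, eases testing
    private final long maxAgeMillis;              // how long a stored numDocs is trusted

    private OpenedIndex opened;                   // null until stage 2 or 3 needs it
    private Integer storedNumDocs;                // null if never stored
    private long storedAtMillis;

    LazyIndexNode(Set<String> indexedProperties, Supplier<OpenedIndex> opener,
                  LongSupplier clockMillis, Duration maxAge) {
        this.indexedProperties = Set.copyOf(indexedProperties);
        this.opener = opener;
        this.clockMillis = clockMillis;
        this.maxAgeMillis = maxAge.toMillis();
    }

    /** Stage 1: answered from the index definition only, no files are opened. */
    boolean indexesProperty(String name) {
        return indexedProperties.contains(name);
    }

    /** Called on index update (possibly only from time to time). */
    synchronized void recordNumDocs(int numDocs) {
        storedNumDocs = numDocs;
        storedAtMillis = clockMillis.getAsLong();
    }

    /** Stage 2: use the stored value while it is fresh, else open the files. */
    synchronized int numDocsForCostEstimate() {
        long now = clockMillis.getAsLong();
        if (storedNumDocs != null && now - storedAtMillis < maxAgeMillis) {
            return storedNumDocs;
        }
        int numDocs = open().numDocs();
        storedNumDocs = numDocs;
        storedAtMillis = now;
        return numDocs;
    }

    /** Stage 3: open the index files on first use and reuse them afterwards. */
    synchronized OpenedIndex open() {
        if (opened == null) {
            opened = opener.get();
        }
        return opened;
    }
}
```

In this sketch the files are opened at most once per node, and only when a 
query runs or when the stored numDocs is missing or too old. Whether a stale 
numDocs is acceptable for cost estimation is an open question here, as above.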



> Lazy loading of Lucene index files startup
> ------------------------------------------
>
>                 Key: OAK-7947
>                 URL: https://issues.apache.org/jira/browse/OAK-7947
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>         Attachments: OAK-7947.patch
>
>
> Right now, all Lucene index binaries are loaded on startup (I think when the 
> first query is run, to do cost calculation). This is a performance problem if 
> the index files are large, and need to be downloaded from the data store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
