[ 
https://issues.apache.org/jira/browse/OAK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991349#comment-14991349
 ] 

Ian Boston commented on OAK-3547:
---------------------------------

[~mreutegg] If an earlier version of the index is used by the writer, there 
will be holes in the index and items will be missing. There are several 
options: a) flag the issue to alert admins that the index is not healthy, but 
continue to index using an index that will open; b) fail the index write and 
stop indexing completely; c) fail the index write and start re-indexing 
automatically. Of those, I think option a will deliver the best continuity. 
Option b risks wide-scale application-level issues; option c risks both 
application-level issues and potential unavailability caused by the load of 
rebuilding an index from scratch. There is no easy answer. 

Now that there are checksums in place, I have been seeing more frequent race 
conditions between the writer and the readers, which occasionally open older 
versions. I think this is because the OakDirectory checks all the files when 
it is opened, by computing a checksum of everything referenced. I think that 
Lucene delays checking a file, or the internals of a file, until it is 
needed; hence any errors are more visible than before.
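
As an illustration of the eager check described above, here is a minimal 
sketch, not the actual OakDirectory code. It assumes an in-memory map standing 
in for the stored files, and the names EagerOpenSketch, Generation, verifies 
and openNewestValid are hypothetical. It verifies every referenced file up 
front and falls back to the next older generation when one fails.

```java
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical sketch; names are illustrative, not the OakDirectory API.
class EagerOpenSketch {

    // The checksums expected for every file referenced by one generation.
    static final class Generation {
        final long id;
        final Map<String, Long> expectedChecksums;

        Generation(long id, Map<String, Long> expectedChecksums) {
            this.id = id;
            this.expectedChecksums = expectedChecksums;
        }
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Eager check: every referenced file must exist and match its checksum.
    static boolean verifies(Generation gen, Map<String, byte[]> store) {
        for (Map.Entry<String, Long> e : gen.expectedChecksums.entrySet()) {
            byte[] data = store.get(e.getKey());
            if (data == null || crc32(data) != e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Walk generations newest first; return the id of the first one
    // that verifies, or -1 if none does.
    static long openNewestValid(List<Generation> newestFirst,
                                Map<String, byte[]> store) {
        for (Generation g : newestFirst) {
            if (verifies(g, store)) {
                return g.id;
            }
        }
        return -1L;
    }
}
```

Because the check runs at open time, a damaged file is reported immediately, 
whereas a lazy check would only surface it when that file is first read.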

----

Lucene already has a concept of committing the index by syncing the segments_xx 
and segments.gen files. I am writing the listing node on sync of either of these 
or on close of the index, which has reduced the number of generations. The result 
appears to be very stable. I have also introduced the concept of mutability, as 
some of the file types are mutable. .del is mutable, so its length and checksum 
are not checked. If a .del from a later generation is used, that will only 
delete the Lucene docs that were deleted in that later generation. No damage. 
segments.gen is also mutable. This is more of a problem. It is supposed to be a 
fallback file, with segments_xx used in preference; however, if segments.gen is 
used it will be from the wrong generation and will define the wrong set of 
segment files for the index. I need to check if segments.gen is ever read. If it 
is, then I think the OakDirectory needs to map segments.gen to a generational 
version of the same (i.e. segments.gen_<epoch>) so that only .del files are 
mutable. That should make the OakDirectory recoverable.
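
The listing and mutability rules above can be sketched roughly as follows. 
This is only an illustration under stated assumptions, not the patch itself: 
ListingSketch, FileEntry, isMutable, generationalName, buildListing and 
isValid are hypothetical names, CRC32 stands in for whatever checksum is 
actually used, and files are modelled as an in-memory map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical sketch; names are illustrative, not the OakDirectory API.
class ListingSketch {

    // One listing entry: the length and checksum recorded at sync time.
    static final class FileEntry {
        final long length;
        final long checksum;

        FileEntry(long length, long checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    // Assumption: only .del files are treated as mutable.
    static boolean isMutable(String name) {
        return name.endsWith(".del");
    }

    // Assumption: segments.gen is stored under a per-generation name so
    // a reader never picks up the copy from a different generation.
    static String generationalName(String name, long epoch) {
        return "segments.gen".equals(name) ? name + "_" + epoch : name;
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Build the listing for one generation from file name -> content.
    static Map<String, FileEntry> buildListing(Map<String, byte[]> files,
                                               long epoch) {
        Map<String, FileEntry> listing = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            byte[] data = e.getValue();
            listing.put(generationalName(e.getKey(), epoch),
                    new FileEntry(data.length, crc32(data)));
        }
        return listing;
    }

    // Mutable files skip the check; all others must match the recorded
    // length and checksum under their stored name.
    static boolean isValid(String storedName, byte[] data,
                           Map<String, FileEntry> listing) {
        if (isMutable(storedName)) {
            return true;
        }
        FileEntry e = listing.get(storedName);
        return e != null && data != null
                && e.length == data.length
                && e.checksum == crc32(data);
    }
}
```

With segments.gen stored as segments.gen_<epoch>, the only names left 
unchecked are the .del files, which matches the intent stated above.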






> Improve ability of the OakDirectory to recover from unexpected file errors
> --------------------------------------------------------------------------
>
>                 Key: OAK-3547
>                 URL: https://issues.apache.org/jira/browse/OAK-3547
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>    Affects Versions: 1.4
>            Reporter: Ian Boston
>
> Currently if the OakDirectory finds that a file is missing or in some way 
> damaged, an exception is thrown which impacts all queries using that index, 
> at times making the index unavailable. This improvement aims to make the 
> OakDirectory recover to a previously ok state by storing which files were 
> involved in previous states, and giving the code some way of checking if they 
> are valid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
