[ https://issues.apache.org/jira/browse/OAK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated OAK-3629:
---------------------------------
    Fix Version/s:     (was: 1.3.12)
                   1.3.13

> Index corruption seen with CopyOnRead when index definition is recreated
> ------------------------------------------------------------------------
>
>                 Key: OAK-3629
>                 URL: https://issues.apache.org/jira/browse/OAK-3629
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.3.13
>
>
> CopyOnRead logic relies on {{reindexCount}} to determine the name of the
> directory into which index files are copied. In the normal flow, when the
> index is reindexed this count is incremented and the newer index files are
> copied to a new directory.
> However, if the index definition node gets recreated by some deployment
> process, this count is reset to 0. As a result, the index files newly
> created by reindexing are copied into an already existing directory, and
> that can lead to corruption.
> So what happened here was:
> # The system started with index definition I1 and indexing completed, with
> index files saved under index/hash(indexpath)/1 (where 1 is the current
> reindex count).
> # A new index definition package was deployed, which reset the reindex
> count. Reindexing then happened again and the CopyOnRead logic, per the
> current design, reused the existing index directory. It so happens that
> Lucene creates files with the same name and the same size but different
> content. This gets past CopyOnRead's length-based corruption check, so the
> local copy of the new Lucene index ends up corrupted.
> *Note that the corruption here is transient, i.e. the persisted index is
> not corrupted*. Only the locally copied index gets corrupted. Cleaning up
> the index directory fixes the issue and can be used as a workaround.
> *Fix*
> After discussing with [~tmueller], the following approach can be used.
> Instead of relying on the reindex count, we can maintain a hidden, randomly
> generated uuid and store it in the index config. This would be used to
> derive the name of the directory on the filesystem. If the index definition
> gets reset, the uuid can be regenerated.
> *Workaround*
> Clean the directory used by CopyOnRead, which is <repo home>/index, before
> restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
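
For illustration only, the directory-naming problem and the proposed fix described in the issue can be sketched as below. This is not Oak's actual CopyOnRead code: the class IndexDirNaming, its method names, the use of SHA-256 as a stand-in for hash(indexpath), and the exact layout of the uuid-based path are all assumptions made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical sketch of the two naming schemes; not Oak's real implementation. */
final class IndexDirNaming {

    private IndexDirNaming() {
    }

    /** Assumed stand-in for hash(indexpath): hex-encoded SHA-256 of the index path. */
    static String hashOf(String indexPath) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(indexPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Scheme described in the issue: index/hash(indexpath)/reindexCount. */
    static String dirByReindexCount(String indexPath, long reindexCount) {
        return "index/" + hashOf(indexPath) + "/" + reindexCount;
    }

    /** Proposed scheme (layout assumed): directory name derived from a random uuid. */
    static String dirByUid(String indexPath, String uid) {
        return "index/" + hashOf(indexPath) + "/" + uid;
    }

    /** A fresh uuid would be generated whenever the index definition is (re)created. */
    static String newUid() {
        return UUID.randomUUID().toString();
    }
}
```

With the count-based scheme, a recreated definition that is reindexed once maps to the same directory as the original definition at reindex count 1, so old and new files share a directory. With a regenerated uuid, the recreated definition gets a directory of its own.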