[ https://issues.apache.org/jira/browse/SOLR-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277135#comment-15277135 ]

Hrishikesh Gadre commented on SOLR-9091:
----------------------------------------

[~thetaphi] Thanks for the pointer.

bq. Every segment in an index also has a unique identifier written into the 
file. So you can compare both: the checksum for correctness of the file (not 
modified) and the UUID to validate that it is really the same segment.

As I understand it, validating the checksum is both necessary and sufficient to 
ensure the integrity of the restored index state. The per-segment unique 
identifier is an optimization that is safe only when we *know* that the two 
copies of the index state are related to each other (e.g. in replication, where 
the "slave" is bootstrapped from the "master"). In that case, skipping the copy 
of common segments cannot cause index integrity issues.

On the other hand, it is risky to assume that the "backup" and "current" index 
states are always related to each other. For example, consider the use case I 
mentioned above:
-> User backs up the index files for Core A.
-> User creates a new core (Core B) and indexes DIFFERENT documents.
-> User runs a restore operation on Core B using the previously created BACKUP 
(of Core A).

Is it possible that the segment identifiers generated in Core B overlap with 
those in Core A?
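At least for file names, the overlap is easy to demonstrate. In the sketch below (hypothetical classes, JDK only), segment names follow Lucene's convention as I understand it: `_` plus a base-36 counter. As a result, two brand-new cores both produce `_0`, `_1`, and so on. The 16-byte random identifier per segment is an assumption standing in for the per-segment ID. With truly random 128-bit values, an accidental match between unrelated cores is astronomically unlikely. Whether Lucene's actual ID generation gives that guarantee is exactly the question above.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of two unrelated cores. Names use "_" + base-36 counter,
// so every fresh index starts at "_0". The random 16-byte id is an ASSUMED
// stand-in for a per-segment unique identifier.
final class UnrelatedCores {
    static final class Segment {
        final String name;
        final byte[] id;
        Segment(String name, byte[] id) { this.name = name; this.id = id; }
    }

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Creates the segments of a brand-new index: the name counter starts at zero. */
    static List<Segment> freshIndex(int segmentCount) {
        List<Segment> segments = new ArrayList<>();
        for (long counter = 0; counter < segmentCount; counter++) {
            byte[] id = new byte[16];
            RANDOM.nextBytes(id); // independent random id per segment
            segments.add(new Segment("_" + Long.toString(counter, Character.MAX_RADIX), id));
        }
        return segments;
    }
}
```

Matching file names (or even matching lengths) therefore say nothing about whether a backup and the current index are related.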


> Solr index restore silently copies the corrupt segments in the backup
> ---------------------------------------------------------------------
>
>                 Key: SOLR-9091
>                 URL: https://issues.apache.org/jira/browse/SOLR-9091
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hrishikesh Gadre
>
> The Solr core restore functionality uses the following criteria to decide 
> whether a given file is copied from the backup directory or from the current 
> index directory.
> case 1] File is available in both the backup and current index directories
> --> Compare the checksum and file length
>   --> If the checksum and length match, copy the file from the current working 
> directory.
>   --> If the checksum and length don't match, copy the file from the backup 
> directory.
> case 2] File is available only in the backup directory (this can happen for a 
> newly created core without any data).
> --> Copy the file from the backup directory.
> The problem is that we intentionally catch and ignore any error raised while 
> reading the checksum of a file in the backup directory. Hence, in case (2), a 
> file is restored even though its checksum was never validated, so a corrupt 
> segment in the backup is copied silently.
> Here is the relevant code snippet,
> https://github.com/apache/lucene-solr/blob/a5586d29b23f7d032e6d8f0cf8758e56b09e0208/solr/core/src/java/org/apache/solr/handler/RestoreCore.java#L82-L95
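The restore decision described above, with the error no longer swallowed, might look roughly like the following. This is a hypothetical model, not the actual `RestoreCore` code. `RestorePlanner`, `FileMeta`, and `ChecksumVerifier` are illustrative names. `ChecksumVerifier` is assumed to recompute the checksum over the whole backup file (in Lucene terms, something closer to `CodecUtil.checksumEntireFile` than to `CodecUtil.retrieveChecksum`). The only behavioural change from the description is that a checksum failure propagates and aborts the restore.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical model of the restore decision described in SOLR-9091.
final class RestorePlanner {
    enum Source { LOCAL, BACKUP }

    /** Length and checksum of one index file. */
    static final class FileMeta {
        final long length;
        final long checksum;
        FileMeta(long length, long checksum) { this.length = length; this.checksum = checksum; }
    }

    /** Verifies a backup file's checksum over its full contents; throws if it is corrupt. */
    interface ChecksumVerifier {
        FileMeta verify(String fileName) throws IOException;
    }

    static Source chooseSource(String fileName, ChecksumVerifier backup, Map<String, FileMeta> local)
            throws IOException {
        // No try/catch here: a backup file whose checksum cannot be verified aborts
        // the restore instead of being copied silently (the bug reported in case 2).
        FileMeta fromBackup = backup.verify(fileName);
        FileMeta fromLocal = local.get(fileName);
        if (fromLocal != null
                && fromLocal.length == fromBackup.length
                && fromLocal.checksum == fromBackup.checksum) {
            return Source.LOCAL; // case 1, identical file: prefer the local copy
        }
        return Source.BACKUP; // case 1 with a mismatch, or case 2 (file only in backup)
    }
}
```

Failing loudly here seems preferable to falling back to the local copy, because in case 2 there is no local copy to fall back to.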



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
