[ 
https://issues.apache.org/jira/browse/SOLR-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264461#comment-15264461
 ] 

David Smiley commented on SOLR-9038:
------------------------------------

bq. If we are going to allow the "backup" operation to use this snapshot commit 
in future, then I think we need to make sure that that snapshot commit is 
preserved during collection configuration changes. If the snapshot commit is 
created on all replicas for a shard, then it probably is OK to delete one or 
more replicas. But I am not sure how we would handle the case when a shard 
containing one or more snapshot commits is deleted.

There's no issue, I think, if a replica is deleted.  If a whole shard is 
deleted, then I think it's okay too -- it won't be backed up -- there's nothing 
left :-)

bq. I agree that requiring replicas to transfer snapshot commits during 
recovery may not be a good idea since in case of large collections it will 
increase the size of data transferred over the network.

I don't think it's a blocker to the approach... it's just the price one pays to 
recover in the presence of snapshot commits.  Improvements to how Lucene 
merges segments might be a better place to optimize this, such that segments 
can only be merged if the IndexCommits pointing to them are consistent.  If 
this idea were implemented, and if one were to do an optimize (as a 
hypothetical example to explain the effect), there would be one segment for 
each snapshot commit, with disjoint documents (no duplication).  Pretty good, I 
think.  But this would clearly be its own issue :-)

> Ability to create/delete/list snapshots for a solr collection
> -------------------------------------------------------------
>
>                 Key: SOLR-9038
>                 URL: https://issues.apache.org/jira/browse/SOLR-9038
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Hrishikesh Gadre
>
> Work is currently underway to implement a backup/restore API for SolrCloud 
> (SOLR-5750). SOLR-5750 is about providing an ability to "copy" index files 
> and collection metadata to a configurable location. 
> In addition to this, we should also provide a facility to create "named" 
> snapshots for a Solr collection. Here by "snapshot" I mean configuring the 
> underlying Lucene IndexDeletionPolicy to not delete a specific commit point 
> (e.g. using PersistentSnapshotDeletionPolicy). This should not be 
> confused with SOLR-5340, which implements core-level "backup" functionality.
> The primary motivation of this feature is to decouple recording/preserving a 
> known consistent state of a collection from actually "copying" the relevant 
> files to a physically separate location. This decoupling has a number of 
> advantages:
> - We can use specialized data-copying tools for transferring Solr index 
> files. E.g. in a Hadoop environment, the 
> [distcp|https://hadoop.apache.org/docs/r1.2.1/distcp2.html] tool is typically 
> used to copy files from one location to another. This tool provides various 
> options to configure the degree of parallelism and bandwidth usage, as well as 
> integration with different types and versions of file systems (e.g. AWS S3, 
> Azure Blob store, etc.).
> - This separation of concerns would also help Solr focus on its key 
> functionality (i.e. querying and indexing) while delegating the copy 
> operation to the tools built for that purpose.
> - Users can decide if/when to copy the data files, as opposed to just creating 
> a snapshot. E.g. a user may want to create a snapshot of a collection before 
> making an experimental change (e.g. updating/deleting docs, a schema change, 
> etc.). If the experiment is successful, they can delete the snapshot (without 
> having to copy the files). If the experiment fails, they can copy the 
> files associated with the snapshot and restore.
> Note that the Apache Blur project also provides a similar feature: 
> [BLUR-132|https://issues.apache.org/jira/browse/BLUR-132]
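
To make the "snapshot" idea in the description concrete, here is a minimal 
Java sketch of a deletion policy that keeps the newest commit plus any commit 
pinned by a named snapshot. It is a toy model under stated assumptions, not 
Lucene's IndexDeletionPolicy API and not the implementation proposed in this 
issue: the class SnapshotSketch and all of its methods are hypothetical names, 
and commits are modeled as bare generation numbers rather than index files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only (hypothetical names): NOT Lucene's IndexDeletionPolicy API. */
public class SnapshotSketch {
    // Commit generations currently retained, in ascending order.
    private final TreeSet<Long> commits = new TreeSet<>();
    // Snapshot name -> commit generation that the snapshot pins.
    private final Map<String, Long> snapshots = new TreeMap<>();
    private long nextGeneration = 1;

    /** Records a new commit, then prunes commits that nothing needs. */
    public long commit() {
        long gen = nextGeneration++;
        commits.add(gen);
        prune();
        return gen;
    }

    /** Pins the latest commit under a name so that pruning skips it. */
    public long createSnapshot(String name) {
        if (commits.isEmpty()) {
            throw new IllegalStateException("no commit to snapshot");
        }
        if (snapshots.containsKey(name)) {
            throw new IllegalArgumentException("snapshot exists: " + name);
        }
        long gen = commits.last();
        snapshots.put(name, gen);
        return gen;
    }

    /** Releases a named snapshot; its commit becomes eligible for deletion. */
    public void deleteSnapshot(String name) {
        if (snapshots.remove(name) == null) {
            throw new IllegalArgumentException("no such snapshot: " + name);
        }
        prune();
    }

    public List<String> listSnapshots() {
        return new ArrayList<>(snapshots.keySet());
    }

    public List<Long> liveCommits() {
        return new ArrayList<>(commits);
    }

    // Keep the newest commit plus every commit pinned by a snapshot.
    private void prune() {
        if (commits.isEmpty()) {
            return;
        }
        long latest = commits.last();
        commits.removeIf(gen -> gen != latest && !snapshots.containsValue(gen));
    }

    public static void main(String[] args) {
        SnapshotSketch index = new SnapshotSketch();
        index.commit();                              // generation 1
        index.createSnapshot("before-experiment");   // pins generation 1
        index.commit();                              // generation 2
        index.commit();                              // generation 3; 2 is pruned
        System.out.println(index.liveCommits());     // pinned commit is still listed
        index.deleteSnapshot("before-experiment");   // experiment succeeded
        System.out.println(index.liveCommits());     // only the newest commit remains
    }
}
```

Running main shows generation 1 surviving later commits while the snapshot 
exists and being pruned once the snapshot is deleted, which mirrors the 
create/experiment/delete workflow described above without copying any files.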



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
