[ 
https://issues.apache.org/jira/browse/SOLR-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321669#comment-15321669
 ] 

Hrishikesh Gadre commented on SOLR-9038:
----------------------------------------

[~dsmiley] Thanks for the feedback.

The reason why I brought this up to make sure that this feature works nicely 
with the existing functionality (and requirements). The good point about 
integrating snapshot creation with commit work-flow is that the snapshot would 
always have the *latest* committed index state. 

Reviewing the current backup implementation, it looks like the backup logic 
does *not* commit anything. It just copies the *latest* commit to the backup 
directory. Obviously user can manually issue a *hard commit* before invoking 
backup operation or let the backup be created with the latest auto hard 
committed data. From Solr perspective these two are distinct operations. I 
wonder if there is a specific reason for this? 

> Ability to create/delete/list snapshots for a solr collection
> -------------------------------------------------------------
>
>                 Key: SOLR-9038
>                 URL: https://issues.apache.org/jira/browse/SOLR-9038
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Hrishikesh Gadre
>
> Currently work is under-way to implement backup/restore API for Solr cloud 
> (SOLR-5750). SOLR-5750 is about providing an ability to "copy" index files 
> and collection metadata to a configurable location. 
> In addition to this, we should also provide a facility to create "named" 
> snapshots for Solr collection. Here by "snapshot" I mean configuring the 
> underlying Lucene IndexDeletionPolicy to not delete a specific commit point 
> (e.g. using PersistentSnapshotIndexDeletionPolicy). This should not be 
> confused with SOLR-5340 which implements core level "backup" functionality.
> The primary motivation of this feature is to decouple recording/preserving a 
> known consistent state of a collection from actually "copying" the relevant 
> files to a physically separate location. This decoupling have number of 
> advantages
> - We can use specialized data-copying tools for transferring Solr index 
> files. e.g. in Hadoop environment, typically 
> [distcp|https://hadoop.apache.org/docs/r1.2.1/distcp2.html] tool is used to 
> copy files from one location to other. This tool provides various options to 
> configure degree of parallelism, bandwidth usage as well as integration with 
> different types and versions of file systems (e.g. AWS S3, Azure Blob store 
> etc.)
> - This separation of concern would also help Solr to focus on the key 
> functionality (i.e. querying and indexing) while delegating the copy 
> operation to the tools built for that purpose.
> - Users can decide if/when to copy the data files as against creating a 
> snapshot. e.g. a user may want to create a snapshot of a collection before 
> making an experimental change (e.g. updating/deleting docs, schema change 
> etc.). If the experiment is successful, he can delete the snapshot (without 
> having to copy the files). If the experiment is failed, then he can copy the 
> files associated with the snapshot and restore.
> Note that Apache Blur project is also providing a similar feature 
> [BLUR-132|https://issues.apache.org/jira/browse/BLUR-132]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to