[ 
https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5750:
--------------------------------
    Attachment: SOLR-5750.patch

- Added SolrJ support for Backup and Restore Collection Admin actions
- 2 API calls - Backup and Restore. Both support async, and it is recommended 
to use it and poll to see if the task completed. There are no BackupStatus 
and RestoreStatus commands like there were in previous patches.
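
To illustrate the async-plus-polling flow, here is a minimal sketch of a client-side wait loop. The Supplier stands in for whatever call reports the state of a submitted async request, and the state names ("submitted", "running", "completed", "failed") are assumptions made for the example, not something taken from the patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

final class AsyncPollSketch {

    /**
     * Polls statusSource until it reports a terminal state or maxAttempts is reached.
     * statusSource is a hypothetical stand-in for a status lookup on an async request id.
     */
    static String waitForCompletion(Supplier<String> statusSource, int maxAttempts, long sleepMillis) {
        String state = "unknown";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            state = statusSource.get();
            // "completed" and "failed" are assumed terminal state names.
            if ("completed".equals(state) || "failed".equals(state)) {
                return state;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return state;
            }
        }
        // Last state seen; the caller decides how to treat running out of attempts.
        return state;
    }

    public static void main(String[] args) {
        // Simulated status responses for one async backup request.
        Iterator<String> fake = Arrays.asList("submitted", "running", "completed").iterator();
        System.out.println(waitForCompletion(fake::next, 10, 10L));
    }
}
```

Returning the last seen state instead of throwing leaves the timeout policy to the caller, which seems reasonable for a long-running backup.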

*Backup*:
Required Params - name and collection.
"location" is optional as a query parameter. If the query parameter is not 
present, we fall back to the value set via the cluster prop API.
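
The parameter handling can be sketched as a simple check plus fallback. This is only a sketch: the helper names, the maps standing in for the query parameters and the cluster properties, the sample values, and the choice to throw when no location is found anywhere are all assumptions. Only the parameter names name, collection and location come from the description.

```java
import java.util.HashMap;
import java.util.Map;

final class BackupLocationSketch {

    /** Verifies that the two required parameters are present and non-empty. */
    static void requireParams(Map<String, String> queryParams) {
        for (String required : new String[] {"name", "collection"}) {
            String value = queryParams.get(required);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("Missing required parameter: " + required);
            }
        }
    }

    /** Returns the request's "location" if present, otherwise the cluster property value. */
    static String resolveLocation(Map<String, String> queryParams, Map<String, String> clusterProps) {
        String fromRequest = queryParams.get("location");
        if (fromRequest != null && !fromRequest.isEmpty()) {
            return fromRequest;
        }
        String fromClusterProps = clusterProps.get("location");
        if (fromClusterProps == null || fromClusterProps.isEmpty()) {
            // Assumed behaviour: fail if neither source provides a location.
            throw new IllegalArgumentException("'location' is set in neither the request nor the cluster properties");
        }
        return fromClusterProps;
    }

    public static void main(String[] args) {
        Map<String, String> query = new HashMap<>();
        query.put("name", "nightly");          // hypothetical backup name
        query.put("collection", "products");   // hypothetical collection name
        Map<String, String> clusterProps = new HashMap<>();
        clusterProps.put("location", "/mnt/backups"); // hypothetical path

        requireParams(query);
        System.out.println(resolveLocation(query, clusterProps));
    }
}
```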

What it backs up in the location directory:
 - Index data from the shard leaders
 - collection_state_backup.json (the backed up collection state)
 - backup.properties (meta-data information)
 - configSet

*Restore*:
Required Params - name and collection.
"location" is optional as a query parameter. If the query parameter is not 
present, we fall back to the value set via the cluster prop API.

How it works:
 - The collection being restored to must not already exist. Restore will create 
it for you. Once it has been restored, you can use a collection alias to point 
at it. We purposely don’t allow restoring into an existing collection since 
rolling back in a distributed setup would be tricky. Maybe in the future, if we 
are confident, we can allow this.
 - Creates a core-less collection with the config set from the backup (it 
appends a restore.configSetName to it to avoid collisions)
 - Marks the shards in "construction" state so that if someone is sending it 
documents they get buffered in the tlog. TODO don't do
 - Creates one replica per shard and restores the data into it
 - Adds the necessary replicas to meet the same replication factor
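
The ordering of the restore steps, including the refusal to restore into an existing collection, can be sketched as a small plan builder. Everything here is hypothetical and for illustration only: the class and method names, the step strings, and the shard names are not taken from the patch, and nothing talks to a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RestorePlanSketch {

    /** Builds the ordered list of restore steps, or fails if the target collection exists. */
    static List<String> plan(Set<String> existingCollections, String target,
                             List<String> shards, int replicationFactor) {
        if (existingCollections.contains(target)) {
            throw new IllegalStateException("Collection already exists: " + target);
        }
        List<String> steps = new ArrayList<>();
        steps.add("create core-less collection " + target);
        for (String shard : shards) {
            steps.add("mark " + shard + " as construction");
        }
        // First pass: exactly one replica per shard receives the backed up index.
        for (String shard : shards) {
            steps.add("create one replica of " + shard + " and restore index data");
        }
        // Second pass: add the remaining replicas up to the replication factor.
        for (String shard : shards) {
            for (int replica = 2; replica <= replicationFactor; replica++) {
                steps.add("add replica " + replica + " of " + shard);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("products"));
        List<String> shards = Arrays.asList("shard1", "shard2");
        for (String step : plan(existing, "products_restored", shards, 2)) {
            System.out.println(step);
        }
    }
}
```

Restoring into a single replica per shard first means the index data is copied once per shard, and the later replicas can be filled from that one.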

bq. Another question is I wonder if any of these loops should be done in 
parallel or if they are issuing asynchronous requests so it isn't necessary. It 
would help to document the pertinent loops with this information, and possibly 
do some in parallel if they should be done so.

Yes, that makes sense. We need to add this.

bq. I looked at the patch. On the restore side I noticed a loop of slices and 
then a loop of replicas starting with this comment: "//Copy data from backed up 
index to each replica". Shouldn't there be just one replica per shard to 
restore, and then later the replicationFactor will expand to the desired level?

Yeah true. This patch has those changes.

It's still a work in progress. The restore needs hardening.

> Backup/Restore API for SolrCloud
> --------------------------------
>
>                 Key: SOLR-5750
>                 URL: https://issues.apache.org/jira/browse/SOLR-5750
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Varun Thacker
>             Fix For: 5.2, master
>
>         Attachments: SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, 
> SOLR-5750.patch, SOLR-5750.patch
>
>
> We should have an easy way to do backups and restores in SolrCloud. The 
> ReplicationHandler supports a backup command which can create snapshots of 
> the index but that is too little.
> The command should be able to backup:
> # Snapshots of all indexes or indexes from the leader or the shards
> # Config set
> # Cluster state
> # Cluster properties
> # Aliases
> # Overseer work queue?
> A restore should be able to completely restore the cloud i.e. no manual steps 
> required other than bringing nodes back up or setting up a new cloud cluster.
> SOLR-5340 will be a part of this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
