[ https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Thacker updated SOLR-5750:
--------------------------------
    Attachment: SOLR-5750.patch

- Added SolrJ support for the Backup and Restore Collection Admin actions
- 2 API calls: Backup and Restore. Both support async, and using it is recommended so that you can poll to see whether the task has completed. There are no BackupStatus and RestoreStatus commands like there were in previous patches.

*Backup*:
Required params: name and collection.
"location" can optionally be set via the cluster prop API. If the query parameter does not include it, we fall back to the value set via the cluster prop API.

What it backs up in the location directory:
- Index data from the shard leaders
- collection_state_backup.json (the backed-up collection state)
- backup.properties (metadata information)
- configSet

*Restore*:
Required params: name and collection.
"location" can optionally be set via the cluster prop API. If the query parameter does not include it, we fall back to the value set via the cluster prop API.

How it works:
- The restore collection must not already exist. Restore will create it for you. You can use a collection alias to point to it once it has been restored. We purposely don’t allow restoring into an existing collection, since rolling back in a distributed setup would be tricky. Maybe in the future, if we are confident, we can allow this.
- Creates a core-less collection with the config set from the backup (it appends a restore.configSetName to it to avoid collisions)
- Marks the shards in "construction" state so that if someone is sending it documents they get buffered in the tlog. TODO don't do
- Creates one replica per shard and restores the data into it
- Adds the necessary replicas to meet the same replication factor

bq. Another question is I wonder if any of these loops should be done in parallel or if they are issuing asynchronous requests so it isn't necessary.
It would help to document the pertinent loops with this information, and possibly do some in parallel if they should be done so.

Yes, that makes sense. We need to add this.

bq. I looked at the patch. On the restore side I noticed a loop of slices and then a loop of replicas starting with this comment: "//Copy data from backed up index to each replica". Shouldn't there be just one replica per shard to restore, and then later the replicationFactor will expand to the desired level?

Yeah, true. This patch has those changes.

It's still a work in progress. The restore needs hardening.

> Backup/Restore API for SolrCloud
> --------------------------------
>
>                 Key: SOLR-5750
>                 URL: https://issues.apache.org/jira/browse/SOLR-5750
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Varun Thacker
>             Fix For: 5.2, master
>
>         Attachments: SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch
>
>
> We should have an easy way to do backups and restores in SolrCloud. The ReplicationHandler supports a backup command which can create snapshots of the index but that is too little.
> The command should be able to backup:
> # Snapshots of all indexes or indexes from the leader or the shards
> # Config set
> # Cluster state
> # Cluster properties
> # Aliases
> # Overseer work queue?
> A restore should be able to completely restore the cloud i.e. no manual steps required other than bringing nodes back up or setting up a new cloud cluster.
> SOLR-5340 will be a part of this issue.
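
For illustration only, here is a minimal sketch of how the Backup and Restore parameters described in the comment above might be assembled into Collections API requests. The parameter names (name, collection, location, async) come from the comment. The `/admin/collections` path, the `BACKUP` and `RESTORE` action values, the helper names, and the choice to raise an error when no location is available anywhere are assumptions made for this example, not something the patch is confirmed to do.

```python
from urllib.parse import urlencode


def resolve_location(query_location, cluster_prop_location):
    """Pick the backup location: the query parameter wins, otherwise
    fall back to the value stored via the cluster prop API.

    Raising when neither is set is an assumption of this sketch.
    """
    location = query_location or cluster_prop_location
    if not location:
        raise ValueError("no 'location' in the request or in the cluster props")
    return location


def collection_admin_url(base_url, action, name, collection,
                         location=None, async_id=None):
    """Build a Collections API URL (hypothetical helper).

    'name' and 'collection' are the required params; 'location' and
    'async' are only added when supplied.
    """
    params = {"action": action, "name": name, "collection": collection}
    if location is not None:
        params["location"] = location
    if async_id is not None:
        params["async"] = async_id
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Example values below are made up for the sketch.
    location = resolve_location(None, "/mnt/solr-backups")
    print(collection_admin_url("http://localhost:8983/solr", "BACKUP",
                               "nightly", "products",
                               location=location, async_id="backup-1"))
    print(collection_admin_url("http://localhost:8983/solr", "RESTORE",
                               "nightly", "products_restored",
                               location=location, async_id="restore-1"))
```

Passing an async id mirrors the recommendation above to run both calls asynchronously and poll for completion. Note that the restore target (`products_restored` here) is a collection that does not exist yet, since restoring into an existing collection is not allowed.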