[jira] [Commented] (SOLR-12523) Confusing error reporting if backup attempted on non-shared FS

Hrishikesh Gadre (JIRA) Thu, 28 Jun 2018 18:51:21 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527029#comment-16527029
 ]


Hrishikesh Gadre commented on SOLR-12523:
-----------------------------------------

{quote}So for me, separating the concerns of creating the snapshot for each 
shard (Solr's job) and moving big files out to cloud storage (Solr needs to do 
much better in this regard or punt) is what I'm looking for.
{quote}
[~thelabdude] this is the exact use case for which we added the snapshot 
mechanism (Ref: SOLR-9038). As part of Cloudera Search, we use this 
functionality to provide backup and disaster recovery for Solr:

[https://blog.cloudera.com/blog/2017/05/how-to-backup-and-disaster-recovery-for-apache-solr-part-i/]

When a user creates a snapshot, Solr associates the user-specified snapshot 
name with the latest commit point of each core in the given collection. Once 
the snapshot is created, Solr ensures that the files belonging to that commit 
point are not deleted (e.g. as part of an optimize operation). It also records 
the snapshot metadata in ZooKeeper and provides access to it via the 
Collections API. You are then free to use any mechanism to copy these index 
files to a remote location (e.g. in our case we use DistCp, a tool 
specifically designed for large-scale data copy, which also works well with 
cloud object stores).

I agree with your point about the slow restore operation. Maybe we can extend 
the snapshot API to restore in place? E.g. create the index.xxx directory 
automatically and copy the files into it. Once this is done, we can just 
switch the index directory on the fly (the same way we do during full 
replication as part of core recovery).
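
For reference, here is a minimal sketch of the snapshot calls described above. 
It assumes the Collections API actions CREATESNAPSHOT, LISTSNAPSHOTS and 
DELETESNAPSHOT with the collection and commitName parameters, as I recall them 
from SOLR-9038; please verify against the reference guide for your Solr 
version. The host and port are taken from the curl example in the issue 
description, and the snapshot name sigs-snap and the snapshot_url helper are 
made up for illustration.

```shell
#!/bin/sh
# Base URL of any Solr node in the cluster (assumption: same host/port as the
# curl example in the issue description).
SOLR_URL="${SOLR_URL:-http://localhost:8984/solr}"

# Hypothetical helper: build a Collections API URL for a snapshot action.
#   $1 = action (CREATESNAPSHOT, LISTSNAPSHOTS or DELETESNAPSHOT)
#   $2 = collection name
#   $3 = snapshot name (commitName); omit for LISTSNAPSHOTS
snapshot_url() {
  url="${SOLR_URL}/admin/collections?action=$1&collection=$2"
  if [ -n "${3:-}" ]; then
    url="${url}&commitName=$3"
  fi
  printf '%s\n' "$url"
}

# Typical sequence (commented out so the sketch makes no network calls):
#   curl "$(snapshot_url CREATESNAPSHOT foo_signals sigs-snap)"
#   curl "$(snapshot_url LISTSNAPSHOTS foo_signals)"
#   ... copy the index files with the tool of your choice (e.g. DistCp) ...
#   curl "$(snapshot_url DELETESNAPSHOT foo_signals sigs-snap)"
```

Deleting the snapshot after the copy releases the commit point, so Solr can 
clean up the index files that were only being retained for it.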

> Confusing error reporting if backup attempted on non-shared FS
> --------------------------------------------------------------
>
>                 Key: SOLR-12523
>                 URL: https://issues.apache.org/jira/browse/SOLR-12523
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 7.3.1
>            Reporter: Timothy Potter
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-12523.patch
>
>
> So I have a large collection with 4 shards across 2 nodes. When I try to back 
> it up with:
> {code}
> curl 
> "http://localhost:8984/solr/admin/collections?action=BACKUP&name=sigs&collection=foo_signals&async=5&location=backups";
> {code}
> I either get:
> {code}
> "5170256188349065":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard1_replica_n2 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/sigs"},
>   "5170256187999044":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard3_replica_n10 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/sigs"},
> {code}
> or if I create the directory, then I get:
> {code}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":2},
>   "Operation backup caused 
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>  The backup directory already exists: file:///vol1/cloud84/backups/sigs/",
>   "exception":{
>     "msg":"The backup directory already exists: 
> file:///vol1/cloud84/backups/sigs/",
>     "rspCode":400},
>   "status":{
>     "state":"failed",
>     "msg":"found [2] in failed tasks"}}
> {code}
> I'm thinking this has to do with having 2 cores from the same collection on 
> the same node but I can't get a collection with 1 shard on each node to work 
> either:
> {code}
> "ec2-52-90-245-38.compute-1.amazonaws.com:8984_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
>  from server at http://ec2-52-90-245-38.compute-1.amazonaws.com:8984/solr: 
> Failed to backup core=system_jobs_history_shard2_replica_n6 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/ugh1"}
> {code}
> What's weird is that replica (system_jobs_history_shard2_replica_n6) is not 
> even on the ec2-52-90-245-38.compute-1.amazonaws.com node! It lives on a 
> different node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

