[ https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526566#comment-16526566 ]

Timothy Potter commented on SOLR-12523:
---------------------------------------

When I'm working on a cloud platform like EC2 or Google Cloud, I don't want to 
deal with NFS when I have cloud storage like S3. I haven't had much luck in the 
past using the HdfsDirectoryFactory with S3 (I'll check out SOLR-9952), so I 
figured I would just create the backup using Solr and then move the files out 
to cloud storage with tools optimized for S3. In the past, I think using an S3 
destination for backup worked OK, but RESTORE took forever (all the 
checksumming / sanity checking was done per file serially rather than 
concurrently), and since backup is usually part of a disaster recovery 
strategy, I don't want RESTORE taking hours to rebuild my index. If I pull the 
backup down from cloud storage to local disk with a tool optimized for bulk, 
multi-threaded reads from S3 and then restore from local, it's much faster. So 
what I'm looking for is a separation of concerns: creating the snapshot for 
each shard is Solr's job, while moving big files out to cloud storage is 
handled by something else (Solr needs to do much better in this regard, or 
punt).
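
Concretely, the kind of flow I mean looks like this. A rough sketch only: the 
host, paths, and bucket name are made up, and step 1 still assumes the 
location is visible to every node hosting a replica of the collection:

{code}
# 1) Let Solr snapshot each shard to disk:
curl "http://localhost:8984/solr/admin/collections?action=BACKUP&name=sigs&collection=foo_signals&location=/vol1/backups"

# 2) Push the snapshot to S3 with a bulk-optimized, multi-threaded tool:
aws s3 sync /vol1/backups/sigs s3://my-backup-bucket/solr/sigs

# Disaster recovery reverses the order: pull everything down in bulk first...
aws s3 sync s3://my-backup-bucket/solr/sigs /vol1/backups/sigs

# ...then RESTORE from local disk instead of reading file-by-file from S3:
curl "http://localhost:8984/solr/admin/collections?action=RESTORE&name=sigs&collection=foo_signals_restored&location=/vol1/backups"
{code}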

> Confusing error reporting if backup attempted on non-shared FS
> --------------------------------------------------------------
>
>                 Key: SOLR-12523
>                 URL: https://issues.apache.org/jira/browse/SOLR-12523
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Backup/Restore
>    Affects Versions: 7.3.1
>            Reporter: Timothy Potter
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-12523.patch
>
>
> So I have a large collection with 4 shards across 2 nodes. When I try to back 
> it up with:
> {code}
> curl 
> "http://localhost:8984/solr/admin/collections?action=BACKUP&name=sigs&collection=foo_signals&async=5&location=backups";
> {code}
> I either get:
> {code}
> "5170256188349065":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard1_replica_n2 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/sigs"},
>   "5170256187999044":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard3_replica_n10 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/sigs"},
> {code}
> or if I create the directory, then I get:
> {code}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":2},
>   "Operation backup caused 
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>  The backup directory already exists: file:///vol1/cloud84/backups/sigs/",
>   "exception":{
>     "msg":"The backup directory already exists: 
> file:///vol1/cloud84/backups/sigs/",
>     "rspCode":400},
>   "status":{
>     "state":"failed",
>     "msg":"found [2] in failed tasks"}}
> {code}
> I'm thinking this has to do with having 2 cores from the same collection on 
> the same node but I can't get a collection with 1 shard on each node to work 
> either:
> {code}
> "ec2-52-90-245-38.compute-1.amazonaws.com:8984_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
>  from server at http://ec2-52-90-245-38.compute-1.amazonaws.com:8984/solr: 
> Failed to backup core=system_jobs_history_shard2_replica_n6 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/ugh1"}
> {code}
> What's weird is that replica (system_jobs_history_shard2_replica_n6) is not 
> even on the ec2-52-90-245-38.compute-1.amazonaws.com node! It lives on a 
> different node.
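
For reference, the errors quoted above stem from the BACKUP location not 
existing at the same path on every node hosting a replica, i.e. not being on a 
shared filesystem. A minimal sketch of that prerequisite, assuming a 
hypothetical NFS export nfs-server:/exports/backups standing in for whatever 
shared mount is available:

{code}
# Run on every Solr node (the export and mount point are illustrative):
sudo mkdir -p /vol1/cloud84/backups
sudo mount -t nfs nfs-server:/exports/backups /vol1/cloud84/backups

# With the same directory visible everywhere, the BACKUP from the report
# should then succeed:
curl "http://localhost:8984/solr/admin/collections?action=BACKUP&name=sigs&collection=foo_signals&location=/vol1/cloud84/backups&async=5"
{code}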


