[ https://issues.apache.org/jira/browse/SOLR-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429472#comment-17429472 ]

Jason Gerlowski commented on SOLR-15696:
----------------------------------------

In reproducing this, I noticed some errors in the logs that I think point to 
the real root cause here:

{code}
Caused by: java.lang.IllegalArgumentException: Unable to parse invalid ShardBackupId: md_shard2_0_0
  at org.apache.solr.core.backup.ShardBackupId.from(ShardBackupId.java:59)
  at org.apache.solr.handler.admin.BackupCoreOp.parseShardBackupId(BackupCoreOp.java:99)
  at org.apache.solr.handler.admin.BackupCoreOp.execute(BackupCoreOp.java:44)
  at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
  ... 43 more
{code}

It looks like the "ShardBackupId" parsing code is too brittle to handle the 
shard names that result from a splitshard. For context, each splitshard 
produces (by default) two sub-shards whose names combine the original shard 
name and a numeric suffix, connected by an underscore (e.g. "shard2" becomes 
"shard2_0" and "shard2_1"). ShardBackupId parsing relies on underscores to 
separate the different sections of the ID, so the extra underscore added by a 
splitshard trips up the naive parsing code, as with "md_shard2_0_0" above.
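
To illustrate the failure mode, here is a minimal, hypothetical Java sketch (Java 16+, for the record type). It assumes, based only on the "md_shard2_0_0" value in the stack trace above, that an ID has the form "md_<shardName>_<backupId>"; the class and method names are made up for illustration and do not reflect Solr's actual ShardBackupId implementation. The naive parser splits on every underscore and expects exactly three parts, while the tolerant one strips the prefix and splits only on the last underscore.

{code:java}
// Hypothetical sketch only: NOT Solr's ShardBackupId class. The ID layout
// "md_<shardName>_<backupId>" is an assumption inferred from "md_shard2_0_0".
class ShardBackupIdSketch {
    static final String PREFIX = "md_";

    // Parsed form of an ID: the shard name plus the numeric backup id.
    record ParsedId(String shardName, int backupId) {}

    // Naive approach: split on every underscore and expect exactly three parts.
    // A split shard name such as "shard2_0" yields four parts, so this throws.
    static ParsedId parseNaive(String id) {
        String[] parts = id.split("_");
        if (parts.length != 3 || !parts[0].equals("md")) {
            throw new IllegalArgumentException("Unable to parse invalid ShardBackupId: " + id);
        }
        return new ParsedId(parts[1], Integer.parseInt(parts[2]));
    }

    // Tolerant approach: strip the known prefix, then treat only the LAST
    // underscore as the separator, so the shard name may contain underscores.
    static ParsedId parseTolerant(String id) {
        if (!id.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Unable to parse invalid ShardBackupId: " + id);
        }
        String rest = id.substring(PREFIX.length());
        int sep = rest.lastIndexOf('_');
        if (sep <= 0 || sep == rest.length() - 1) {
            throw new IllegalArgumentException("Unable to parse invalid ShardBackupId: " + id);
        }
        try {
            int backupId = Integer.parseInt(rest.substring(sep + 1));
            return new ParsedId(rest.substring(0, sep), backupId);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Unable to parse invalid ShardBackupId: " + id, e);
        }
    }

    public static void main(String[] args) {
        // An unsplit shard, a once-split shard, and a twice-split shard.
        for (String id : new String[] {"md_shard2_0", "md_shard2_0_0", "md_shard2_0_1_3"}) {
            String naive;
            try {
                naive = parseNaive(id).toString();
            } catch (IllegalArgumentException e) {
                naive = "error (" + e.getMessage() + ")";
            }
            System.out.println(id + " -> naive: " + naive + ", tolerant: " + parseTolerant(id));
        }
    }
}
{code}

Anchoring on the last underscore only works because the trailing component is assumed to be a single numeric, underscore-free field; if the real ID carries more fields after the shard name, a parser would need to anchor on those fields instead.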

This should be relatively easy to test and fix.

> Incremental Backups fail following a splitshard op
> --------------------------------------------------
>
>                 Key: SOLR-15696
>                 URL: https://issues.apache.org/jira/browse/SOLR-15696
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 8.9
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>
> Filing this ticket on behalf of a reporter on the mailing list (Jordan Diehl) 
> who had trouble filing it themselves. See the "Can't create solr jira bugs" 
> thread for more context.
> "I have been attempting to use the incremental backup API on Solr 8.9.0, but 
> while testing in our product we would occasionally get into a state where all 
> subsequent backup attempts would fail. After some triage we found that it was 
> happening to any collection which had undergone a shard split operation. If 
> we did a backup, completed a shard split operation, then attempted another 
> backup, the second backup would fail with a FileNotFound exception relating 
> to the backup id of the second backup as the error message."
> *Steps to Reproduce:*
> Run the script found [here|https://paste.apache.org/o4uta] in a clean 8.9.0 
> download. In essence, it (1) creates and fills a collection (2) performs a 
> backup (successfully), (3) splits a shard in the collection, and (4) triggers 
> another backup (which fails).
> *Expected Behavior*
> "If this operation is being blocked intentionally, then I would expect an 
> informative error message explaining why it failed. Otherwise I would expect 
> the backup to complete successfully."
> *Actual Behavior*
> "The backup operation fails with a NoSuchFileException."
> {code:java}
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":54},
>   "failure":{
>     "MYIPADDRESS:31018_solr":"org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:Error from server at null: Error handling 'BACKUPCORE' action"},
>   "Operation backup caused exception:":"java.nio.file.NoSuchFileException:java.nio.file.NoSuchFileException: /opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
>   "exception":{
>     "msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
>     "rspCode":-1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
>     "trace":"org.apache.solr.common.SolrException: /opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:301)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)...<snip>...",
>     "code":500}}
> {code}
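
The reproduction steps in the description (create and fill a collection, back it up, split a shard, back it up again) can be sketched as Collections API requests. The linked script is not reproduced here, so this Java sketch is hypothetical: it only builds, and does not send, the request URLs. The host, port, "numShards=2", and "shard2" are placeholders ("shard2" matches the ID in the stack trace); the collection name, backup name, and location are inferred from the path in the error output; "incremental=true" assumes the 8.9 BACKUP parameter and should be checked against the Collections API reference for your version. Indexing documents between steps (1) and (2) is omitted.

{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: builds, but does not send, Collections API request URLs
// mirroring the reproduction steps. Host/port/numShards/shard are placeholders.
class ReproRequests {
    static final String BASE = "http://localhost:8983/solr/admin/collections";

    // Turns alternating key/value strings into a URL-encoded request URL.
    static String url(String... kv) {
        if (kv.length % 2 != 0) throw new IllegalArgumentException("expected key/value pairs");
        StringJoiner query = new StringJoiner("&", BASE + "?", "");
        for (int i = 0; i < kv.length; i += 2) {
            query.add(URLEncoder.encode(kv[i], StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(kv[i + 1], StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    static List<String> steps() {
        // Assumption: "incremental" is the 8.9 BACKUP flag for incremental backups.
        String backup = url("action", "BACKUP", "name", "reproCollectionBackup",
                "collection", "reproCollection", "location", "/opt/hci/solrBackups",
                "incremental", "true");
        return List.of(
                url("action", "CREATE", "name", "reproCollection", "numShards", "2"), // (1) then index docs
                backup,                                                                 // (2) succeeds
                url("action", "SPLITSHARD", "collection", "reproCollection", "shard", "shard2"), // (3)
                backup);                                                                // (4) fails on 8.9.0
    }

    public static void main(String[] args) {
        List<String> s = steps();
        for (int i = 0; i < s.size(); i++) {
            System.out.println("(" + (i + 1) + ") " + s.get(i));
        }
    }
}
{code}

Steps (2) and (4) are deliberately the same request: the report describes a backup that works before the split and fails afterwards, with "zk_backup_1" (the second backup point) named in the error.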



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
