[ https://issues.apache.org/jira/browse/SOLR-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466714#comment-17466714 ]

Artem Abeleshev edited comment on SOLR-15842 at 12/30/21, 9:03 AM:
-------------------------------------------------------------------

Hi, Ricardo! Thanks for raising the issue.

In short:

Unfortunately, there is no workaround for this problem, due to its nature: the 
results of the shard backup requests are not included in the task object that 
is used for async tracking.

Details:

Async backup requests work as follows. When a request hits Solr, Solr checks 
whether an _async_ parameter is provided. If it is, the action is queued and a 
response is returned immediately, without waiting for the backup to complete. 
The queued action is then processed and its result is placed in a distributed 
map in ZooKeeper (you can inspect these maps in the Solr Admin web interface). 
In the case of a collection backup, a request is also sent to each shard to 
back up the index files (each shard backs up its own index files). These shard 
requests are also sent as async and are handled in a similar way: the action is 
submitted to an executor service and a response is returned immediately, 
without waiting for the shard backup to complete. Once the submitted action has 
been processed, its results are stored in a local tracking map in the form of a 
{_}org.apache.solr.handler.admin.CoreAdminHandler.TaskObject{_}:

{_}org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(SolrQueryRequest, SolrQueryResponse){_}:
{code:java}
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
      ...
      final String taskId = req.getParams().get(CommonAdminParams.ASYNC);
      final TaskObject taskObject = new TaskObject(taskId);
      ...
      if(taskId != null) {
      ...
        addTask(RUNNING, taskObject);
      }
      ...
      CoreAdminOperation op = opMap.get(req.getParams().get(ACTION, STATUS.toString()).toLowerCase(Locale.ROOT));
      ...
      final CallInfo callInfo = new CallInfo(this, req, rsp, op);
      ...
      if (taskId == null) {
        callInfo.call();
      } else {
        ...
        parallelExecutor.execute(() -> {
          boolean exceptionCaught = false;
          try {
            callInfo.call();
            taskObject.setRspObject(callInfo.rsp);
          } catch (Exception e) {
            exceptionCaught = true;
            taskObject.setRspObjectFromException(e);
          } finally {
            removeTask("running", taskObject.taskId);
            if (exceptionCaught) {
              addTask("failed", taskObject, true);
            } else {
              addTask("completed", taskObject, true);
            }
          }
        });
        ...
      }
      ...
    }
{code}
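The queue-then-poll pattern in _handleRequestBody_ can be modeled without Solr. The sketch below is a hypothetical simplification, not Solr code: _AsyncTaskTracker_, _submit_, _status_ and _result_ are invented names, and the stored result is a plain _String_, much like the _rspInfo_ string a _TaskObject_ ends up holding. Unlike the quoted handler, the sketch records the final state before removing the task from the running map, so a poller never finds the task in none of the maps.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical, simplified model of async task tracking; not Solr's classes.
class AsyncTaskTracker {
  private final Map<String, String> running = new ConcurrentHashMap<>();
  private final Map<String, String> completed = new ConcurrentHashMap<>();
  private final Map<String, String> failed = new ConcurrentHashMap<>();
  private final ExecutorService executor = Executors.newFixedThreadPool(2);

  // Marks the task as running and returns at once; the work runs on the executor.
  void submit(String taskId, Supplier<String> work) {
    running.put(taskId, "");
    executor.execute(() -> {
      try {
        completed.put(taskId, String.valueOf(work.get()));
      } catch (RuntimeException e) {
        failed.put(taskId, e.toString());
      } finally {
        // Removed only after the final state has been recorded.
        running.remove(taskId);
      }
    });
  }

  // What a status request would report for the given task ID.
  String status(String taskId) {
    if (completed.containsKey(taskId)) return "completed";
    if (failed.containsKey(taskId)) return "failed";
    if (running.containsKey(taskId)) return "running";
    return "notfound";
  }

  // The stored result: only a String, whatever the task produced.
  String result(String taskId) {
    String r = completed.get(taskId);
    return r != null ? r : failed.get(taskId);
  }

  // Stops accepting work and waits briefly for queued tasks to finish.
  void shutdown() {
    executor.shutdown();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}

For example, submitting one task whose supplier returns "ok" and another that throws, then calling _shutdown()_, leaves the first reported as "completed" and the second as "failed". Whatever structured data the work produced is only visible to a poller through the single stored string.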
The sender then polls the results in the tracking map using status requests. 
The problem you've raised in this issue lies in the _setRspObject_ call and in 
how the async response is stored. After the _call_ method of _CallInfo_ has 
executed, _callInfo.rsp_ contains a response with all the backup results. These 
results are then supposed to be placed in the _TaskObject_ by calling 
{_}setRspObject{_}. But let's check what happens there:
{code:java}
  public void setRspObject(SolrQueryResponse rspObject) {
    this.rspInfo = rspObject.getToLogAsString("TaskId: " + this.taskId);
  }
{code}
The results held in the _SolrQueryResponse_ are completely ignored here. 
Instead, a string containing the shard request's async ID and the whole content 
of the _toLog_ list of the _SolrQueryResponse_ is stored as the tracking result. 
This is what each shard returns to the sender as the result of an index backup:
{code:json}
{
    "responseHeader": {
        "status": 0,
        "QTime": 1
    },
    "STATUS": "completed",
    "Response": "TaskId: 10402421194574306142 webapp=null path=/admin/cores params={core=techproducts_shard1_replica_n2&async=10402421194574306142&qt=/admin/cores&name=shard1&shardBackupId=md_shard1_0&action=BACKUPCORE&location=file:///path/to/my/shared/drive/mybackup/techproducts&incremental=true&wt=javabin&version=2} status=0 QTime=57"
}
{code}
Note the _Response_ value:
{code:json}
"Response": "TaskId: 10402421194574306142 webapp=null path=/admin/cores params={core=techproducts_shard1_replica_n2&async=10402421194574306142&qt=/admin/cores&name=shard1&shardBackupId=md_shard1_0&action=BACKUPCORE&location=file:///path/to/my/shared/drive/mybackup/techproducts&incremental=true&wt=javabin&version=2} status=0 QTime=57"
{code}
and compare it to the _response_ value of a sync request result:
{code:json}
"response": [
    "startTime",
    "2021-12-24T14:20:32.021Z",
    "indexFileCount",
    21,
    "uploadedIndexFileCount",
    21,
    "indexSizeMB",
    0.006,
    "uploadedIndexFileMB",
    0.006,
    "shard",
    "shard1",
    "endTime",
    "2021-12-24T14:20:32.396Z",
    "shardBackupId",
    "md_shard1_0"
]
{code}
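To make the information loss concrete, here is a small model of what _setRspObject_ keeps. It is a hypothetical sketch, not Solr code: _FakeResponse_, _FakeTaskObject_ and _SetRspObjectDemo_ are invented stand-ins, and _toLogAsString_ only roughly imitates _getToLogAsString_ (the real log line also carries the request params). The sample values come from the responses shown above.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for SolrQueryResponse: "values" holds the structured
// backup result (what a sync request returns), "toLog" holds log-only fields.
class FakeResponse {
  final Map<String, Object> values = new LinkedHashMap<>();
  final Map<String, Object> toLog = new LinkedHashMap<>();

  // Rough imitation of getToLogAsString: the prefix followed by the toLog
  // entries as key=value pairs. The structured "values" are never read.
  String toLogAsString(String prefix) {
    StringBuilder sb = new StringBuilder(prefix);
    for (Map.Entry<String, Object> e : toLog.entrySet()) {
      sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }
}

// Hypothetical stand-in for TaskObject, following the quoted setRspObject.
class FakeTaskObject {
  final String taskId;
  String rspInfo;

  FakeTaskObject(String taskId) {
    this.taskId = taskId;
  }

  void setRspObject(FakeResponse rsp) {
    // Only the log string survives; rsp.values is dropped.
    this.rspInfo = rsp.toLogAsString("TaskId: " + taskId);
  }
}

class SetRspObjectDemo {
  // Builds a shard backup response from the sample values in this comment
  // and stores it the way an async task would.
  static FakeTaskObject backupTask() {
    FakeResponse rsp = new FakeResponse();
    rsp.values.put("indexFileCount", 21);
    rsp.values.put("indexSizeMB", 0.006);
    rsp.values.put("shard", "shard1");
    rsp.values.put("shardBackupId", "md_shard1_0");
    rsp.toLog.put("webapp", null);
    rsp.toLog.put("path", "/admin/cores");
    rsp.toLog.put("status", 0);
    rsp.toLog.put("QTime", 57);

    FakeTaskObject task = new FakeTaskObject("10402421194574306142");
    task.setRspObject(rsp);
    return task;
  }
}
{code}

Here _SetRspObjectDemo.backupTask().rspInfo_ is "TaskId: 10402421194574306142 webapp=null path=/admin/cores status=0 QTime=57": the same shape as the async _Response_ value, with no trace of _indexSizeMB_ or the other backup fields.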
Once the sender has obtained the results from all the shards, it tries to 
aggregate them:

{_}org.apache.solr.cloud.api.collections.BackupCmd.aggregateResults(NamedList, String, BackupManager, BackupProperties, Collection<Slice>){_}:
{code:java}
  private NamedList aggregateResults(NamedList results, String collectionName,
                                    BackupManager backupManager,
                                    BackupProperties backupProps,
                                    Collection<Slice> slices) {
    NamedList<Object> aggRsp = new NamedList<>();
    aggRsp.add("collection", collectionName);
    aggRsp.add("numShards", slices.size());
    aggRsp.add("backupId", backupManager.getBackupId().id);
    aggRsp.add("indexVersion", backupProps.getIndexVersion());
    aggRsp.add("startTime", backupProps.getStartTime());

    double indexSizeMB = 0;
    NamedList shards = (NamedList) results.get("success");
    for (int i = 0; i < shards.size(); i++) {
      NamedList shardResp = (NamedList)((NamedList)shards.getVal(i)).get("response");
      if (shardResp == null)
        continue;
      indexSizeMB += (double) shardResp.get("indexSizeMB");
    }
    aggRsp.add("indexSizeMB", indexSizeMB);
    return aggRsp;
  }
{code}
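As you can see, when a _success_ block is found, the method iterates over all 
entries and tries to extract each _response_ value. For async requests this 
extraction fails for at least two reasons:
- the shard async response contains the key _Response_, not _response_
- the value of _Response_ is a _String_, not a _NamedList_

Because of the first point, _get("response")_ returns null for every shard 
entry, so each entry is skipped and _indexSizeMB_ stays 0. Even if the key 
matched, the second point would make the _NamedList_ cast throw a 
_ClassCastException_ instead of producing a sum.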
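The effect can be reproduced outside Solr with the same summing loop. In this sketch _NL_ is a hypothetical, minimal stand-in for _NamedList_ (ordered, duplicate keys allowed, case-sensitive lookup returning the first match) and _AggregateDemo_ is an invented name; only the loop body follows the quoted _aggregateResults_.

{code:java}
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal stand-in for NamedList: ordered, allows duplicate
// keys, and get(name) returns the first value whose key matches exactly.
class NL {
  private final List<Map.Entry<String, Object>> entries = new ArrayList<>();

  NL add(String name, Object val) {
    entries.add(new SimpleEntry<>(name, val));
    return this;
  }

  Object get(String name) {
    for (Map.Entry<String, Object> e : entries) {
      if (e.getKey().equals(name)) return e.getValue();
    }
    return null;
  }

  int size() { return entries.size(); }

  Object getVal(int i) { return entries.get(i).getValue(); }
}

class AggregateDemo {
  // Same loop as BackupCmd.aggregateResults, applied to the NL stand-in.
  static double sumIndexSizeMB(NL results) {
    double indexSizeMB = 0;
    NL shards = (NL) results.get("success");
    for (int i = 0; i < shards.size(); i++) {
      // Async entries have "Response" (capital R), so this lookup yields null.
      NL shardResp = (NL) ((NL) shards.getVal(i)).get("response");
      if (shardResp == null)
        continue;
      indexSizeMB += (double) shardResp.get("indexSizeMB");
    }
    return indexSizeMB;
  }
}
{code}

Fed with the two sync shard responses from the issue (0.026 and 0.133), _sumIndexSizeMB_ returns roughly 0.159; fed with async entries carrying only _STATUS_ and a _Response_ string, every entry is skipped and it returns 0.0, which matches the reported behavior.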

For a clearer picture of the whole process, see the article I wrote about 
collection backup: [Code Anatomy: Solr Collection Backup|https://tyoma.hashnode.dev/code-anatomy-solr-collection-backup]


> Collection Backup Status doesn't calculate de IndexSizeMb correctly.
> --------------------------------------------------------------------
>
>                 Key: SOLR-15842
>                 URL: https://issues.apache.org/jira/browse/SOLR-15842
>             Project: Solr
>          Issue Type: Bug
>          Components: Backup/Restore, SolrCloud
>    Affects Versions: 8.10, 8.10.1
>            Reporter: Ricardo Ruiz Maldonado
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> When [backing up|#backup] a collection to either the *S3 Repository* or the 
> {*}LocalFileSystemRepository{*}, if I provide the *async* parameter and then 
> check the status of the backup with the [REQUESTSTATUS|#requeststatus] 
> endpoint, even if the backup finishes successfully, the *indexSizeMB* 
> parameter is always 0.
> If I do a *sync* backup and wait until it finishes, then the *indexSizeMB* 
> parameter has the right value.
> Here are some examples of the responses for each case:
> h3. S3 Backup (Sync)
> {code:java}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":30640},
>   "success":{
>     "10.9.21.42:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":5857},
>       "response":[
>         "startTime","2021-12-09T03:16:14.944860Z",
>         "indexFileCount",18,
>         "uploadedIndexFileCount",18,
>         "indexSizeMB",0.026,
>         "uploadedIndexFileMB",0.026,
>         "shard","shard2",
>         "endTime","2021-12-09T03:16:20.694631Z",
>         "shardBackupId","md_shard2_0"]},
>     "10.9.21.42:8983_solr":{
>       "responseHeader":{
>         "status":0,
>         "QTime":5891},
>       "response":[
>         "startTime","2021-12-09T03:16:14.951702Z",
>         "indexFileCount",18,
>         "uploadedIndexFileCount",18,
>         "indexSizeMB",0.133,
>         "uploadedIndexFileMB",0.133,
>         "shard","shard1",
>         "endTime","2021-12-09T03:16:20.735084Z",
>         "shardBackupId","md_shard1_0"]}},
>   "response":[
>     "collection","collection",
>     "numShards",2,
>     "backupId",0,
>     "indexVersion","8.10.1",
>     "startTime","2021-12-09T03:16:14.381680Z",
>     "indexSizeMB",0.159]}{code}
> h3. S3 Backup (async)
> {code:java}
> {
>     "responseHeader": {
>         "status": 0,
>         "QTime": 4
>     },
>     "success": {
>         "10.9.21.42:8983_solr": {
>             "responseHeader": {
>                 "status": 0,
>                 "QTime": 2
>             }
>         },
>         "10.9.21.42:8983_solr": {
>             "responseHeader": {
>                 "status": 0,
>                 "QTime": 3
>             }
>         },
>         "10.9.21.42:8983_solr": {
>             "responseHeader": {
>                 "status": 0,
>                 "QTime": 0
>             },
>             "STATUS": "completed",
>             "Response": "TaskId: backup120415121643240950269884 webapp=null path=/admin/cores params={core=collectionName_shard2_replica_n4&async=backup120415121643240950269884&qt=/admin/cores&name=shard2&shardBackupId=md_shard2_1&action=BACKUPCORE&location=s3:/b39587e3-c296-4634-b8e2-7ff1e94e6a69/index/backup.1/collectionName/&incremental=true&repository=s3&prevShardBackupId=md_shard2_0&wt=javabin&version=2} status=0 QTime=2"
>         },
>         "10.9.21.42:8983_solr": {
>             "responseHeader": {
>                 "status": 0,
>                 "QTime": 0
>             },
>             "STATUS": "completed",
>             "Response": "TaskId: backup120415121643240950730312 webapp=null path=/admin/cores params={core=collectionName_shard1_replica_n1&async=backup120415121643240950730312&qt=/admin/cores&name=shard1&shardBackupId=md_shard1_1&action=BACKUPCORE&location=s3:/b39587e3-c296-4634-b8e2-7ff1e94e6a69/index/backup.1/collectionName/&incremental=true&repository=s3&prevShardBackupId=md_shard1_0&wt=javabin&version=2} status=0 QTime=3"
>         }
>     },
>     "backup120415121643240950269884": {
>         "responseHeader": {
>             "status": 0,
>             "QTime": 0
>         },
>         "STATUS": "completed",
>         "Response": "TaskId: backup120415121643240950269884 webapp=null path=/admin/cores params={core=collectionName_shard2_replica_n4&async=backup120415121643240950269884&qt=/admin/cores&name=shard2&shardBackupId=md_shard2_1&action=BACKUPCORE&location=s3:/b39587e3-c296-4634-b8e2-7ff1e94e6a69/index/backup.1/collectionName/&incremental=true&repository=s3&prevShardBackupId=md_shard2_0&wt=javabin&version=2} status=0 QTime=2"
>     },
>     "backup120415121643240950730312": {
>         "responseHeader": {
>             "status": 0,
>             "QTime": 0
>         },
>         "STATUS": "completed",
>         "Response": "TaskId: backup120415121643240950730312 webapp=null path=/admin/cores params={core=collectionName_shard1_replica_n1&async=backup120415121643240950730312&qt=/admin/cores&name=shard1&shardBackupId=md_shard1_1&action=BACKUPCORE&location=s3:/b39587e3-c296-4634-b8e2-7ff1e94e6a69/index/backup.1/collectionName/&incremental=true&repository=s3&prevShardBackupId=md_shard1_0&wt=javabin&version=2} status=0 QTime=3"
>     },
>     "response": [
>         "collection",
>         "collectionName",
>         "numShards",
>         2,
>         "backupId",
>         1,
>         "indexVersion",
>         "8.10.1",
>         "startTime",
>         "2021-12-04T06:12:52.540773Z",
>         "indexSizeMB",
>         0.0
>     ],
>     "status": {
>         "state": "completed",
>         "msg": "found [backup12041512] in completed tasks"
>     }
> }{code}
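The reproduction described in the issue (an async collection backup, then polling REQUESTSTATUS) corresponds to two Collections API calls. This sketch only builds the request URLs and does not send them; the base URL, collection name, backup name, location and async ID are placeholder assumptions, and _BackupRepro_ is a hypothetical helper name. An S3 backup as in the issue would additionally pass a _repository_ parameter.

{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds Collections API URLs for the reproduction.
class BackupRepro {
  // Joins URL-encoded parameters in insertion order.
  static String url(String solrBase, Map<String, String> params) {
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> p : params.entrySet()) {
      query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
          + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
    }
    return solrBase + "/admin/collections?" + query;
  }

  // Step 1: start an async collection backup.
  static String asyncBackup(String solrBase, String collection, String name,
                            String location, String asyncId) {
    Map<String, String> p = new LinkedHashMap<>();
    p.put("action", "BACKUP");
    p.put("collection", collection);
    p.put("name", name);
    p.put("location", location);
    p.put("async", asyncId);
    return url(solrBase, p);
  }

  // Step 2: poll the async request until its state is "completed".
  static String requestStatus(String solrBase, String asyncId) {
    Map<String, String> p = new LinkedHashMap<>();
    p.put("action", "REQUESTSTATUS");
    p.put("requestid", asyncId);
    return url(solrBase, p);
  }
}
{code}

For instance, _requestStatus("http://localhost:8983/solr", "backup12041512")_ yields http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=backup12041512; the _indexSizeMB_ in the completed status response is where the 0.0 shows up.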



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
