oyjhl opened a new issue, #824: URL: https://github.com/apache/solr-operator/issues/824
# What happened

A `SolrBackup` can remain stuck in `InProgress` forever if:

1. the backup request is accepted ([operator backup submit](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/util/backup_util.go#L94-L109), [stable backup async ID](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/util/backup_util.go#L56-L58)),
2. the operator is scaled down,
3. the Solr async tracking entry for that backup is deleted ([DELETESTATUS API](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L855-L930), [tracker deleteSingleAsyncId](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java#L284-L287)),
4. and the operator is scaled back up.

After that:

- the backup files already exist in Solr,
- `REQUESTSTATUS` for the backup request ID returns `notfound` ([REQUESTSTATUS API](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L793-L853), [tracker getAsyncTaskRequestStatus](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java#L217-L270)),
- but the `SolrBackup` CR still stays in `inProgress=true`,
- and the operator keeps polling instead of reaching a terminal state ([backup reconcile loop](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/solrbackup_controller.go#L284-L306), [requeue after 5s](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/solrbackup_controller.go#L141-L143)).
# Environment

- macOS
- local `kind` cluster
- Kubernetes / kind node version: `v1.32.1`
- `solr-operator` version: `v0.10.0-prerelease`
- `solr-operator` built from `master` on March 22, 2026 (`ca9d3c5c37a59f29570a6b49a8da5dc614aba75e`)
- Solr version: `9.10.0`

# Steps to reproduce

1. Deploy `solr-operator` on a local `kind` cluster.
2. Create a 1-node `SolrCloud` with a local backup repository, then create a collection and start a `SolrBackup` for it.
3. As soon as the backup first shows `inProgress=true`, scale the `solr-operator` deployment down to `0`.
4. While the operator is down, wait for the Solr async request to finish, then delete only that async status entry with `DELETESTATUS`.
5. Confirm the backup data still exists, but `REQUESTSTATUS` for that same request ID now returns `notfound`.
6. Scale the operator back up to `1` and observe that the `SolrBackup` CR never reaches a terminal state.

The stuck status looks like:

```yaml
status:
  collectionBackupStatuses:
  - asyncBackupStatus: notfound
    inProgress: true
```

and it never sets `finished: true` or `successful: true`.

# Expected behavior

Once the async status of an accepted backup becomes `notfound`, the operator should not leave the CR in `InProgress` forever. It should eventually either:

- recover, or
- mark the backup failed with a clear reason.
