oyjhl opened a new issue, #824:
URL: https://github.com/apache/solr-operator/issues/824

   # What happened
   
   A `SolrBackup` can remain stuck in `InProgress` forever if:
   
   1. the backup request is accepted ([operator backup submit](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/util/backup_util.go#L94-L109), [stable backup async ID](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/util/backup_util.go#L56-L58)),
   2. the operator is scaled down,
   3. the Solr async tracking entry for that backup is deleted ([DELETESTATUS API](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L855-L930), [tracker deleteSingleAsyncId](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java#L284-L287)),
   4. and the operator is scaled back up.
   
   After that:
   
   - the backup files already exist in the Solr backup repository,
   - `REQUESTSTATUS` for the backup request ID returns `notfound` ([REQUESTSTATUS API](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L793-L853), [tracker getAsyncTaskRequestStatus](https://github.com/apache/solr/blob/e0fe9619839f7ef2b43496104fa539a5b091db9a/solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java#L217-L270)),
   - but the `SolrBackup` CR still stays in `inProgress=true`,
   - and the operator keeps polling instead of reaching a terminal state ([backup reconcile loop](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/solrbackup_controller.go#L284-L306), [requeue after 5s](https://github.com/apache/solr-operator/blob/ca9d3c5c37a59f29570a6b49a8da5dc614aba75e/controllers/solrbackup_controller.go#L141-L143)).
   
   
   # Environment
   
   - macOS
   - local `kind` cluster
   - Kubernetes / kind node version: `v1.32.1`
   - `solr-operator` version: `v0.10.0-prerelease`
   - `solr-operator` built from `master` on March 22, 2026 (`ca9d3c5c37a59f29570a6b49a8da5dc614aba75e`)
   - Solr version: `9.10.0`
   
   # Steps to reproduce
   
   1. Deploy `solr-operator` on a local `kind` cluster.
   2. Create a 1-node `SolrCloud` with a local backup repository, then create a
      collection and start a `SolrBackup` for it.
   3. As soon as the backup first shows `inProgress=true`, scale the
      `solr-operator` deployment down to `0`.
   4. While the operator is down, wait for the Solr async request to finish, then
      delete only that async status entry with `DELETESTATUS`.
   5. Confirm the backup data still exists, but `REQUESTSTATUS` for that same
      request ID now returns `notfound`.
   6. Scale the operator back up to `1` and observe that the `SolrBackup` CR never
      reaches a terminal state.
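
For steps 4 and 5, the two Collections API calls can be built as in the rough Go sketch below. The helper name `asyncStatusURL`, the base URL (a port-forward to the Solr pod on `localhost:8983` is assumed), and the request ID are placeholders for illustration; substitute the async ID that the operator actually submitted.

```go
package main

import (
	"fmt"
	"net/url"
)

// asyncStatusURL is a hypothetical helper (not part of solr-operator).
// It builds a Solr Collections API URL for an async request ID.
// action is expected to be "REQUESTSTATUS" or "DELETESTATUS".
func asyncStatusURL(baseURL, action, requestID string) string {
	q := url.Values{}
	q.Set("action", action)
	q.Set("requestid", requestID)
	// Encode sorts keys, so "action" comes before "requestid".
	return baseURL + "/admin/collections?" + q.Encode()
}

func main() {
	// Assumption: Solr is reachable through a local port-forward.
	base := "http://localhost:8983/solr"
	// Placeholder ID; use the real async ID of the backup request.
	id := "example-backup-request-id"

	// Step 4: delete only this async status entry.
	fmt.Println(asyncStatusURL(base, "DELETESTATUS", id))
	// Step 5: the same ID should now report "notfound".
	fmt.Println(asyncStatusURL(base, "REQUESTSTATUS", id))
}
```

The printed URLs can be requested with any HTTP client once the async request has finished.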
   
   The stuck status looks like:
   
   ```yaml
   status:
     collectionBackupStatuses:
     - asyncBackupStatus: notfound
       inProgress: true
   ```
   
   and it never sets `finished: true` or `successful: true`.
   
   # Expected behavior
   
   Once the async status of an accepted backup becomes `notfound`, the operator should not leave the CR in `InProgress` forever.
   
   It should eventually either:
   
   - recover, or
   - mark the backup failed with a clear reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

