genegr opened a new pull request, #13049:
URL: https://github.com/apache/cloudstack/pull/13049

   ### Description
   
   This PR fixes two independent bugs in `FlashArrayAdapter.delete()`, the 
adapter method the adaptive storage driver calls whenever CloudStack deletes a 
volume or a snapshot on a Pure Storage FlashArray primary pool. On Purity 6.x 
both paths currently fail, leaving objects leaked on the array and CloudStack 
DB rows stuck in `Destroy` state with no clean recovery via the UI.
   
   #### Bug 1 — Volume delete: `{name, destroyed}` PATCH rejected
   
   The current code builds a single `FlashArrayVolume` object, then issues two 
PATCH requests against it: first to rename the volume with a timestamp suffix 
(so an operator browsing the array's 24 h recycle bin can see when each copy 
was deleted), then to mark it `destroyed: true`. The second PATCH reuses the 
same object, which still has the new name set, so the request body carries 
**both** `name` and `destroyed`. Purity rejects the combined body:
   
   ```
   400 Bad Request
   Invalid combination of parameters specified.
   ```
   
   The first PATCH succeeds (volume renamed), the second fails. Net result: the 
volume leaks on the array (never reaches the destroyed bucket) and the 
CloudStack `volumes` row stays in `Destroy` forever.
   
   **Expected behaviour**: `cmk delete volume id=<id>` returns success; the array shows the volume with `destroyed=true`, `time_remaining≈24h`; the CloudStack row is `Expunged`.

   **Actual behaviour (before this PR)**: the async job fails with `Invalid combination of parameters specified`; the array shows the volume renamed with a `-<timestamp>` suffix but `destroyed=false`; the CloudStack row stays in `Destroy` and `removed` is never set.
   
   **Fix**: keep the rename (it's an operator-visible forensic trail that is 
genuinely useful when recovering from accidental deletes), but issue the two 
operations as two separate PATCH requests, each body carrying only its own 
field. No behaviour change from the user's point of view on a successful delete.
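
   As a rough sketch of the fix (the class and helper below are invented for illustration, not the actual `FlashArrayAdapter` code), the two request bodies can be built so that each PATCH carries exactly one field:

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Illustrative only: the real PATCH plumbing and FlashArrayVolume are not
   // reproduced here. The point is that each request body carries exactly one
   // mutable field, never {name, destroyed} together.
   public class SplitPatchSketch {

       /** Builds the two request bodies the fixed delete path sends, in order. */
       static List<Map<String, Object>> buildDeleteBodies(String originalName, long epochSeconds) {
           // PATCH 1: rename only, adding the forensic timestamp suffix.
           Map<String, Object> rename = new LinkedHashMap<>();
           rename.put("name", originalName + "-" + epochSeconds);

           // PATCH 2: destroy only, issued against the renamed volume.
           Map<String, Object> destroy = new LinkedHashMap<>();
           destroy.put("destroyed", true);

           return List.of(rename, destroy);
       }

       public static void main(String[] args) {
           List<Map<String, Object>> bodies = buildDeleteBodies("vol-2-1-2-42", 1745185280L);
           // Neither body mixes the two fields, so Purity accepts each PATCH.
           assert bodies.get(0).keySet().equals(java.util.Set.of("name"));
           assert bodies.get(1).keySet().equals(java.util.Set.of("destroyed"));
           System.out.println(bodies);
       }
   }
   ```

   Sending the rename first means the copy that lands in the recycle bin already carries the timestamp suffix, preserving the forensic trail described above.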
   
   **Repro**: on a cluster with a FlashArray adaptive primary pool, `cmk delete 
volume id=<id>` on any volume that has been provisioned on the array.
   
   #### Bug 2 — Snapshot delete: wrong endpoint and invalid rename
   
   The same `delete()` method is called by the snapshot service 
(`SnapshotServiceImpl.deleteSnapshot` → 
`AdaptiveDataStoreDriverImpl.deleteAsync` → `api.delete(context, inData)`), but 
the adapter has no special handling when `dataObject.getType()` is `SNAPSHOT`. 
Two problems compound:
   
   - FlashArray snapshots live at `/volume-snapshots`, not `/volumes`. The 
PATCH hits the wrong endpoint.
   - The rename computes `<snapshot_name>-<timestamp>`. FlashArray snapshot 
names have the reserved form `<volume>.<suffix>`, so the new name becomes e.g. 
`cloudstack::vol-2-1-2-192.1-20260420214120` — which contains a literal `.`. 
The array rejects the rename:
   
     > Volume name must be between 1 and 63 characters (alphanumeric, `_` and 
`-`), begin and end with a letter or number, and include at least one letter, 
`_`, or `-`.
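
   For illustration, the quoted rule can be approximated with a regex. This is an assumption about the rule's exact shape, applied to the unqualified name only (the `cloudstack::` pod prefix is handled separately by the array, and the adapter does no client-side validation):

   ```java
   import java.util.regex.Pattern;

   // Rough, assumed encoding of the array's volume-name rule quoted above.
   // The array enforces this server-side; this class exists only to show why
   // the '.' in a snapshot name makes the rename fail.
   public class FlashArrayNameRule {
       // 1-63 chars of alphanumeric, '_' or '-', beginning and ending alphanumeric.
       private static final Pattern SHAPE =
           Pattern.compile("^[A-Za-z0-9]([A-Za-z0-9_-]{0,61}[A-Za-z0-9])?$");

       static boolean isValidVolumeName(String name) {
           return SHAPE.matcher(name).matches()
                   // ...and at least one letter, '_', or '-' somewhere.
                   && name.matches(".*[A-Za-z_-].*");
       }

       public static void main(String[] args) {
           // The snapshot rename target contains a literal '.', so it fails:
           System.out.println(isValidVolumeName("vol-2-1-2-192.1-20260420214120")); // false
           // The plain volume rename target is fine:
           System.out.println(isValidVolumeName("vol-2-1-2-192-20260420214120"));   // true
       }
   }
   ```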
   
   **Expected behaviour**: `cmk delete snapshot id=<id>` returns success; the array shows the snapshot `destroyed=true` with a 24 h recycle window; the CloudStack row is `Destroyed`.

   **Actual behaviour (before this PR)**: the async job fails with a 400 from the array; the snapshot leaks on both sides.
   
   **Fix**: branch early in `delete()` when the data object is a `SNAPSHOT`. 
Issue a single PATCH to `/volume-snapshots?names=<name>` with body `{ 
destroyed: true }` on the original name. No rename — Pure assigns the 
`.<suffix>` portion itself and rejects any freely-appended text. Recovery via 
`purevol recover` still works within the 24 h window.
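
   A minimal sketch of the new branch follows; the `DataObjectType` enum mirrors the real CloudStack one, but `PatchRequest` and `buildDestroyRequest` are invented for this example (the actual change lives inside `FlashArrayAdapter.delete()`):

   ```java
   import java.util.Map;

   // Hypothetical sketch: snapshot deletes skip the rename entirely and target
   // /volume-snapshots, while volume deletes keep using /volumes.
   public class DeleteRequestSketch {

       enum DataObjectType { VOLUME, SNAPSHOT }

       record PatchRequest(String path, Map<String, Object> body) {}

       static PatchRequest buildDestroyRequest(DataObjectType type, String name) {
           String endpoint = (type == DataObjectType.SNAPSHOT)
                   ? "/volume-snapshots?names=" + name   // snapshots live here, not /volumes
                   : "/volumes?names=" + name;
           // Body carries only {destroyed: true}; a snapshot keeps its
           // Pure-assigned <volume>.<suffix> name untouched.
           return new PatchRequest(endpoint, Map.of("destroyed", true));
       }

       public static void main(String[] args) {
           PatchRequest r = buildDestroyRequest(DataObjectType.SNAPSHOT, "vol-2-1-2-192.1");
           System.out.println(r.path());
           System.out.println(r.body());
       }
   }
   ```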
   
   <!-- Fixes: # -->
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [x] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] Build/CI
   - [ ] Test (unit or integration test code)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [x] Major
   - [ ] Minor
   - [ ] Trivial
   
   Deletes of every array-backed volume and snapshot on the FlashArray plugin 
fail, silently leaking space on the array and leaving stuck CloudStack DB rows. 
The pool otherwise works (create/attach/detach/snapshot/revert all succeed), so 
the bug is not a BLOCKER, but it fully breaks the lifecycle end state of any 
object that goes through the plugin.
   
   ### Screenshots (if appropriate):
   
   N/A — backend adapter changes, no UI surface.
   
   ### How Has This Been Tested?
   
   Tested manually against a two-node KVM cluster on CloudStack 4.22 with a 
Pure Storage FlashArray (Purity 6.7) primary pool registered via the 
adaptive/flasharray plugin. The plugin was patched in-place with the classes 
built from this branch and the management server restarted before each test.
   
   **Commit 1 — volume delete**
   1. `cmk create volume name=<n> diskofferingid=<pool-tagged>`
   2. `cmk attach volume id=<v> virtualmachineid=<vm>` — forces provisioning on 
the array
   3. `cmk detach volume id=<v>`
   4. `cmk delete volume id=<v>`
   
   Verified: delete returns `success: true`; the array shows 
`cloudstack::vol-2-1-2-N-<timestamp>` with `destroyed=true`, 
`time_remaining=86400000`; CloudStack `volumes.state=Expunged` with `removed` 
set.
   
   **Commit 2 — snapshot delete**
   1. `cmk create snapshot volumeid=<v> name=<s>`
   2. `cmk delete snapshot id=<s>`
   
   Verified: delete returns `success: true`; the array shows 
`cloudstack::vol-2-1-2-N.1` with `destroyed=true`, `time_remaining=86400000`; 
CloudStack `snapshots.status=Destroyed`.
   
   **Regression coverage on adjacent paths**: create, attach, snapshot, revert, 
detach, re-attach, re-delete all continue to work end-to-end. Full 
attach/detach cycle across both commits: disk formatted as XFS, file written, 
volume detached, re-attached, filesystem UUID and contents preserved.
   
   **Build**: `mvn -pl plugins/storage/volume/flasharray -am -DskipTests -Dcheckstyle.skip=false --batch-mode package` passes with checkstyle enabled.
   
   No unit tests exist for `FlashArrayAdapter`; the class is built entirely 
around the Purity REST API and would require substantial HTTP-level mocking to 
cover meaningfully. Happy to add integration-style tests in a follow-up if the 
reviewers want them.
   
   #### How did you try to break this feature and the system with this change?
   
   - **Delete a volume that is still attached to a VM**: CloudStack refuses the 
request before `delete()` is reached, as expected.
   - **Delete a volume twice**: first call succeeds; second call sees `Volume 
does not exist` from the array, caught by the existing exception handler, no 
error surfaced. Verified both commits preserve this behaviour.
   - **Delete a snapshot whose parent volume has already been destroyed**: 
snapshot delete still succeeds via `/volume-snapshots` — the snapshot is an 
independent object in Purity until the retention window expires.
   - **Delete a snapshot twice**: first call succeeds; second call returns `No 
such volume or snapshot`, caught by the new exception handler in the snapshot 
branch, no error surfaced.
   - **Delete during maintenance / while the CS pool is in maintenance mode**: 
CloudStack gates the API call at a higher layer; the adapter is never reached. 
Not affected by this change.
   - **Race with revert**: snapshot revert holds a reference to the snapshot on 
the array; the revert operation completes first, and the subsequent delete then 
operates on a non-live snapshot. Verified with `cmk revert snapshot` 
immediately followed by `cmk delete snapshot`.
   - **Build with checkstyle enabled**: passes. No new warnings introduced.
   
   

