genegr opened a new pull request, #13049:
URL: https://github.com/apache/cloudstack/pull/13049
### Description
This PR fixes two independent bugs in `FlashArrayAdapter.delete()`, the
adapter method the adaptive storage driver calls whenever CloudStack deletes a
volume or a snapshot on a Pure Storage FlashArray primary pool. On Purity 6.x
both paths currently fail, leaking objects on the array and leaving CloudStack
DB rows stuck in the `Destroy` state with no clean recovery via the UI.
#### Bug 1 — Volume delete: `{name, destroyed}` PATCH rejected
The current code builds a single `FlashArrayVolume` object, then issues two
PATCH requests against it: first to rename the volume with a timestamp suffix
(so an operator browsing the array's 24 h recycle bin can see when each copy
was deleted), then to mark it `destroyed: true`. The second PATCH reuses the
same object, which still has the new name set, so the request body carries
**both** `name` and `destroyed`. Purity rejects the combined body:
```
400 Bad Request
Invalid combination of parameters specified.
```
The first PATCH succeeds (volume renamed), the second fails. Net result: the
volume leaks on the array (never reaches the destroyed bucket) and the
CloudStack `volumes` row stays in `Destroy` forever.
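In pseudocode, the failing shape is roughly this (setter, getter, and helper
names such as `patchVolume` are illustrative stand-ins, not the adapter's real
API):
```java
// given: String name, String timestamp
// Buggy shape: one request-body object is mutated across both PATCHes.
FlashArrayVolume body = new FlashArrayVolume();
body.setName(name + "-" + timestamp);  // illustrative setter
patchVolume(name, body);               // PATCH 1 body: {"name": ...} -- succeeds

body.setDestroyed(true);               // "name" is still set on the same object
patchVolume(body.getName(), body);     // PATCH 2 body: {"name": ..., "destroyed": true} -- 400
```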
**Expected behaviour**: `cmk delete volume id=<id>` returns success; the
array shows the volume with `destroyed=true`, `time_remaining≈24h`; the
CloudStack row is `Expunged`.
**Actual behaviour (before this PR)**: the async job fails with `Invalid
combination of parameters specified`; the array shows the volume renamed with a
`-<timestamp>` suffix but `destroyed=false`; the CloudStack row stays in
`Destroy` and `removed` is never set.
**Fix**: keep the rename (it's an operator-visible forensic trail that is
genuinely useful when recovering from accidental deletes), but issue the two
operations as two separate PATCH requests, each body carrying only its own
field. No behaviour change from the user's point of view on a successful delete.
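Under the same illustrative names as the sketch above, the fixed flow builds a
fresh body per request:
```java
// given: String name, String timestamp
// Fixed shape: two PATCHes, each body carrying exactly one field.
FlashArrayVolume rename = new FlashArrayVolume();
rename.setName(name + "-" + timestamp);  // body: {"name": ...} only
patchVolume(name, rename);

FlashArrayVolume destroy = new FlashArrayVolume();
destroy.setDestroyed(true);              // body: {"destroyed": true} only
patchVolume(rename.getName(), destroy);
```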
**Repro**: on a cluster with a FlashArray adaptive primary pool, run `cmk delete
volume id=<id>` against any volume that has been provisioned on the array.
#### Bug 2 — Snapshot delete: wrong endpoint and invalid rename
The same `delete()` method is called by the snapshot service
(`SnapshotServiceImpl.deleteSnapshot` →
`AdaptiveDataStoreDriverImpl.deleteAsync` → `api.delete(context, inData)`), but
the adapter has no special handling when `dataObject.getType()` is `SNAPSHOT`.
Two problems compound:
- FlashArray snapshots live at `/volume-snapshots`, not `/volumes`. The
PATCH hits the wrong endpoint.
- The rename computes `<snapshot_name>-<timestamp>`. FlashArray snapshot
names have the reserved form `<volume>.<suffix>`, so the new name becomes e.g.
`cloudstack::vol-2-1-2-192.1-20260420214120` — which contains a literal `.`.
The array rejects the rename:
> Volume name must be between 1 and 63 characters (alphanumeric, `_` and
`-`), begin and end with a letter or number, and include at least one letter,
`_`, or `-`.
**Expected behaviour**: `cmk delete snapshot id=<id>` returns success; the
array shows the snapshot `destroyed=true` with a 24 h recycle window; the
CloudStack row is `Destroyed`.
**Actual behaviour (before this PR)**: the async job fails with a 400 from
the array; the snapshot is leaked on the array and the CloudStack row never
reaches `Destroyed`.
**Fix**: branch early in `delete()` when the data object is a `SNAPSHOT`.
Issue a single PATCH to `/volume-snapshots?names=<name>` with body `{
destroyed: true }` on the original name. No rename — Pure assigns the
`.<suffix>` portion itself and rejects any freely-appended text. Recovery via
`purevol recover` still works within the 24 h window.
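A rough sketch of the branch (type, enum, and accessor names follow the call
chain quoted above but are written from memory; `patchSnapshot` is a
hypothetical stand-in for the `/volume-snapshots` REST call):
```java
public void delete(ProviderAdapterContext context, ProviderAdapterDataObject inData) {
    if (inData.getType() == ProviderAdapterDataObject.Type.SNAPSHOT) {
        // Single PATCH to /volume-snapshots?names=<name>, body {"destroyed": true}.
        // No rename: Purity owns the ".<suffix>" part of a snapshot name.
        FlashArrayVolume body = new FlashArrayVolume();
        body.setDestroyed(true);
        patchSnapshot(inData.getExternalName(), body);
        return;
    }
    // existing volume path: rename PATCH, then destroy PATCH
}
```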
<!-- Fixes: # -->
### Types of changes
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] Build/CI
- [ ] Test (unit or integration test code)
### Feature/Enhancement Scale or Bug Severity
#### Feature/Enhancement Scale
- [ ] Major
- [ ] Minor
#### Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [x] Major
- [ ] Minor
- [ ] Trivial
Deletes of every array-backed volume and snapshot on the FlashArray plugin
fail, silently leaking space on the array and leaving stuck CloudStack DB rows.
The pool otherwise works (create/attach/detach/snapshot/revert all succeed), so
the bug is not a BLOCKER, but it fully breaks the lifecycle end state of any
object that goes through the plugin.
### Screenshots (if appropriate):
N/A — backend adapter changes, no UI surface.
### How Has This Been Tested?
Tested manually against a two-node KVM cluster on CloudStack 4.22 with a
Pure Storage FlashArray (Purity 6.7) primary pool registered via the
adaptive/flasharray plugin. The plugin was patched in-place with the classes
built from this branch and the management server restarted before each test.
**Commit 1 — volume delete**
1. `cmk create volume name=<n> diskofferingid=<pool-tagged>`
2. `cmk attach volume id=<v> virtualmachineid=<vm>` — forces provisioning on
the array
3. `cmk detach volume id=<v>`
4. `cmk delete volume id=<v>`
Verified: delete returns `success: true`; the array shows
`cloudstack::vol-2-1-2-N-<timestamp>` with `destroyed=true`,
`time_remaining=86400000`; CloudStack `volumes.state=Expunged` with `removed`
set.
**Commit 2 — snapshot delete**
1. `cmk create snapshot volumeid=<v> name=<s>`
2. `cmk delete snapshot id=<s>`
Verified: delete returns `success: true`; the array shows
`cloudstack::vol-2-1-2-N.1` with `destroyed=true`, `time_remaining=86400000`;
CloudStack `snapshots.status=Destroyed`.
**Regression coverage on adjacent paths**: create, attach, snapshot, revert,
detach, re-attach, re-delete all continue to work end-to-end. Full
attach/detach cycle across both commits: disk formatted as XFS, file written,
volume detached, re-attached, filesystem UUID and contents preserved.
**Build**: `mvn -pl plugins/storage/volume/flasharray -am -DskipTests
-Dcheckstyle.skip=false --batch-mode package` passes with checkstyle enabled.
No unit tests exist for `FlashArrayAdapter`; the class is built entirely
around the Purity REST API and would require substantial HTTP-level mocking to
cover meaningfully. Happy to add integration-style tests in a follow-up if the
reviewers want them.
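For a sense of scale, such a harness would look something like the sketch
below (not part of this PR): a stub Purity endpoint built on the JDK's
`com.sun.net.httpserver` that records request method, path, and body, so a
test could assert that volume PATCH bodies never mix `name` with `destroyed`
and that snapshot deletes hit `/volume-snapshots`. How the adapter gets
pointed at the stub is the part that needs real plumbing and is elided here.
```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stub Purity endpoint: records every request so assertions can check that
// /volumes PATCH bodies carry a single field and that snapshot deletes use
// /volume-snapshots. Port 0 asks the OS for any free port.
public class PurityStub {
    public static void main(String[] args) throws Exception {
        List<String> requests = new CopyOnWriteArrayList<>();
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8);
            requests.add(exchange.getRequestMethod() + " "
                    + exchange.getRequestURI() + " " + body);
            exchange.sendResponseHeaders(200, -1); // empty 200 reply
            exchange.close();
        });
        server.start();
        System.out.println("Stub listening on port " + server.getAddress().getPort());
        // A test would register a FlashArray pool against this address,
        // drive FlashArrayAdapter.delete(), then assert on `requests`.
        server.stop(0);
    }
}
```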
#### How did you try to break this feature and the system with this change?
- **Delete a volume that is still attached to a VM**: CloudStack refuses the
request before `delete()` is reached, as expected.
- **Delete a volume twice**: first call succeeds; second call sees `Volume
does not exist` from the array, caught by the existing exception handler, no
error surfaced. Verified both commits preserve this behaviour.
- **Delete a snapshot whose parent volume has already been destroyed**:
snapshot delete still succeeds via `/volume-snapshots` — the snapshot is an
independent object in Purity until the retention window expires.
- **Delete a snapshot twice**: first call succeeds; second call returns `No
such volume or snapshot`, caught by the new exception handler in the snapshot
branch, no error surfaced.
- **Delete during maintenance / while the CS pool is in maintenance mode**:
CloudStack gates the API call at a higher layer; the adapter is never reached.
Not affected by this change.
- **Race with revert**: snapshot revert holds a reference to the snapshot on
the array; the revert operation completes first, and the subsequent delete then
operates on a non-live snapshot. Verified with `cmk revert snapshot`
immediately followed by `cmk delete snapshot`.
- **Build with checkstyle enabled**: passes. No new warnings introduced.