Gargi-jais11 commented on PR #10109:
URL: https://github.com/apache/ozone/pull/10109#issuecomment-4340923770
**Second finding: SCM sends a force-close command for a QUASI_CLOSED
container while DiskBalancer is working on the same container**
**QUASI_CLOSED** containers can be force-closed by SCM
(`CloseContainerCommand` with `force=true`). That path goes through
`controller.closeContainer()`.
Here is the exact timeline:
```
DiskBalancer (DN1)                               CloseContainerCommandHandler
─────────────────────────────────────────        ─────────────────────────────────────────
T1: container = containerSet.getContainer(C)
    → OLD container (QUASI_CLOSED, Disk1)
T2: container.readLock() on OLD
T3: copy Disk1 → Disk2 ...                       T3a: container = containerSet.getContainer(C)
                                                      → OLD container (before updateContainer)
                                                 T3b: switch(container.getContainerState())
                                                      → QUASI_CLOSED + force=true
                                                      → controller.closeContainer(id)
                                                        → containerSet.getContainer(id)
                                                        → OLD container (still, before T5)
                                                        → container.close()
                                                        → writeLock() → BLOCKED ←------- readLock held
T4: copy done, atomic move to Disk2
T5: importContainer →
    newContainer (QUASI_CLOSED, Disk2)
T6: containerSet.updateContainer(newContainer)
    ← ContainerSet now maps C → newContainer
T7: container.readUnlock() ← releases
    OLD readLock
                                                 T7a: writeLock ACQUIRED on OLD container
                                                      → OLD: QUASI_CLOSED → CLOSED
                                                      → sendICR(OLD=CLOSED) → SCM told C is CLOSED
T8: container.markContainerForDelete(OLD)
    → writeLock → OLD: CLOSED → DELETED
```
**After T8:**
```
Container                State           In ContainerSet?
---------------------------------------------------------------------------------------
OLD container (Disk1)    DELETED         No (updateContainer removed it)
NEW container (Disk2)    QUASI_CLOSED    Yes, this is the live replica
```
**SCM's view:** Container C on DN1 = **CLOSED** (from the ICR sent at T7a).
**Reality:** Container C on DN1 = **QUASI_CLOSED** (newContainer).
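To make the end state concrete, here is a toy, single-threaded replay of the timeline. It only models the ordering of T1 to T8 with plain object references. `RaceReplay`, `Replica`, and `replay()` are made-up names for illustration and are not the real Ozone classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the timeline; all names are illustrative, not Ozone's real classes. */
final class RaceReplay {
  enum State { QUASI_CLOSED, CLOSED, DELETED }

  /** A replica object; identity matters because the balancer swaps instances. */
  static final class Replica {
    final String disk;
    State state;

    Replica(String disk, State state) {
      this.disk = disk;
      this.state = state;
    }
  }

  /** Replays T1..T8 in order; returns {state reported to SCM, state of the live replica}. */
  static State[] replay() {
    Map<Long, Replica> containerSet = new HashMap<>();
    long id = 1L;
    containerSet.put(id, new Replica("Disk1", State.QUASI_CLOSED));

    Replica balancerRef = containerSet.get(id); // T1: balancer resolves OLD
    Replica closeRef = containerSet.get(id);    // T3a/T3b: close handler also resolves OLD

    // T4-T6: the copy finishes and the map is switched to a new instance.
    containerSet.put(id, new Replica("Disk2", balancerRef.state));

    // T7a: the previously blocked close runs on the stale reference and reports it.
    closeRef.state = State.CLOSED;
    State reportedToScm = closeRef.state;

    // T8: the balancer marks the old replica as deleted.
    balancerRef.state = State.DELETED;

    return new State[] {reportedToScm, containerSet.get(id).state};
  }

  public static void main(String[] args) {
    State[] result = replay();
    System.out.println("SCM was told: " + result[0] + ", live replica: " + result[1]);
  }
}
```

Running `main` prints `SCM was told: CLOSED, live replica: QUASI_CLOSED`, which is the mismatch described above.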
**This is a kind of regression:**
SCM thinks the replica is **CLOSED**, but DN1's next container report says
**QUASI_CLOSED**, so SCM sees a state "regression" **(CLOSED → QUASI_CLOSED)**.
Depending on the FCR sent to SCM, it may:
- Re-send a force close command. `controller.closeContainer(id)` now
  re-fetches from ContainerSet, gets the NEW container, and closes it
  correctly (CLOSED). This eventually converges.
- Or treat it as an unhealthy/inconsistent replica.
There is no data loss: the data is intact on Disk2. But there is a window of
state inconsistency in which SCM's cached state (CLOSED) differs from reality
(QUASI_CLOSED on the new disk).
I think we need to re-fetch the container here as well.
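A minimal sketch of what "re-fetch" could look like on the close path, assuming the balancer switches the mapping while still holding the old replica's read lock (as in T6/T7 above). This is not the PR's actual fix. `RefetchClose`, `Replica`, `closeCurrent`, and `demo` are hypothetical names, and a plain `ConcurrentHashMap` stands in for ContainerSet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch only: hypothetical names, not the actual Ozone classes. */
final class RefetchClose {
  enum State { QUASI_CLOSED, CLOSED, DELETED }

  static final class Replica {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile State state;

    Replica(State state) {
      this.state = state;
    }
  }

  /**
   * Closes whichever replica the map points at once the write lock is held.
   * If the mapping changed while we waited for the lock, the stale reference
   * is dropped and the lookup is retried.
   */
  static Replica closeCurrent(Map<Long, Replica> containerSet, long id) {
    while (true) {
      Replica candidate = containerSet.get(id);
      if (candidate == null) {
        return null; // container is no longer present on this node
      }
      candidate.lock.writeLock().lock();
      try {
        // Re-fetch under the lock: only act if this is still the live instance.
        if (containerSet.get(id) == candidate) {
          if (candidate.state == State.QUASI_CLOSED) {
            candidate.state = State.CLOSED;
          }
          return candidate;
        }
      } finally {
        candidate.lock.writeLock().unlock();
      }
    }
  }

  /** Replays the interleaving; returns {state of OLD, state of NEW}. */
  static State[] demo() {
    Map<Long, Replica> containerSet = new ConcurrentHashMap<>();
    Replica oldReplica = new Replica(State.QUASI_CLOSED);
    Replica newReplica = new Replica(State.QUASI_CLOSED);
    containerSet.put(1L, oldReplica);

    oldReplica.lock.readLock().lock();   // balancer: T2
    Thread closer = new Thread(() -> closeCurrent(containerSet, 1L));
    closer.start();                      // close handler: may block on OLD's write lock
    containerSet.put(1L, newReplica);    // balancer: T6
    oldReplica.lock.readLock().unlock(); // balancer: T7
    try {
      closer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new State[] {oldReplica.state, newReplica.state};
  }

  public static void main(String[] args) {
    State[] result = demo();
    System.out.println("OLD: " + result[0] + ", NEW: " + result[1]);
  }
}
```

Whichever thread wins the initial lookup, the close lands on the instance that the map currently points at, so the old replica stays QUASI_CLOSED and the new one ends up CLOSED. The real change would also need to make sure the ICR is sent for the re-fetched container rather than the stale one.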