[ 
https://issues.apache.org/jira/browse/HDDS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saketa Chalamchala updated HDDS-14768:
--------------------------------------
    Description: 
1. SnapshotCache cleanup might be unnecessarily strict. 
The cleanup throws an `IllegalStateException` when it finds stale entries in 
`pendingEvictionQueue` for snapshots that have already been removed from dbMap 
(say SnapshotPurge invalidates the entry right before the last thread holding 
a reference to the snapshot closes it, which adds the snapshotID back to the 
evictionQueue).
It might be better for cleanup to log the stale entry and proceed without 
throwing, since this is not a correctness bug.

2. `invalidate` should also remove the entry from `pendingEvictionQueue` to be 
more consistent with bookkeeping.

3. A snapshot close failure during cleanup removes the snapshotID from the 
eviction queue and throws an exception. This causes the snapshot to remain in 
the cache even if refCount = 0, unless some other thread explicitly invalidates 
it, which may be unlikely. This means `SnapshotCache.lock()` cannot acquire the 
write lock during this time, because `lock()` expects the cache to be drained. 
Cleanup should be best-effort for all snapshots pending eviction, and any 
snapshot close failures should be retried the next time cleanup runs so that 
`lock()` has a chance at acquiring the lock after the retry. 

4. Fix write lock leak in SnapshotCache. If the cache drain `cleanup(true)` 
throws an exception, the write lock is not released.
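
A minimal sketch of what items 1 to 3 could look like, under stated assumptions. This is not the Ozone implementation: the class `SnapshotCacheSketch`, the methods `put`, `markForEviction`, `cachedCount` and `pendingCount`, and the use of `AutoCloseable` as the snapshot handle are all hypothetical stand-ins. Reference counting and the locking needed to make the compound steps atomic are left out.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for SnapshotCache; not the Ozone code.
final class SnapshotCacheSketch {
  private final Map<UUID, AutoCloseable> dbMap = new ConcurrentHashMap<>();
  private final Set<UUID> pendingEvictionQueue = ConcurrentHashMap.newKeySet();

  void put(UUID id, AutoCloseable snapshot) {
    dbMap.put(id, snapshot);
  }

  // Stands in for the last reference being released (refCount reaches 0).
  void markForEviction(UUID id) {
    pendingEvictionQueue.add(id);
  }

  // Item 2: drop the queue entry together with the map entry
  // (closing the snapshot itself is omitted in this sketch).
  void invalidate(UUID id) {
    dbMap.remove(id);
    pendingEvictionQueue.remove(id);
  }

  // Items 1 and 3: best-effort pass over a copy of the queue.
  // Returns how many snapshots failed to close and stay queued for retry.
  int cleanup() {
    int failures = 0;
    for (UUID id : new ArrayList<>(pendingEvictionQueue)) {
      AutoCloseable snapshot = dbMap.get(id);
      if (snapshot == null) {
        // Stale entry: the snapshot was already invalidated. Log and move on.
        System.err.println("Skipping stale eviction entry " + id);
        pendingEvictionQueue.remove(id);
        continue;
      }
      try {
        snapshot.close();
        dbMap.remove(id);
        pendingEvictionQueue.remove(id);
      } catch (Exception e) {
        // Keep the entry in both structures so the next cleanup retries it.
        System.err.println("Close failed for " + id + ", will retry: " + e);
        failures++;
      }
    }
    return failures;
  }

  int cachedCount() {
    return dbMap.size();
  }

  int pendingCount() {
    return pendingEvictionQueue.size();
  }
}
```

In this sketch a failed close leaves the snapshotID queued, so a later `cleanup()` call gets another attempt, and a stale queue entry is logged and dropped instead of raising `IllegalStateException`.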


  was:
1. SnapshotCache cleanup might be unnecessarily strict. 
The cleanup throws an `IllegalStateException` when it finds stale entries in 
`pendingEvictionQueue` for snapshots that have already been removed from dbMap 
(say SnapshotPurge invalidates the entry right before the last thread holding 
a reference to the snapshot closes it, which adds the snapshotID back to the 
evictionQueue).
It might be better for cleanup to log the stale entry and proceed without 
throwing, since this is not a correctness bug.

2. `invalidate` should also remove the entry from `pendingEvictionQueue` to be 
more consistent with bookkeeping.

3. Snapshot close failure during cleanup removes the entry from eviction queue 
and throws an exception. 


> Fix lock leak on SnapshotCache cleanup and handle eviction race appropriately
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-14768
>                 URL: https://issues.apache.org/jira/browse/HDDS-14768
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>            Reporter: Saketa Chalamchala
>            Assignee: Saketa Chalamchala
>            Priority: Major
>
> 1. SnapshotCache cleanup might be unnecessarily strict. 
> The cleanup throws an `IllegalStateException` when it finds stale entries in 
> `pendingEvictionQueue` for snapshots that have already been removed from 
> dbMap (say SnapshotPurge invalidates the entry right before the last thread 
> holding a reference to the snapshot closes it, which adds the snapshotID back 
> to the evictionQueue).
> It might be better for cleanup to log the stale entry and proceed without 
> throwing, since this is not a correctness bug.
> 2. `invalidate` should also remove the entry from `pendingEvictionQueue` to 
> be more consistent with bookkeeping.
> 3. A snapshot close failure during cleanup removes the snapshotID from the 
> eviction queue and throws an exception. This causes the snapshot to remain in 
> the cache even if refCount = 0, unless some other thread explicitly 
> invalidates it, which may be unlikely. This means `SnapshotCache.lock()` 
> cannot acquire the write lock during this time, because `lock()` expects the 
> cache to be drained. 
> Cleanup should be best-effort for all snapshots pending eviction, and any 
> snapshot close failures should be retried the next time cleanup runs so that 
> `lock()` has a chance at acquiring the lock after the retry. 
> 4. Fix write lock leak in SnapshotCache. If the cache drain `cleanup(true)` 
> throws an exception, the write lock is not released.
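
The write lock leak in item 4 is the kind of problem a try/finally guard usually addresses. The following is a hedged sketch only: `DrainingLockSketch`, its `lock(Runnable)` signature and the `drain` parameter (standing in for `cleanup(true)`) are assumptions for illustration, not the SnapshotCache API. It uses the standard `java.util.concurrent.locks.ReentrantReadWriteLock`.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of releasing the write lock when the drain fails.
final class DrainingLockSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

  // Acquires the write lock, then drains. If the drain throws, the lock is
  // released before the exception propagates, so it cannot leak.
  void lock(Runnable drain) {
    rwLock.writeLock().lock();
    boolean drained = false;
    try {
      drain.run(); // stands in for cleanup(true); may throw
      drained = true;
    } finally {
      if (!drained) {
        rwLock.writeLock().unlock();
      }
    }
  }

  // Releases the write lock after a successful lock() call.
  void unlock() {
    rwLock.writeLock().unlock();
  }

  boolean heldByCurrentThread() {
    return rwLock.isWriteLockedByCurrentThread();
  }
}
```

On success the caller keeps the write lock and is responsible for calling `unlock()`. On failure the caller sees the original exception and holds nothing.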



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
