Re: [PR] HDDS-15066. Read-Write Lock race leave stale references to container creating orphan replicas [ozone]

via GitHub Mon, 11 May 2026 07:40:17 -0700


ChenSammi commented on PR #10109:
URL: https://github.com/apache/ozone/pull/10109#issuecomment-4421721539


   > > Generally we should choose the one with less complexity, and less cost. 
If replacing the fields inline, there are at least 4 fields' swaps, one by one. 
If replacing the container reference, there is one swap. Basically, hot swap 
one pointer has less complex result than hot swap four pointers, right?
   > 
   > I don't think updating fields in an object, which can be done atomically 
under an existing lock is complex.
   > 
   > Swapping the object for another is simple, but look at this bug it causes 
and the fix which is bleeding out into lots of code areas. And even after this 
fix, if someone adds a new service, they have to know they need to do these 
strange things (get a container, then get it again and check it hasn't changed) 
to avoid a race condition they shouldn't really be concerned about.
   > 
   > If we fix this at source, by locking the container object, updating the 
fields atomically and unlocking, then its fixed once and for all in a single 
place
   
   @sodonnel ,  currently we swap one container pointer under container's read 
lock protection, we got one stale container reference.  If we swap four fields 
under the same container's read lock protection, we got four stale fields 
reference.  We need to double check all code places where these four stale 
fields reference can happen to know how many new race conditions and what's 
their consequences, they might be less severe or more severe than this jira.
   For example, the chunk read doesn't hold any container lock, once the thread 
has the container object, it will assume volume, chunk, db path will be not 
changed. If multiple fields are swapped, it could end up with some path point 
to new location, some path point to old location, which seems more dangerous 
than the issues described by this jira, which are mainly there could be some 
blocks left in the replica, or some replicas are left not deleted in the first 
place(I believe this replica will be eventually deleted, a new replica for a 
deleted container will be deleted eventually).  
   And even if we implement the swap four fields solution, it's also not once 
and for all, every new code added, we also need to verify if there is no stale 
reference case. 
   
   The root cause of issue described in this JIRA, is because we do the 
container pointer swap in containerSet 
   under a read lock protection after container is moved to destination volume, 
 if it's under a write lock protection, then some blocks left in the replica, 
or some replicas are left not deleted in the first place will not happen.  But 
consider moving a container from one volume might take a substantial time, we 
cannot afford to write lock the container, as it will make other threads 
waiting for operation on this container wait for a considerable amount of time. 
   
   The current approach in this jira, is not a perfect approach, the perfect 
approach is use write lock, but it's a approach with relative low complexity, 
as double check after get lock is common in concurrent environment, easy to 
understand, and it's easy to find out where the write lock is held, and do the 
double check after the lock acquired, which is easier than identify all the 
places where volume, db, chunk path, meta path are used, and handle the stale 
reference. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-15066. Read-Write Lock race leave stale references to container creating orphan replicas [ozone]

Reply via email to