fuyou001 commented on PR #10519:
URL: https://github.com/apache/rocketmq/pull/10519#issuecomment-4741428792

   **[P1] Make cache ownership transfer atomic with concurrent cleanup**
   
   `PopConsumerCache.writeAndDeleteRecords` first classifies an old CK with 
`consumerRecords.contains(record)`, then performs the RocksDB write/delete 
batch, and only afterward calls `deleteRecords(bufferDeleteRecords)`. This 
check and removal are not atomic with `cleanupRecords`.
   
   A cleanup thread can move the old CK from `recordTreeMap` to 
`removeTreeMap`, flush it to RocksDB, and clear it from the cache after 
`contains` returned true but before the final cache deletion. The renewal path 
then writes the new CK without including the old CK in `storeDeleteRecords`; 
its final cache deletion fails, and that failure result is ignored. RocksDB can 
therefore retain both the old and new CK, potentially reviving the same message 
more than once. If the keys coincide, a late cleanup write can also overwrite 
the new value with the old one.
   
   Please coordinate cleanup and renewal through an atomic claim/removal or 
another explicit ownership state transition before deciding whether the old CK 
must be deleted from RocksDB. A concurrent regression test should pause between 
classification and deletion while cleanup stages and flushes the same record.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to