jeel2420 opened a new pull request, #14483: URL: https://github.com/apache/kafka/pull/14483
This PR fixes a concurrency bug in `RemoteIndexCache`.

(From the Jira description) `RemoteIndexCache` has a concurrency bug that leads to an `IOException` while fetching data from the remote tier.

The events occur in the following order:

- Thread 1 (cache thread): invalidates the entry. The removalListener is invoked asynchronously, so the files have not yet been renamed with the "deleted" suffix.
- Thread 2 (fetch thread): tries to find the entry in the cache and does not find it because Thread 1 has already removed it. It fetches the entry from S3 and writes it to the existing file (using replace existing).
- Thread 1: the async removalListener is invoked. It acquires a lock on the old entry (which has already been removed from the cache), renames the file with the "deleted" suffix, and starts deleting it.
- Thread 2: tries to create the in-memory/mmapped index but does not find the file, and hence creates a new file of size 2GB in the `AbstractIndex` constructor. The JVM returns an error because it won't allow creation of a 2GB random access file.

**Fix**: Used `EvictionListener` instead of `RemovalListener` so that eviction is performed synchronously in the Caffeine cache. For manual removal, used computeIfAbsent to rename the files and delete the key from the cache synchronously by returning null.

Added `testConcurrentRemoveReadForCache`, which reproduces the bug by following the timeline of events above. The unit test now passes.

Jira: https://issues.apache.org/jira/browse/KAFKA-15481

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
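
The synchronous manual-removal idea described in the fix can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the code in this PR: `SyncRemovalSketch`, `Entry`, `getOrLoad`, `remove`, and the `events` log are hypothetical names, a plain `ConcurrentHashMap` stands in for the Caffeine cache, and no files are touched. The sketch uses `computeIfPresent`, because in `java.util.concurrent.ConcurrentMap` that is the method whose remapping function removes the mapping when it returns null. Since the cleanup runs inside the remapping function, a concurrent load of the same key cannot interleave between "entry removed" and "entry cleaned up".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: removal and cleanup happen atomically per key.
class SyncRemovalSketch {

    // Hypothetical stand-in for a cached index entry backed by a file.
    static final class Entry {
        final String fileName;
        volatile boolean markedForCleanup = false;

        Entry(String fileName) {
            this.fileName = fileName;
        }
    }

    // Assumption: a ConcurrentHashMap replaces the real Caffeine cache here.
    private final ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();

    // Records the order of operations so the behavior can be inspected.
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Loads the entry if it is absent; the load runs atomically for the key.
    Entry getOrLoad(String key) {
        return cache.computeIfAbsent(key, k -> {
            events.add("load " + k);
            return new Entry(k + ".index");
        });
    }

    // Removes the entry and performs its cleanup in one atomic step.
    // Returning null from the remapping function deletes the mapping.
    void remove(String key) {
        cache.computeIfPresent(key, (k, entry) -> {
            entry.markedForCleanup = true; // stands in for rename + delete
            events.add("cleanup " + k);
            return null;
        });
    }

    boolean contains(String key) {
        return cache.containsKey(key);
    }

    public static void main(String[] args) {
        SyncRemovalSketch sketch = new SyncRemovalSketch();
        sketch.getOrLoad("seg-1");
        sketch.remove("seg-1");
        sketch.remove("seg-1");    // no-op: the key is already gone
        sketch.getOrLoad("seg-1"); // reloads a fresh entry
        System.out.println(sketch.events);
    }
}
```

In this sketch a second `remove` on the same key does nothing, and a later `getOrLoad` always produces a fresh entry rather than one whose cleanup is still pending.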