divijvaidya opened a new pull request, #13850: URL: https://github.com/apache/kafka/pull/13850
## Problem

`RemoteIndexCache` is accessed concurrently from multiple threads in the consumer fetch code path [1]. Currently, `RemoteIndexCache` uses a `LinkedHashMap` as its internal cache implementation. Because `LinkedHashMap` is not thread-safe, we take a coarse-grained lock on the entire map when writing to the cache. As a result, while one thread is fetching information for a particular segment from the `RemoteStorageManager` (RSM), other threads trying to access a *different* segment in the cache must also wait for the first thread to finish. This contention on the global lock decreases consumer fetch throughput whenever the RSM network call is slow.

## Solution

We need a data structure for the cache that satisfies the following requirements:

1. Multiple threads should be able to read concurrently.
2. Fetching a missing key should not block reads of available keys.
3. Only one thread should fetch a given key.
4. It should support an LRU eviction policy.

In Java, non-concurrent data structures (such as `LinkedHashMap`) violate requirement 2, because they have to be guarded by an external lock. We could use a concurrent data structure such as `ConcurrentHashMap`, but we would then have to implement LRU eviction on top of it ourselves. Alternatively, we could implement an LRU cache from scratch that satisfies the above constraints. The approach taken in this PR is to use [Caffeine cache](https://github.com/ben-manes/caffeine), which satisfies all of the requirements above.

## Changes

- This PR uses Caffeine as the underlying cache for `RemoteIndexCache`.
- The old `File` API has been replaced with the `Files` API introduced in JDK 7.

## Testing

- A test has been added that verifies requirement 2 above. The test fails before this change and passes after it.
- New tests have been added to improve overall test coverage.
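For comparison with Caffeine, here is a minimal sketch of the other option mentioned under Solution: a `ConcurrentHashMap` with hand-rolled LRU eviction, showing how requirements 1 to 4 can be met. It is not the code in this PR. The class `PerKeyLoadingCache`, its `get`/`size` methods and the `loader` function (standing in for an RSM fetch) are hypothetical. Each missing key gets a `CompletableFuture` placeholder inserted with `putIfAbsent`, and the load then runs outside any lock. This avoids loading inside `computeIfAbsent`, because `ConcurrentHashMap` documents that other updates may block while such a computation runs. The size bound is approximate: in-flight loads are not counted, and a key evicted during a concurrent request may simply be loaded again.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch, not the PR's implementation: per-key loading with approximate LRU eviction. */
final class PerKeyLoadingCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    // Access-ordered map used only for recency bookkeeping. Its monitor is held for
    // constant-time updates and never while a value is being loaded.
    private final LinkedHashMap<K, Boolean> recency = new LinkedHashMap<>(16, 0.75f, true);
    private final int maxSize;
    private final Function<K, V> loader;

    PerKeyLoadingCache(int maxSize, Function<K, V> loader) {
        if (maxSize < 1) throw new IllegalArgumentException("maxSize must be >= 1");
        this.maxSize = maxSize;
        this.loader = loader;
    }

    V get(K key) {
        // Requirements 1 and 2: a present key is served by a lock-free map lookup.
        CompletableFuture<V> future = entries.get(key);
        if (future == null) {
            CompletableFuture<V> created = new CompletableFuture<>();
            // Requirement 3: only the thread whose placeholder wins putIfAbsent loads the key;
            // concurrent callers for the same key wait on that placeholder instead.
            future = entries.putIfAbsent(key, created);
            if (future == null) {
                future = created;
                try {
                    // The slow call runs outside every lock, so it blocks only callers of this key.
                    created.complete(loader.apply(key));
                } catch (Throwable t) {
                    entries.remove(key, created); // drop the failed placeholder so a later call retries
                    created.completeExceptionally(t);
                }
            }
        }
        V value = future.join(); // throws CompletionException if the load failed
        touch(key);
        return value;
    }

    int size() {
        return entries.size();
    }

    // Requirement 4: record the access and evict the least recently used key when over capacity.
    private void touch(K key) {
        K evicted = null;
        synchronized (recency) {
            recency.put(key, Boolean.TRUE); // in access order, re-putting moves the key to the tail
            if (recency.size() > maxSize) {
                Iterator<K> eldest = recency.keySet().iterator();
                evicted = eldest.next();
                eldest.remove();
            }
        }
        if (evicted != null) {
            entries.remove(evicted);
        }
    }
}
```

The recency lock in this sketch is still global, but it guards only constant-time bookkeeping rather than the remote call. That is the distinction the Problem section draws. Caffeine provides this kind of concurrency and size-based eviction without the hand-written bookkeeping, which is presumably why the PR relies on it.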
