divijvaidya opened a new pull request, #13850:
URL: https://github.com/apache/kafka/pull/13850

   ## Problem
   RemoteIndexCache is accessed concurrently by multiple threads in the consumer fetch code path [1].
   
   Currently, RemoteIndexCache uses a LinkedHashMap internally as its cache implementation. Since LinkedHashMap is not thread-safe, we guard writes to the cache with a coarse-grained lock on the entire map.
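
A minimal sketch of the locking pattern described above, for illustration only: it assumes an access-ordered `LinkedHashMap` for LRU ordering and a single lock held around both the lookup and the load of a missing entry. `CoarseLockedLruCache`, `get` and `keys` are hypothetical names, not the actual RemoteIndexCache code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of an LRU cache guarded by one global lock.
class CoarseLockedLruCache<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> map;

    CoarseLockedLruCache(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently
        // accessed; removeEldestEntry evicts the LRU entry once maxSize is exceeded.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    V get(K key, Function<? super K, ? extends V> loader) {
        synchronized (lock) {
            // In access-order mode even get() reorders the map, so hits need the lock too.
            V value = map.get(key);
            if (value == null) {
                // The potentially slow load (e.g. a remote fetch) runs while the
                // lock is held, so callers asking for *other* keys wait behind it.
                value = loader.apply(key);
                map.put(key, value);
            }
            return value;
        }
    }

    List<K> keys() {
        synchronized (lock) {
            // Snapshot of the keys in least- to most-recently-used order.
            return new ArrayList<>(map.keySet());
        }
    }
}
```

A slow `loader` call for one key holds the lock that every other caller needs, which is the contention described in the next paragraph.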
   
   This means that while one thread is fetching data for a particular segment from RemoteStorageManager (RSM), other threads trying to access a different segment in the cache must wait for the first thread to complete, because they all contend on the same global lock.
   
   This lock contention reduces consumer fetch throughput, especially when RSM network calls are slow.
   
   ## Solution
   We need a data structure for the cache that satisfies the following requirements:
   1. Multiple threads should be able to read concurrently.
   2. Fetching a missing key should not block reads of keys that are already cached.
   3. Only one thread should fetch a given missing key.
   4. The cache should support an LRU eviction policy.
   
   In Java, non-concurrent data structures such as LinkedHashMap need an external lock to be shared across threads, and guarding them with a single lock, as done today, violates condition 2. We could use a concurrent data structure such as ConcurrentHashMap, but we would have to implement LRU eviction on top of it ourselves. Or we could write an LRU cache from scratch that satisfies the constraints above.
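
For comparison, a hedged sketch of the ConcurrentHashMap route: storing a `CompletableFuture` per key lets exactly one thread load a missing key while reads of other keys proceed without a map-wide lock. It covers requirements 1 to 3 only; LRU eviction (requirement 4) is deliberately left out, which is the extra work mentioned above. `SingleFlightCache` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a loading cache over ConcurrentHashMap that satisfies
// requirements 1-3 but has NO eviction (requirement 4 is not addressed).
class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Fast path: ConcurrentHashMap.get does not lock, so reads of present keys run in parallel.
        CompletableFuture<V> existing = map.get(key);
        if (existing != null) {
            return existing.join();
        }
        CompletableFuture<V> mine = new CompletableFuture<>();
        // putIfAbsent is atomic: exactly one caller installs its future for this key.
        existing = map.putIfAbsent(key, mine);
        if (existing != null) {
            // Another thread is loading (or has loaded) this key; wait only for that key.
            return existing.join();
        }
        try {
            // The slow load runs outside any map-wide lock, so other keys are unaffected.
            V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            // Remove the failed entry so a later call can retry, then wake any waiters.
            map.remove(key, mine);
            mine.completeExceptionally(e);
            throw e;
        }
    }
}
```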
   
   Alternatively (the approach taken in this PR), we can use [Caffeine cache](https://github.com/ben-manes/caffeine), which satisfies all of the requirements above.
   
   ## Changes
   - This PR uses Caffeine as the underlying cache for RemoteIndexCache. 
   - The old `File` API has been replaced with the `Files` API introduced in JDK 7.
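
The description does not list which calls were migrated, so the following is only a generic illustration of the difference between the two APIs: `java.io.File#delete` reports failure as a bare `false`, while `java.nio.file.Files#deleteIfExists` returns `false` only when there was nothing to delete and throws an `IOException` that describes any other failure. `FileApiContrast` is a hypothetical helper.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper contrasting the legacy and NIO.2 deletion APIs.
final class FileApiContrast {
    private FileApiContrast() {}

    // java.io.File: any failure (missing file, permissions, non-empty dir) is just "false".
    static boolean deleteLegacy(File file) {
        return file.delete();
    }

    // java.nio.file.Files: false means "did not exist"; other failures throw an
    // IOException subtype such as DirectoryNotEmptyException.
    static boolean deleteNio(Path path) throws IOException {
        return Files.deleteIfExists(path);
    }
}
```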
   
   ## Testing
   - A test has been added that verifies requirement 2 above. It fails before this change and passes after it.
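
One way such a test can be structured, sketched under assumptions (the `StringCache` interface and `Requirement2Check` helper are hypothetical and are not the test added in this PR): block a slow load for one key on a latch, then check whether a read of an already-cached key completes within a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical minimal cache interface used only by this sketch.
interface StringCache {
    String get(String key);
}

final class Requirement2Check {
    private Requirement2Check() {}

    // Builds a cache from `factory` and returns true if a read of an already-cached
    // key completes within one second while another thread is stuck loading a different key.
    static boolean cachedReadNotBlocked(Function<Function<String, String>, StringCache> factory) {
        CountDownLatch loadStarted = new CountDownLatch(1);
        CountDownLatch releaseLoad = new CountDownLatch(1);
        StringCache cache = factory.apply(key -> {
            if (key.equals("slow")) {
                loadStarted.countDown();
                awaitQuietly(releaseLoad); // stands in for a slow remote fetch
            }
            return key.toUpperCase();
        });
        cache.get("fast"); // populate the entry that is read later

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(() -> cache.get("slow"));
            awaitQuietly(loadStarted); // the slow load is now in progress
            Future<String> read = pool.submit(() -> cache.get("fast"));
            read.get(1, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the cached read was stuck behind the slow load
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            releaseLoad.countDown(); // let the slow load finish
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A cache that performs loads while holding a single global lock makes this check return `false`; one that loads outside the lock makes it return `true`.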
   - New tests have been added to improve overall test coverage.


-- 
This is an automated message from the Apache Git Service.