divijvaidya commented on PR #13850:
URL: https://github.com/apache/kafka/pull/13850#issuecomment-1594369804

   Thank you for looking into this @satishd 
   
   > But we observed an impact when there is a degradation in our remote storage (HDFS) clusters.
   
   Yes, you are right: the impact of this lock contention is felt when fetching from the RSM is slow. We observed something similar.
   
   > Another way is to change the current LinkedHashMap to a synchronized map and use a slot-based lock instead of a global lock. This workaround is less performant than Caffeine but it does not introduce a new dependency.
   
   I understand what you are saying. I did consider that approach. So that we 
are on the same page, the approach looks like:
   ```
   import java.util.concurrent.ConcurrentHashMap
   import java.util.concurrent.locks.ReentrantLock

   import org.apache.kafka.common.cache.{LRUCache, SynchronizedCache}

   // Thread-safe data structure that holds one lock per entry.
   // Keys from this map need to be removed eventually (e.g. when the
   // corresponding cache entry is evicted) so that it does not grow without bound.
   val entryLockMap = new ConcurrentHashMap[Key, ReentrantLock]()

   // Internal LRU cache based on LinkedHashMap (via LRUCache). It is made thread
   // safe by synchronising all its operations using SynchronizedCache
   // (from org.apache.kafka.common.cache).
   val internalCache = new SynchronizedCache[Key, Entry](new LRUCache[Key, Entry](maxCacheSize))

   def getEntry(key: Key): Entry = {
     // Return immediately if the key is already present in the cache.
     val cached = internalCache.get(key)
     if (cached != null)
       return cached

     // We have to load the entry into the cache: acquire a per-entry lock,
     // fetch the entry from remote storage, then insert it into the cache.
     val entryLock = entryLockMap.computeIfAbsent(key, _ => new ReentrantLock())
     entryLock.lock()
     try {
       // After acquiring the lock, check again for the key; another thread may
       // have inserted it while this thread was waiting for the lock.
       val refreshed = internalCache.get(key)
       if (refreshed != null)
         return refreshed

       val entry = fetchFromRSM(key)
       // No extra locking needed here since get/put on SynchronizedCache are synchronized.
       internalCache.put(key, entry)
       entry
     } finally {
       entryLock.unlock()
     }
   }
   ```
   The advantages of this approach are:
   1. No external dependency.
   2. It follows the same caching pattern already used elsewhere in the Kafka code base (see org.apache.kafka.common.cache.SynchronizedCache). However, that code was written in 2015, when better alternatives perhaps did not exist.
   
   The disadvantages of this approach are:
   1. Cache metrics would have to be computed manually.
   2. Cache eviction is done synchronously and hence holds the global cache lock.
   3. Since the global cache is synchronised with a lock that does not guarantee fairness, a fetch thread may be starved, leading to high tail latencies. Such a scenario occurs when there is a heavy load of get() operations and the lock keeps prioritising other get() threads ahead of the put() thread (see the sketch after this list for what a fair lock would look like). That is why I am not a big fan of using data structures with coarse-grained global locks.
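   
   To make the fairness point concrete, here is a minimal sketch (not part of this PR; the names are illustrative) of what a fair lock looks like in the JDK. SynchronizedCache relies on intrinsic `synchronized` blocks, which offer no such fairness guarantee:
   ```
   import java.util.concurrent.locks.ReentrantLock

   // Passing `true` requests a *fair* lock: waiting threads acquire it in
   // roughly FIFO order, so a put() thread cannot be starved indefinitely by a
   // stream of get() threads. The trade-off is lower overall throughput.
   val fairCacheLock = new ReentrantLock(true)

   def withCacheLock[T](body: => T): T = {
     fairCacheLock.lock()
     try body finally fairCacheLock.unlock()
   }
   ```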
   
   Caffeine, on the other hand, has none of the disadvantages mentioned above, is used in high-throughput, low-latency systems such as Cassandra, and has a regular release cadence with great support. Hence, I believe that using Caffeine is the right choice here. (I know that you already agreed to this, but I am adding more context for other readers of the PR.)
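   
   For reference, a minimal sketch of how a Caffeine-based cache could look. The names `Key`, `Entry` and `fetchFromRSM` are placeholders carried over from the pseudocode above, and the builder settings are purely illustrative, not the ones proposed in this PR:
   ```
   import com.github.benmanes.caffeine.cache.{Cache, Caffeine}

   // Bounded cache: eviction and other maintenance work is amortised by Caffeine
   // rather than being done under a single global lock.
   val cache: Cache[Key, Entry] = Caffeine.newBuilder()
     .maximumSize(1024)   // illustrative bound
     .recordStats()       // hit/miss/eviction metrics come for free
     .build[Key, Entry]()

   def getEntry(key: Key): Entry =
     // Atomic per-key loading: concurrent callers for the same key wait on that
     // key's computation only, not on a global cache lock.
     cache.get(key, k => fetchFromRSM(k))
   ```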