[ https://issues.apache.org/jira/browse/KAFKA-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767818#comment-17767818 ]
Luke Chen commented on KAFKA-15481: ----------------------------------- Thanks for the suggestion! [~ben.manes]! > Concurrency bug in RemoteIndexCache leads to IOException > -------------------------------------------------------- > > Key: KAFKA-15481 > URL: https://issues.apache.org/jira/browse/KAFKA-15481 > Project: Kafka > Issue Type: Bug > Affects Versions: 3.6.0 > Reporter: Divij Vaidya > Priority: Major > Fix For: 3.7.0 > > > RemoteIndexCache has a concurrency bug which leads to IOException while > fetching data from remote tier. > Below events in order of timeline - > Thread 1 (cache thread): invalidates the entry, removalListener is invoked > async, so the files have not been renamed to "deleted" suffix yet. > Thread 2: (fetch thread): tries to find entry in cache, doesn't find it > because it has been removed by 1, fetches the entry from S3, writes it to > existing file (using replace existing) > Thread 1: async removalListener is invoked, acquires a lock on old entry > (which has been removed from cache), it renames the file to "deleted" and > starts deleting it > Thread 2: Tries to create in-memory/mmapped index, but doesn't find the file > and hence, creates a new file of size 2GB in AbstractIndex constructor. JVM > returns an error as it won't allow creation of 2GB random access file. > *Potential Fix* > Use EvictionListener instead of RemovalListener in Caffeine cache as per the > documentation: > {quote} When the operation must be performed synchronously with eviction, use > {{Caffeine.evictionListener(RemovalListener)}} instead. This listener will > only be notified when {{RemovalCause.wasEvicted()}} is true. For an explicit > removal, {{Cache.asMap()}} offers compute methods that are performed > atomically.{quote} > This will ensure that removal from cache and marking the file with delete > suffix is synchronously done, hence the above race condition will not occur. -- This message was sent by Atlassian Jira (v8.20.10#820010)