[ https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872983#comment-15872983 ]
Ben Manes commented on SOLR-10141: ---------------------------------- Thanks!!! I think I found the bug. It now passes your test case. The problem was due to put() stampeding over the value during the eviction. The [eviction routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725] performed the following: # Read the key, value, etc # Conditionally removed in a computeIfPresent() block - resurrected if a race occurred (e.g. was thought expired, but newly accessed) # Mark the entry as "dead" (using a synchronized (entry) block) # Notify the listener This failed because [putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521] can perform its update outside of a hash table lock (e.g. a computation). It synchronizes on the entry to update, checking first if it was still alive. This resulted in a race where the entry was removed from the hash table, the value updated, and entry marked as dead. When the listener was notified, it received the wrong value. The solution I have now is to expand the synchronized block on eviction. This passes your test and should be cheap. I'd like to review it a little more and incorporate your test into my suite. This is an excellent find. I've stared at the code many times and the race seems obvious in hindsight. > Caffeine cache causes BlockCache corruption > -------------------------------------------- > > Key: SOLR-10141 > URL: https://issues.apache.org/jira/browse/SOLR-10141 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Yonik Seeley > Attachments: SOLR-10141.patch, Solr10141Test.java > > > After fixing the race conditions in the BlockCache itself (SOLR-10121), the > concurrency test passes with the previous implementation using > ConcurrentLinkedHashMap and fail with Caffeine. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org