[ 
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872983#comment-15872983
 ] 

Ben Manes commented on SOLR-10141:
----------------------------------

Thanks!!! I think I found the bug. It now passes your test case.

The problem was due to put() stampeding over the value during eviction. The 
[eviction routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725]
 performed the following:
# Read the key, value, etc.
# Conditionally removed the entry in a computeIfPresent() block
   - resurrected it if a race occurred (e.g. it was thought expired, but was newly accessed)
# Marked the entry as "dead" (using a synchronized (entry) block)
# Notified the listener

This failed because 
[putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521]
 can perform its update outside of a hash table lock (e.g. a computation). It 
synchronizes on the entry to update it, checking first that the entry is still 
alive. This resulted in a race in which the entry was removed from the hash 
table, then the value was updated, and only then was the entry marked as dead. 
When the listener was notified, it received the wrong value.
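
To make the interleaving concrete, here is a minimal single-threaded replay of the race. It is only a sketch under stated assumptions, not Caffeine's code: the Entry class, its alive flag, and replayRace() are hypothetical names, and the steps of the two threads are simply written out in the problematic order.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model; none of these names come from Caffeine.
final class EvictionRaceSketch {
    static final class Entry {
        volatile String value;
        boolean alive = true; // guarded by synchronized (this)
        Entry(String value) { this.value = value; }
    }

    // Replays the evictor and writer steps in the order that triggers the bug.
    static String replayRace() {
        ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
        Entry entry = new Entry("old");
        map.put("k", entry);

        // Writer: found the entry earlier and now works outside any hash table lock.
        Entry seenByWriter = map.get("k");

        // Evictor step 1: read the key, value, etc.
        String valueReadByEvictor = entry.value;

        // Evictor step 2: conditionally remove inside computeIfPresent().
        map.computeIfPresent("k", (k, e) -> e == entry ? null : e);

        // Writer: synchronizes on the entry, sees it is still alive, updates it.
        synchronized (seenByWriter) {
            if (seenByWriter.alive) {
                seenByWriter.value = "new";
            }
        }

        // Evictor step 3: mark the entry as dead, too late to stop the writer.
        synchronized (entry) {
            entry.alive = false;
        }

        // Evictor step 4: the listener is told about the value read in step 1.
        return "notified=" + valueReadByEvictor + ", current=" + entry.value;
    }

    public static void main(String[] args) {
        System.out.println(replayRace()); // prints "notified=old, current=new"
    }
}
```

In this model the listener is notified with the value read before the update, while the value written by the racing put is never reported.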

The solution I have now is to expand the synchronized block on eviction. This 
passes your test and should be cheap. I'd like to review it a little more and 
incorporate your test into my suite.
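
A rough sketch of what expanding the synchronized block could look like follows. It rests on assumptions: the class, the Entry type, and the evict/putFast signatures are hypothetical, and placing the synchronized block inside the computeIfPresent() lambda is one possible arrangement, not necessarily the one in the actual patch. The point is that reading the value and marking the entry dead happen under the same entry lock that putFast takes, so the writer either updates before eviction (and the listener sees the new value) or finds the entry dead and must fall back to a slower path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model; none of these names come from Caffeine.
final class ExpandedSyncSketch {
    static final class Entry {
        String value;         // guarded by synchronized (this)
        boolean alive = true; // guarded by synchronized (this)
        Entry(String value) { this.value = value; }
    }

    final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    // Not thread-safe; sufficient for this single-threaded demonstration.
    final List<String> notified = new ArrayList<>();

    // Fast-path update outside the hash table lock; fails if the entry is dead.
    boolean putFast(Entry entry, String newValue) {
        synchronized (entry) {
            if (!entry.alive) {
                return false; // caller would have to retry through a slower path
            }
            entry.value = newValue;
            return true;
        }
    }

    // Eviction with the value read and the "dead" mark under one entry lock.
    void evict(String key, Entry entry) {
        String[] evictedValue = new String[1];
        boolean[] removed = new boolean[1];
        map.computeIfPresent(key, (k, e) -> {
            if (e != entry) {
                return e; // a different entry now owns the key; leave it alone
            }
            synchronized (e) {
                evictedValue[0] = e.value; // read the value that is really evicted
                e.alive = false;           // later putFast calls will be rejected
            }
            removed[0] = true;
            return null; // remove the mapping
        });
        if (removed[0]) {
            notified.add(evictedValue[0]);
        }
    }

    // Runs both operations in the given order and reports what happened.
    static String run(boolean putFirst) {
        ExpandedSyncSketch cache = new ExpandedSyncSketch();
        Entry entry = new Entry("old");
        cache.map.put("k", entry);
        boolean updated;
        if (putFirst) {
            updated = cache.putFast(entry, "new");
            cache.evict("k", entry);
        } else {
            cache.evict("k", entry);
            updated = cache.putFast(entry, "new");
        }
        return cache.notified.get(0) + "/" + updated;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // prints "new/true"
        System.out.println(run(false)); // prints "old/false"
    }
}
```

Either ordering leaves the listener with the value that was actually in the entry when it was removed.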

This is an excellent find. I've stared at the code many times and the race 
seems obvious in hindsight.

> Caffeine cache causes BlockCache corruption 
> --------------------------------------------
>
>                 Key: SOLR-10141
>                 URL: https://issues.apache.org/jira/browse/SOLR-10141
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Yonik Seeley
>         Attachments: SOLR-10141.patch, Solr10141Test.java
>
>
> After fixing the race conditions in the BlockCache itself (SOLR-10121), the 
> concurrency test passes with the previous implementation using 
> ConcurrentLinkedHashMap but fails with Caffeine.


