[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877051#comment-15877051 ]
ASF subversion and git services commented on SOLR-10121:
Commit 8dbb1bb3fb64fea4baa672ce82a1b62af22c3571 in lucene-solr's branch refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8dbb1bb ]
SOLR-10121: enable BlockCacheTest.testBlockCacheConcurrent that now passes
> BlockCache corruption with high concurrency
> ---
>
> Key: SOLR-10121
> URL: https://issues.apache.org/jira/browse/SOLR-10121
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Yonik Seeley
> Assignee: Yonik Seeley
> Attachments: SOLR-10121.patch
>
> Improving the tests of the BlockCache in SOLR-10116 uncovered a corruption
> bug (either that or the test is flawed... TBD).
> The failing test is TestBlockCache.testBlockCacheConcurrent()
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877050#comment-15877050 ]
ASF subversion and git services commented on SOLR-10121:
Commit cf1cba66f49c551cddbc6053565c30bf3a8b23bb in lucene-solr's branch refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf1cba6 ]
SOLR-10121: enable BlockCacheTest.testBlockCacheConcurrent that now passes
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868088#comment-15868088 ]
Yonik Seeley commented on SOLR-10121:
I'm splitting off the Caffeine issues to SOLR-10141, since the BlockCache race conditions have existed since inception and will need to be handled/backported separately.
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868008#comment-15868008 ]
ASF subversion and git services commented on SOLR-10121:
Commit 65e2d2add68a557b1e628039c328f9346df282f9 in lucene-solr's branch refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65e2d2a ]
SOLR-10121: fix race conditions in BlockCache.fetch and BlockCache.store
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867962#comment-15867962 ]
ASF subversion and git services commented on SOLR-10121:
Commit b71a667d74dfabeaad9584372bded80b0c609add in lucene-solr's branch refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b71a667 ]
SOLR-10121: fix race conditions in BlockCache.fetch and BlockCache.store
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864590#comment-15864590 ]
Yonik Seeley commented on SOLR-10121:
Thanks for the extra info - running the eviction listener in a separate thread shouldn't matter for correctness, but may work better the way this BlockCache code is written anyway. I went back and re-tested the code from right before the Caffeine switch (SOLR-7355) and was able to reproduce some failures by bumping up the concurrency.
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864352#comment-15864352 ]
Ben Manes commented on SOLR-10121:
Can you try a local hack of changing Caffeine versions and, if it fails, try reverting back to CLHM? Both should be easy changes that could help us isolate it. Also note that CLHM ran the eviction listener on the same thread, whereas Caffeine delegates that to the executor. If there is a race due to that, you could use `executor(Runnable::run)` in the builder.
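To illustrate the point about where the eviction listener runs, here is a minimal sketch using only the JDK, with no Caffeine dependency. The class `ListenerExecutorDemo` and the method `listenerThread` are hypothetical names made up for this example, and the lambda is only a stand-in for a real eviction listener. It shows that an `Executor` written as `Runnable::run` executes the callback on the calling thread, while `ForkJoinPool.commonPool()` normally hands it to a pool thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

// Hypothetical demo class; a stand-in for a cache that notifies a listener via an Executor.
final class ListenerExecutorDemo {

    /** Runs a stand-in "listener" on the given executor and returns the name of the thread that ran it. */
    public static String listenerThread(Executor executor) {
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        executor.execute(() -> ranOn.complete(Thread.currentThread().getName()));
        return ranOn.join(); // wait until the listener has actually run
    }

    public static void main(String[] args) {
        // Same-thread executor: execute(r) simply calls r.run() on the caller.
        Executor sameThread = Runnable::run;

        System.out.println("caller thread:        " + Thread.currentThread().getName());
        System.out.println("Runnable::run ran on: " + listenerThread(sameThread));
        System.out.println("commonPool ran on:    " + listenerThread(ForkJoinPool.commonPool()));
    }
}
```

With the same-thread executor the listener has finished by the time `execute` returns, which is the behaviour described above for CLHM. With a pool executor the listener may run later and concurrently with the caller, so any state it touches needs its own synchronization.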
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864343#comment-15864343 ]
Yonik Seeley commented on SOLR-10121:
Hmmm, so on further review of BlockCache.java, I think I've found 2 concurrency issues. Unfortunately, fixing those issues does not get my test to pass. Another "issue" is that my test did pass pre-Caffeine, which means the test is not good enough at sussing out issues (since the BlockCache bugs I identified should not depend on the underlying map implementation).
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863924#comment-15863924 ]
Ben Manes commented on SOLR-10121:
Yes, a write should constitute a publication. Caffeine decorates a ConcurrentHashMap but does bypass it at times. By default, eviction is asynchronous, delegating to the ForkJoinPool commonPool, but it can be configured to use the caller instead. That might be useful for testing. Solr uses an old version of Caffeine. A patch in SOLR-8241 was reviewed and approved, but needs someone to merge it. I'm not aware of a visibility bug in any release, but staying current would be helpful as I have fixed bugs since that version.
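For readers unfamiliar with the term, the following is a small sketch of what "safe publication" through a concurrent map means. It uses a plain `java.util.concurrent.ConcurrentHashMap` only; it says nothing about Caffeine's internals or about the actual BlockCache code. The names `SafePublicationDemo`, `Block`, `publish`, and `runOnce` are hypothetical and exist only for this illustration. The object has no `final` or `volatile` fields, yet a reader that obtains it via `get()` is guaranteed to see the writes made before the corresponding `put()`.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not taken from Solr or Caffeine.
final class SafePublicationDemo {

    /** Plain mutable holder with no final or volatile fields. */
    static final class Block {
        int id;
        byte[] data;
    }

    static final ConcurrentHashMap<String, Block> CACHE = new ConcurrentHashMap<>();

    /** Writer side: finish all writes to the block, then publish it with put(). */
    public static void publish(String key, int id, int size) {
        Block b = new Block();
        b.id = id;
        b.data = new byte[size];
        Arrays.fill(b.data, (byte) id);
        CACHE.put(key, b); // publication point: earlier writes happen-before a later get() of this entry
    }

    /** Starts a reader thread, publishes one block, and reports whether the reader saw it fully initialized. */
    public static boolean runOnce() {
        CACHE.clear();
        boolean[] ok = new boolean[1];
        Thread reader = new Thread(() -> {
            Block b;
            while ((b = CACHE.get("k")) == null) {
                Thread.yield(); // spin until the writer has published the entry
            }
            ok[0] = b.id == 7 && b.data.length == 16 && b.data[15] == (byte) 7;
        });
        reader.start();
        publish("k", 7, 16);
        try {
            reader.join(); // join() makes the reader's write to ok[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw fully initialized block: " + runOnce());
    }
}
```

The guarantee covers only writes made before `put()`. If the published object is mutated afterwards without further synchronization, readers get no such visibility guarantee from the map.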
[jira] [Commented] (SOLR-10121) BlockCache corruption with high concurrency
[ https://issues.apache.org/jira/browse/SOLR-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863699#comment-15863699 ]
Yonik Seeley commented on SOLR-10121:
I reviewed the pertinent BlockCache code and couldn't see any thread safety issues. Looking at the history of BlockCache, I reverted to right before SOLR-7355 was applied, and the issues went away. So it looks like a thread safety or usage issue with Caffeine? [~ben.manes], does putting a key/value in Caffeine constitute safe publication to a different thread (as is the case with ConcurrentHashMap, for example)?