[jira] [Commented] (SOLR-8687) Race condition with RTGs during soft commit
[ https://issues.apache.org/jira/browse/SOLR-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153422#comment-15153422 ]

Ishan Chattopadhyaya commented on SOLR-8687:

bq. it's also disconcerting if it's not air-tight and our tests don't catch it

With [~sar...@syr.edu]'s help, I stress tested 1000 rounds (16 at a time) of TestStressReorder, and it didn't fail. However, the test I linked earlier (TestStressInPlaceUpdates) failed around 10 times, each time with this exact failure. That test is similar to TestStressReorder but runs on a 3-node cluster instead of a simulated replica. I had also increased the number of read operations within each test from 50k to 200k. At this point, I am reasonably sure that the failure has nothing to do with my other changes. Next, I will isolate the test from the other changes and run it on a fresh master to confirm that I can reproduce the failure.

> Race condition with RTGs during soft commit
> ---
>
>                 Key: SOLR-8687
>                 URL: https://issues.apache.org/jira/browse/SOLR-8687
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Ishan Chattopadhyaya
>
> I am facing a problem with stress testing SOLR-5944, although I think this
> problem exists in Solr even without my changes.
> The symptom is that during a stress test (similar to TestStressReorder), RTG
> returns a version of a document that is older than the last acknowledged
> write.
> Possible reason:
> {code}
> (DUH2's commit())
> ...
> 1: if (cmd.softCommit) {
> 2:   // ulog.preSoftCommit();
> 3:   synchronized (solrCoreState.getUpdateLock()) {
> 4:     if (ulog != null) ulog.preSoftCommit(cmd);
> 5:     core.getSearcher(true, false, waitSearcher, true);
> 6:     if (ulog != null) ulog.postSoftCommit(cmd);
> 7:   }
> 8:   callPostSoftCommitCallbacks();
> 9: }
> ...
> {code}
> * Before line 1, there was an update (say id=2) that was in ulog's map. The
> maps are, say, map=\{2=LogPtr(1234)\}, prevMap=\{...\}, prevMap2=\{...\}
> * Due to line 4 (ulog.preSoftCommit()), the maps are rotated. Now id=2 is in
> prevMap: map={}, prevMap=\{2=LogPtr(1234)\}, prevMap2=\{...\}. Up to this
> point, an RTG for id=2 still works.
> * Due to line 5, a new searcher is due to be opened. But this is
> asynchronous; let's assume it doesn't complete before a few more lines are
> executed.
> * Due to line 6 (ulog.postSoftCommit()), the previous maps are cleared out.
> Now the maps are: map={}, prevMap=null, prevMap2=null
> * An RTG for id=2 now finds nothing in the ulog's maps, so it falls through
> to a lookup using the last searcher. But the searcher due to be opened in
> line 5 hasn't been opened yet. In this case, the returned document is
> whatever version of id=2 was present in the previous searcher.
> Can someone please confirm whether this is a potential problem? If so, are
> there any suggestions for a fix? I tried calling ulog.openRealtimeSearcher()
> in the above synchronized block, but the problem still persists; I haven't
> yet looked into why.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
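The interleaving in the issue description can be replayed deterministically in a toy, single-threaded Java model. Everything in it is an assumption for illustration: `RtgRaceModel` and its methods are hypothetical names, not Solr's `UpdateLog` or `DirectUpdateHandler2` APIs, and the model only mirrors the behavior as described in the issue (maps rotated in preSoftCommit(), cleared in postSoftCommit(), RTG falling back to the registered searcher when the maps miss).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the soft-commit interleaving described in SOLR-8687.
// Hypothetical: this is NOT Solr's UpdateLog; names only mirror the issue text.
public class RtgRaceModel {
    // id -> version of recent updates, rotated on each soft commit
    Map<Integer, Long> map = new HashMap<>();
    Map<Integer, Long> prevMap = null;
    Map<Integer, Long> prevMap2 = null;

    // What the currently registered searcher can see
    Map<Integer, Long> registeredSearcher = new HashMap<>();
    // Everything written so far, i.e. what the next searcher will see
    final Map<Integer, Long> index = new HashMap<>();

    void add(int id, long version) {
        map.put(id, version);
        index.put(id, version);
    }

    // Line 4: rotate the maps
    void preSoftCommit() {
        prevMap2 = prevMap;
        prevMap = map;
        map = new HashMap<>();
    }

    // Line 6: drop the previous maps
    void postSoftCommit() {
        prevMap = null;
        prevMap2 = null;
    }

    // The (assumed asynchronous) searcher from line 5 finally becomes visible
    void registerNewSearcher() {
        registeredSearcher = new HashMap<>(index);
    }

    // RTG: check the maps first, then fall back to the registered searcher
    Long realtimeGet(int id) {
        if (map.containsKey(id)) return map.get(id);
        if (prevMap != null && prevMap.containsKey(id)) return prevMap.get(id);
        if (prevMap2 != null && prevMap2.containsKey(id)) return prevMap2.get(id);
        return registeredSearcher.get(id);
    }

    // Replays the interleaving; returns the RTG result for id=2 at three points
    static long[] replay() {
        RtgRaceModel m = new RtgRaceModel();
        m.add(2, 1L);
        m.preSoftCommit();
        m.registerNewSearcher();
        m.postSoftCommit();                   // an earlier soft commit that fully completed
        m.add(2, 2L);                         // the last acknowledged write for id=2

        m.preSoftCommit();                    // line 4
        long afterRotate = m.realtimeGet(2);  // found in prevMap
        // line 5: new searcher requested but, by assumption, not yet registered
        m.postSoftCommit();                   // line 6
        long inWindow = m.realtimeGet(2);     // falls through to the old searcher
        m.registerNewSearcher();              // searcher from line 5 completes
        long afterRegister = m.realtimeGet(2);
        return new long[] {afterRotate, inWindow, afterRegister};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(replay())); // [2, 1, 2]
    }
}
```

In this model the stale read (version 1) exists only in the window between postSoftCommit() and the new searcher becoming visible. Whether real Solr has the same window depends on which searcher RTG actually falls back to, which this thread had not yet settled.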
[jira] [Commented] (SOLR-8687) Race condition with RTGs during soft commit
[ https://issues.apache.org/jira/browse/SOLR-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153413#comment-15153413 ]

Yonik Seeley commented on SOLR-8687:

Hmmm, RTG should be air-tight (I always hate changes to synchronization in this area because it's hard to re-validate). We have stress tests for this, so it's also disconcerting if it's not air-tight and our tests don't catch it. I'll review the related code again...
[jira] [Commented] (SOLR-8687) Race condition with RTGs during soft commit
[ https://issues.apache.org/jira/browse/SOLR-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153329#comment-15153329 ]

Shalin Shekhar Mangar commented on SOLR-8687:

This does look like a genuine bug, Ishan. We wait for the new searcher outside the synchronized block, so, as you said, a concurrent RTG request can miss updates. Perhaps [~ysee...@gmail.com] has a suggestion for us?
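Shalin's diagnosis (the new searcher is waited on outside the synchronized block) invites a comparison of orderings. The toy model below is hypothetical and self-contained: `SoftCommitOrdering` and `Step` are illustrative names, not Solr code, prevMap2 is omitted for brevity, and this is not a proposed patch. It probes an RTG for id=2 after each step, once in the order described in the issue and once with the maps cleared only after the new searcher is registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model comparing two orderings of the soft-commit steps.
// Not Solr code; it only mirrors the behavior described in SOLR-8687.
public class SoftCommitOrdering {
    enum Step { PRE_SOFT_COMMIT, POST_SOFT_COMMIT, REGISTER_SEARCHER }

    // Runs the steps in the given order and records what an RTG for id=2 returns after each one.
    static List<Long> probe(List<Step> order) {
        Map<Integer, Long> searcher = new HashMap<>();
        searcher.put(2, 1L);                  // registered searcher still sees version 1
        Map<Integer, Long> map = new HashMap<>();
        map.put(2, 2L);                       // last acknowledged write is version 2
        Map<Integer, Long> index = new HashMap<>();
        index.put(2, 2L);                     // what the next searcher will see
        Map<Integer, Long> prevMap = null;

        List<Long> seen = new ArrayList<>();
        for (Step step : order) {
            switch (step) {
                case PRE_SOFT_COMMIT:         // rotate: map -> prevMap
                    prevMap = map;
                    map = new HashMap<>();
                    break;
                case POST_SOFT_COMMIT:        // drop the previous map
                    prevMap = null;
                    break;
                case REGISTER_SEARCHER:       // new searcher becomes visible
                    searcher = new HashMap<>(index);
                    break;
            }
            // RTG lookup: current map, then prevMap, then the registered searcher
            Long v = map.get(2);
            if (v == null && prevMap != null) v = prevMap.get(2);
            if (v == null) v = searcher.get(2);
            seen.add(v);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Order from the issue: maps cleared before the new searcher is registered
        System.out.println(probe(List.of(Step.PRE_SOFT_COMMIT, Step.POST_SOFT_COMMIT, Step.REGISTER_SEARCHER)));
        // Hypothetical alternative: clear the maps only after registration
        System.out.println(probe(List.of(Step.PRE_SOFT_COMMIT, Step.REGISTER_SEARCHER, Step.POST_SOFT_COMMIT)));
    }
}
```

The first ordering prints `[2, 1, 2]` and the second `[2, 2, 2]`: in the model, keeping prevMap alive until the searcher is visible closes the window. Whether waiting for the searcher while holding the update lock is acceptable in DirectUpdateHandler2 is a separate question the model does not address.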
[jira] [Commented] (SOLR-8687) Race condition with RTGs during soft commit
[ https://issues.apache.org/jira/browse/SOLR-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151059#comment-15151059 ]

Ishan Chattopadhyaya commented on SOLR-8687:

My stress test is here: https://github.com/chatman/lucene-solr/blob/627b9ac9b46796f20be78b04ebbdfa4299b96ab7/solr/core/src/test/org/apache/solr/cloud/TestStressInPlaceUpdates.java
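The symptom in the issue (an RTG returning a version older than the last acknowledged write) is the invariant such a stress test has to check. Below is a minimal sketch of that check, assuming positive, increasing long versions per id. `AckedVersionChecker` and its methods are hypothetical; they are not taken from TestStressInPlaceUpdates or TestStressReorder.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for a stress test: an RTG must never return a version
// older than the last write acknowledged before the RTG was issued.
// Assumes versions are positive and increase per id.
public class AckedVersionChecker {
    private final ConcurrentHashMap<Integer, Long> lastAcked = new ConcurrentHashMap<>();

    // Call after a write is acknowledged; acks may arrive out of order, so keep the max.
    public void acknowledged(int id, long version) {
        lastAcked.merge(id, version, Long::max);
    }

    // Take this snapshot *before* sending the RTG request; -1 means nothing acknowledged yet.
    public long expectedAtLeast(int id) {
        return lastAcked.getOrDefault(id, -1L);
    }

    // Fails if the RTG returned a version older than the pre-request snapshot.
    public static void check(int id, long snapshot, long found) {
        if (found < snapshot) {
            throw new AssertionError("id=" + id + ": RTG returned version " + found
                    + " but version " + snapshot + " was already acknowledged");
        }
    }

    public static void main(String[] args) {
        AckedVersionChecker checker = new AckedVersionChecker();
        checker.acknowledged(2, 2L);
        long snapshot = checker.expectedAtLeast(2);
        check(2, snapshot, 2L);          // fresh read passes
        try {
            check(2, snapshot, 1L);      // stale read, as in the issue
        } catch (AssertionError e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The snapshot is taken before the request is sent because writes acknowledged while the RTG is in flight may or may not be visible; only versions acknowledged before the request are a valid lower bound.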