[ https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529139#comment-17529139 ]
Chris M. Hostetter commented on LUCENE-10292: --------------------------------------------- {quote}... how did you arrive at the conclusion that Lookup.build can be called concurrently? {quote} Well, my initial understand/impression was that calling lookup concurrent to a (re)build should be "ok" was based on the fact that it worked for every Suggester I could find _except_ AnalyzingInfixSuggester because all other suggesters atomically replaced their internal structures (typically FSTs) at the end of their build() process – only AnalyzingInfixSuggester blew away it's internal data (it's searcherMgr) at the start of the build method, replacing it only at the end of the build – and had a lookup method that was throw an exception if you tried to use it during a (re)build. As i said in my initial comment, it seemed like at a minimum we could/should make AnalyzingInfixSuggester return Collections.emptyList() in the case that searcherMgr was null (consistent with the other Lookup impls i found and what their lookup methods returned if they had _never_ been built) but it seemed very easy/possible to make AnalyzingInfixSuggester support lookup concurrent w/ (re)build – so i added it (since there were no objections) As i mentioned in subsequent comments (above) – while working on the test for this (and refactoring it so that it could be applied to all suggesters) i found that while i couldn't trigger any failures like this in other Lookup impls during concurrent calls to build()/lookup() i did discover some FST based suggesters (like AnalysingSuggester and FSTCompletionLookup) had inconsistencies between the getCount() and lookup() since the "count" variable was being updated iteratively while the fst was being replaced atomicly –which i only noticed because my new test changes were also sanity checking the count to confirm that the "new" data was in fact getting picked up by the time build finished. I attempted to "improve" those inconsistencies as well, in the two analyzers where i noticed it, by replacing the "count" variable atomically as well .... _but I evidently overlooked that FreeTextSuggester has this same code pattern_ . (And yes, my "improvement" isn't a perfect "fix" because the fst & count variables are updated independently w/o any synchronization – but it didn't seem worth the overhead of adding that since there's nothing in the docs that implies/guarantees anything about what getCount will return during a build – but it certainly seemed wrong to me that those impls would happily return lots of results from lookup calls while getCount() might return '0') {quote}...while the build() method is not even invoked yet because this line: ... is semaphore-blocked in the constructor parameters (InputArrayIterator). {quote} ...ah, yes, i overlooked that. The spirit of what I was trying to do with this test is assert that the various "checks" all pass even while the build method is in the process of consuming an iterator. I initially implemented the test with both the Semaphore and sleep calls, but didn't want to leave them in and make the test "slow" – and when removing the sleep calls I clearly didn't give enough thought to the possibility that the "main" thread would have a chance to release all it's permits before the background thread had acquired any of them. ---- I've got an improved version of the test locally that uses bi-directional Semaphores to ensure the checks are tested between every call to next() (and wraps the InputArrayIterator instead of vice-versa so it doesn't get hung up on the behavior of that classes constructor) which reliably triggers the current sporadic jenkins failures in TestFreeTextSuggester – and making the same improvements to how that FreeTextSuggester tracks its "count" that my previous commits made to AnalysingSuggester and FSTCompletionLookup causes the modified test to reliably pass. I'll commit these changes to main & 9x ASAP to stop the jenkins failures – but if you would like to re-open this issue for further discussion (and/or revert all of these commits in the meantime) I understand > AnalyzingInfixSuggester thread safety: lookup() fails during (re)build() > ------------------------------------------------------------------------ > > Key: LUCENE-10292 > URL: https://issues.apache.org/jira/browse/LUCENE-10292 > Project: Lucene - Core > Issue Type: Bug > Reporter: Chris M. Hostetter > Assignee: Chris M. Hostetter > Priority: Major > Fix For: 10.0 (main), 9.2 > > Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, > LUCENE-10292-3.patch, LUCENE-10292.patch > > > I'm filing this based on anecdotal information from a Solr user w/o > experiencing it first hand (and I don't have a test case to demonstrate it) > but based on a reading of the code the underlying problem seems self > evident... > With all other Lookup implementations I've examined, it is possible to call > {{lookup()}} regardless of whether another thread is concurrently calling > {{build()}} – in all cases I've seen, it is even possible to call > {{lookup()}} even if {{build()}} has never been called: the result is just an > "empty" {{List<LookupResult>}} > Typically this is works because the {{build()}} method uses temporary > datastructures until it's "build logic" is complete, at which point it > atomically replaces the datastructures used by the {{lookup()}} method. In > the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method > starts by closing & null'ing out the {{protected SearcherManager > searcherMgr}} (which it only populates again once it's completed building up > it's index) and then the lookup method starts with... > {code:java} > if (searcherMgr == null) { > throw new IllegalStateException("suggester was not built"); > } > {code} > ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any > situation where another thread may be calling > {{AnalyzingInfixSuggester.build()}} -- This message was sent by Atlassian Jira (v8.20.7#820007) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org