[ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529139#comment-17529139
 ] 

Chris M. Hostetter commented on LUCENE-10292:
---------------------------------------------

{quote}... how did you arrive at the conclusion that Lookup.build can be called 
concurrently?
{quote}
Well, my initial understand/impression was that calling lookup concurrent to a 
(re)build should be "ok" was based on the fact that it worked for every 
Suggester I could find _except_ AnalyzingInfixSuggester because all other 
suggesters atomically replaced their internal structures (typically FSTs) at 
the end of their build() process – only AnalyzingInfixSuggester blew away it's 
internal data (it's searcherMgr) at the start of the build method, replacing it 
only at the end of the build – and had a lookup method that was throw an 
exception if you tried to use it during a (re)build.

As i said in my initial comment, it seemed like at a minimum we could/should 
make AnalyzingInfixSuggester return Collections.emptyList() in the case that 
searcherMgr was null (consistent with the other Lookup impls i found and what 
their lookup methods returned if they had _never_ been built) but it seemed 
very easy/possible to make AnalyzingInfixSuggester support lookup concurrent w/ 
(re)build – so i added it (since there were no objections)

As i mentioned in subsequent comments (above) – while working on the test for 
this (and refactoring it so that it could be applied to all suggesters) i found 
that while i couldn't trigger any failures like this in other Lookup impls 
during concurrent calls to build()/lookup() i did discover some FST based 
suggesters (like AnalysingSuggester and FSTCompletionLookup) had 
inconsistencies between the getCount() and lookup() since the "count" variable 
was being updated iteratively while the fst was being replaced atomicly –which 
i only noticed because my new test changes were also sanity checking the count 
to confirm that the "new" data was in fact getting picked up by the time build 
finished.

I attempted to "improve" those inconsistencies as well, in the two analyzers 
where i noticed it, by replacing the "count" variable atomically as well .... 
_but I evidently overlooked that FreeTextSuggester has this same code pattern_ .

(And yes, my "improvement" isn't a perfect "fix" because the fst & count 
variables are updated independently w/o any synchronization – but it didn't 
seem worth the overhead of adding that since there's nothing in the docs that 
implies/guarantees anything about what getCount will return during a build – 
but it certainly seemed wrong to me that those impls would happily return lots 
of results from lookup calls while getCount() might return '0')
{quote}...while the build() method is not even invoked yet because this line: 
... is semaphore-blocked in the constructor parameters (InputArrayIterator).
{quote}
...ah, yes, i overlooked that.

The spirit of what I was trying to do with this test is assert that the various 
"checks" all pass even while the build method is in the process of consuming an 
iterator. I initially implemented the test with both the Semaphore and sleep 
calls, but didn't want to leave them in and make the test "slow" – and when 
removing the sleep calls I clearly didn't give enough thought to the 
possibility that the "main" thread would have a chance to release all it's 
permits before the background thread had acquired any of them.
----
I've got an improved version of the test locally that uses bi-directional 
Semaphores to ensure the checks are tested between every call to next() (and 
wraps the InputArrayIterator instead of vice-versa so it doesn't get hung up on 
the behavior of that classes constructor) which reliably triggers the current 
sporadic jenkins failures in TestFreeTextSuggester – and making the same 
improvements to how that FreeTextSuggester tracks its "count" that my previous 
commits made to AnalysingSuggester and FSTCompletionLookup causes the modified 
test to reliably pass.

I'll commit these changes to main & 9x ASAP to stop the jenkins failures – but 
if you would like to re-open this issue for further discussion (and/or revert 
all of these commits in the meantime) I understand

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-10292
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10292
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>             Fix For: 10.0 (main), 9.2
>
>         Attachments: LUCENE-10292-1.patch, LUCENE-10292-2.patch, 
> LUCENE-10292-3.patch, LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it) 
> but based on a reading of the code the underlying problem seems self 
> evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}} – in all cases I've seen, it is even possible to call 
> {{lookup()}} even if {{build()}} has never been called: the result is just an 
> "empty" {{List<LookupResult>}} 
> Typically this is works because the {{build()}} method uses temporary 
> datastructures until it's "build logic" is complete, at which point it 
> atomically replaces the datastructures used by the {{lookup()}} method.   In 
> the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method 
> starts by closing & null'ing out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it's completed building up 
> it's index) and then the lookup method starts with...
> {code:java}
>     if (searcherMgr == null) {
>       throw new IllegalStateException("suggester was not built");
>     }
> {code}
> ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any 
> situation where another thread may be calling 
> {{AnalyzingInfixSuggester.build()}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to