[ 
https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290166#comment-17290166
 ] 

Paweł Bugalski commented on LUCENE-9791:
----------------------------------------

{quote}I think it is a problem if we start saying that you can use it 
concurrently this way but not concurrently that way. Eventually somebody gets 
confused. Better to just draw a line in the sand and say this class is or is 
not thread safe and be done with it.

Maybe another idea is to have a FrozenBytesRefHash that is explicitly safe for 
concurrent lookup because it does not support modification?
{quote}
I think I agree and disagree with you at the same time.

To explain my point let me quote Uwe, who I think has a great point:
{quote}(like a HashMap: If you have build a HashMap, you can concurerntly 
access the getters/contains - but never ever change it again)
{quote}

For me this analogy explains well the behaviour I would expect from 
BytesRefHash. And I think exactly this thinking got authors of Monitor caught 
in that trap.

So what are the guarantees that HashMap is giving? Quoting from its javadoc:
{quote}
Note that this implementation is not synchronized. If multiple threads access a 
hash map concurrently, and at least one of the threads modifies the map 
structurally, it must be synchronized externally. (A structural modification is 
any operation that adds or deletes one or more mappings; merely changing the 
value associated with a key that an instance already contains is not a 
structural modification.) This is typically accomplished by synchronizing on 
some object that naturally encapsulates the map.
{quote}

So I disagree with you that it is a trap to have this a different properties 
for mutating and not mutating methods on a class and describe it on javadoc. 
Still such a class should be marked as not thread safe as essentially this is 
what it is.

That said. If you read on the javadoc of HashMap: 
{quote}
If no such object exists, the map should be "wrapped" using the 
Collections.synchronizedMap method.
{quote} 
You end up with a need for some kind of unmodifiable BytesRefHash interface 
exactly as you proposed. So I agree we should provide one if we want to be 
consistent with an expected behaviour (for java devs based on HashMap).
What do you think?

> Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: master (9.0), 8.7, 8.8
>            Reporter: Paweł Bugalski
>            Priority: Major
>         Attachments: LUCENE-9791.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that 
> should be matched by one of registered queries if match operations are run 
> concurrently from multiple threads. 
> This is because sometimes in a concurrent environment 
> _TermFilteredPresearcher_ might not select a query that could later on match 
> one of documents being matched.
> Internally _TermFilteredPresearcher_ is using a term acceptor: an instance of 
> _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is 
> correctly initialized under lock and its internal state (a map of 
> _org.apache.lucene.util.BytesRefHash_ instances) is correctly published. 
> Later one when those instances are used concurrently a problem with 
> _org.apache.lucene.util.BytesRefHash#find_ is triggered since it is not 
> thread safe.
> _org.apache.lucene.util.BytesRefHash#find_ internally is using a private 
> _org.apache.lucene.util.BytesRefHash#equals_ method, which is using an 
> instance field _scratch1_ as a temporary buffer to compare its _ByteRef_ 
> parameter with contents of _ByteBlockPool_. This is not thread safe and can 
> cause incorrect answers as well as _ArrayOutOfBoundException_. 
> __
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to