[ https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290166#comment-17290166 ]
Paweł Bugalski commented on LUCENE-9791: ---------------------------------------- {quote}I think it is a problem if we start saying that you can use it concurrently this way but not concurrently that way. Eventually somebody gets confused. Better to just draw a line in the sand and say this class is or is not thread safe and be done with it. Maybe another idea is to have a FrozenBytesRefHash that is explicitly safe for concurrent lookup because it does not support modification? {quote} I think I agree and disagree with you at the same time. To explain my point let me quote Uwe, who I think has a great point: {quote}(like a HashMap: If you have build a HashMap, you can concurerntly access the getters/contains - but never ever change it again) {quote} For me this analogy explains well the behaviour I would expect from BytesRefHash. And I think exactly this thinking got authors of Monitor caught in that trap. So what are the guarantees that HashMap is giving? Quoting from its javadoc: {quote} Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. {quote} So I disagree with you that it is a trap to have this a different properties for mutating and not mutating methods on a class and describe it on javadoc. Still such a class should be marked as not thread safe as essentially this is what it is. That said. If you read on the javadoc of HashMap: {quote} If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. {quote} You end up with a need for some kind of unmodifiable BytesRefHash interface exactly as you proposed. So I agree we should provide one if we want to be consistent with an expected behaviour (for java devs based on HashMap). What do you think? > Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find > ----------------------------------------------------------------------- > > Key: LUCENE-9791 > URL: https://issues.apache.org/jira/browse/LUCENE-9791 > Project: Lucene - Core > Issue Type: Bug > Components: core/other > Affects Versions: master (9.0), 8.7, 8.8 > Reporter: Paweł Bugalski > Priority: Major > Attachments: LUCENE-9791.patch > > Time Spent: 50m > Remaining Estimate: 0h > > _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that > should be matched by one of registered queries if match operations are run > concurrently from multiple threads. > This is because sometimes in a concurrent environment > _TermFilteredPresearcher_ might not select a query that could later on match > one of documents being matched. > Internally _TermFilteredPresearcher_ is using a term acceptor: an instance of > _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is > correctly initialized under lock and its internal state (a map of > _org.apache.lucene.util.BytesRefHash_ instances) is correctly published. > Later one when those instances are used concurrently a problem with > _org.apache.lucene.util.BytesRefHash#find_ is triggered since it is not > thread safe. > _org.apache.lucene.util.BytesRefHash#find_ internally is using a private > _org.apache.lucene.util.BytesRefHash#equals_ method, which is using an > instance field _scratch1_ as a temporary buffer to compare its _ByteRef_ > parameter with contents of _ByteBlockPool_. This is not thread safe and can > cause incorrect answers as well as _ArrayOutOfBoundException_. > __ > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org