[
https://issues.apache.org/jira/browse/SOLR-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946570#comment-17946570
]
ASF subversion and git services commented on SOLR-17720:
--------------------------------------------------------
Commit 67a642fe0263588155627c0429ea5cf39f519c8e in solr's branch
refs/heads/main from aparnasuresh85
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=67a642fe026 ]
SOLR-17720: Fix rare deadlock in CollectionProperties (#3304)
The problem pre-dated CollectionPropertiesZkStateReader's existence.
> Deadlock in CollectionPropertiesZkStateReader
> ---------------------------------------------
>
> Key: SOLR-17720
> URL: https://issues.apache.org/jira/browse/SOLR-17720
> Project: Solr
> Issue Type: Bug
> Components: SolrJ
> Affects Versions: 9.7
> Reporter: Houston Putman
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 9.9
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> {{CollectionPropertiesZkStateReader}} has multiple different mechanisms for
> synchronizing when modifying its concurrent data structures.
> # {{synchronized (getCollectionLock(collection))}}
> # {{collectionPropsObservers}} is a ConcurrentHashMap, and therefore locks
> on updating a single key within the map.
> Unfortunately, taking these two locks in opposite orders can cause a deadlock.
> In {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}},
> {{collectionPropsObservers.compute(collection, <function>)}} is used, which
> takes a lock in {{collectionPropsObservers}} on the {{collection}} key.
> Inside {{<function>}}, while that lock is held, {{synchronized
> (getCollectionLock(collection))}} is entered.
> In {{CollectionPropertiesZkStateReader.refreshAndWatch()}}, {{synchronized
> (getCollectionLock(coll))}} wraps the whole method. Within this
> synchronized block, {{collectionPropsObservers.remove(coll)}} is called,
> which takes a lock on the {{coll}} key of
> {{collectionPropsObservers}}.
> So {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}} has
> the lock for {{collectionPropsObservers}} but is waiting on the lock for
> {{getCollectionLock(coll)}}. And
> {{CollectionPropertiesZkStateReader.refreshAndWatch()}} has the lock for
> {{getCollectionLock(coll)}} and is waiting on the lock for
> {{collectionPropsObservers}}. Hence deadlock.
> This code is quite complex, and my gut reaction is that it could be
> simplified considerably. Moving the {{synchronized
> (getCollectionLock(collection))}} block in {{removeCollectionPropsWatcher()}}
> outside of the {{compute()}} call would solve this particular deadlock.
> Hopefully we can simplify the code further with Curator.
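
The reordering suggested in the description can be sketched as follows. This
is a minimal illustration under stated assumptions, not the change made in
the commit above: the class LockOrderSketch, the map of String watcher ids,
and all method bodies are hypothetical stand-ins. Only the names
getCollectionLock, refreshAndWatch, and the idea of an observers map come
from the report. Both paths take the per-collection monitor first and touch
the ConcurrentHashMap second, so neither thread can hold one lock while
waiting for the other.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the reader class; not the actual Solr code.
class LockOrderSketch {
    // Stand-in for collectionPropsObservers: collection name -> watcher ids.
    final ConcurrentHashMap<String, Set<String>> observers = new ConcurrentHashMap<>();
    // Assumed backing store for getCollectionLock(collection).
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    Object getCollectionLock(String collection) {
        return locks.computeIfAbsent(collection, k -> new Object());
    }

    void addWatcher(String collection, String watcher) {
        synchronized (getCollectionLock(collection)) {
            observers.computeIfAbsent(collection, k -> new HashSet<>()).add(watcher);
        }
    }

    // Monitor first, map lock second: the synchronized block now encloses
    // compute() instead of sitting inside its remapping function.
    // Returns true when the collection has no watchers left.
    boolean removeWatcher(String collection, String watcher) {
        synchronized (getCollectionLock(collection)) {
            observers.compute(collection, (k, v) -> {
                if (v == null) {
                    return null;
                }
                v.remove(watcher);
                return v.isEmpty() ? null : v; // returning null removes the entry
            });
            return !observers.containsKey(collection);
        }
    }

    // Same order as removeWatcher: monitor first, then the map operation.
    void refreshAndWatch(String collection) {
        synchronized (getCollectionLock(collection)) {
            observers.remove(collection);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderSketch sketch = new LockOrderSketch();
        Thread remover = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                sketch.addWatcher("coll", "w");
                sketch.removeWatcher("coll", "w");
            }
        });
        Thread refresher = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                sketch.refreshAndWatch("coll");
            }
        });
        remover.start();
        refresher.start();
        remover.join();
        refresher.join();
        System.out.println("both threads finished");
    }
}
```

The design choice here is a single global lock order. Any code path that
needs both locks acquires the collection monitor before calling into the
map, and no remapping function passed to compute() tries to enter the
monitor.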
--
This message was sent by Atlassian Jira
(v8.20.10#820010)