[ 
https://issues.apache.org/jira/browse/SOLR-16412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624132#comment-17624132
 ] 

ASF subversion and git services commented on SOLR-16412:
--------------------------------------------------------

Commit 6340a4abba37d535831b74703a7ca390eff167b7 in solr's branch 
refs/heads/branch_9_1 from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=6340a4abba3 ]

SOLR-16412: Fix TestSizeLimitedDistributedMap LinkedList compilation error


> Race condition could trigger error on concurrent SizeLimitedDistributedMap 
> cleanup
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-16412
>                 URL: https://issues.apache.org/jira/browse/SOLR-16412
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.8, 9.1, main (10.0)
>            Reporter: Patson Luk
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>             Fix For: 9.1, main (10.0)
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> h2. Description
> The exception below is observed while updating the `completedMap` field in 
> `OverseerTaskProcessor`:
> {{o.a.s.c.OverseerTaskProcessor :org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /overseer/collection-map-completed/mn-736f6c726d616e2d312d31383930383730393837313333303932353331}}
> {{at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)}}
> {{at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)}}
> {{at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:2001)}}
> {{at org.apache.solr.common.cloud.SolrZkClient.lambda$delete$1(SolrZkClient.java:264)}}
> {{at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)}}
> {{at org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:263)}}
> {{at org.apache.solr.cloud.SizeLimitedDistributedMap.put(SizeLimitedDistributedMap.java:76)}}
> {{at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:538)}}
> {{at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)}}
> {{at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)}}
> {{at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)}}
> h2. Cause
> Based on the stack trace, `SizeLimitedDistributedMap` had reached its size 
> limit and attempted to clean up entries:
> [https://github.com/fullstorydev/lucene-solr/blob/75e89929eb360b513ee864aeb23a80c049747246/solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java#L73-L80]
> However, the actual deletion failed with `NoNodeException`.
> This is likely caused by a race condition: multiple threads can enter the 
> same code block and try to delete the same list of children, so a slower 
> thread can attempt to delete a child node that no longer exists.
>  
> This condition can be reproduced by a unit test, which will be included in 
> the PR.
> h2. Solution
> Although we could enforce synchronization to prevent threads from purging the 
> same set of child nodes, adding extra blocking might not be desirable.
> Instead, it is probably safe for the purge operation to ignore 
> `KeeperException.NoNodeException` when the node is no longer there.
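
The approach proposed in the Solution section can be sketched roughly as follows. This is a minimal, self-contained illustration and not the committed patch: `FakeStore` and the nested `NoNodeException` are hypothetical in-memory stand-ins for the ZooKeeper client and `KeeperException.NoNodeException`, and `purge` and `runConcurrentPurge` are names invented for this example. Several threads purge the same snapshot of children, and a delete that finds the node already gone is treated as success.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TolerantPurgeSketch {

  /** Hypothetical stand-in for KeeperException.NoNodeException. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  /** Hypothetical in-memory stand-in for the children of the map's ZooKeeper node. */
  static class FakeStore {
    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    void create(String path) {
      nodes.add(path);
    }

    List<String> children() {
      return new ArrayList<>(nodes);
    }

    /** Like a ZooKeeper delete, fails if the node does not exist. */
    void delete(String path) throws NoNodeException {
      if (!nodes.remove(path)) {
        throw new NoNodeException(path);
      }
    }

    int size() {
      return nodes.size();
    }
  }

  /** Deletes the listed nodes and returns how many this caller removed itself. */
  static int purge(FakeStore store, List<String> toDelete) {
    int deleted = 0;
    for (String path : toDelete) {
      try {
        store.delete(path);
        deleted++;
      } catch (NoNodeException e) {
        // Another thread already removed this node. The node being gone is
        // the goal of the purge, so the exception is ignored.
      }
    }
    return deleted;
  }

  /** Runs several purges over the same snapshot; returns {total deleted, nodes remaining}. */
  static int[] runConcurrentPurge(int nodeCount, int threads) {
    FakeStore store = new FakeStore();
    for (int i = 0; i < nodeCount; i++) {
      store.create("mn-" + i);
    }
    // Every thread works from the same list of children, as in the reported race.
    List<String> snapshot = store.children();
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      tasks.add(() -> purge(store, snapshot));
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      int total = 0;
      for (Future<Integer> f : pool.invokeAll(tasks)) {
        total += f.get();
      }
      return new int[] {total, store.size()};
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] result = runConcurrentPurge(50, 4);
    System.out.println("deleted=" + result[0] + " remaining=" + result[1]);
  }
}
```

Without the `catch` block, any thread that loses the race would propagate the exception to its caller, which matches the failure in the stack trace above. The sketch avoids locking, in line with the concern about extra blocking.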



--
This message was sent by Atlassian Jira
(v8.20.10#820010)