[ 
https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969839#comment-15969839
 ] 

Cao Manh Dat commented on SOLR-10420:
-------------------------------------

I think I found the reason for the test failure. A couple of things to note 
(sketched in the code below):
- In the current DQ version, every time we peek() and the in-memory queue is 
empty, we actually go to ZK to fetch new elements ( the watcher is useless in 
this scenario )
- With the patch, every time we peek() and the in-memory queue is empty, we 
only look at the ZK nodes when the watcher tells us there was a change in our 
queue.
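
To make the difference concrete, here is a minimal sketch of the two peek() 
strategies. It is not the real DistributedQueue code: knownChildren, 
watcherFired and fetchFromZk() are simplified stand-ins for the actual cached 
child list, the ChildWatcher and the ZK children read.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified model of the two peek() strategies; not the actual DistributedQueue.
class PeekSketch {
  private final Deque<String> knownChildren = new ArrayDeque<>(); // cached ZK children
  private volatile boolean watcherFired = false;                  // set by a hypothetical ChildWatcher

  // Current behavior: an empty in-memory queue always triggers a ZK read,
  // so whatever the watcher saw is irrelevant.
  String peekCurrent() {
    if (knownChildren.isEmpty()) {
      knownChildren.addAll(fetchFromZk());
    }
    return knownChildren.peekFirst();
  }

  // Patched behavior: only re-read ZK after the watcher reported a change.
  String peekPatched() {
    if (knownChildren.isEmpty() && watcherFired) {
      watcherFired = false;
      knownChildren.addAll(fetchFromZk());
    }
    return knownChildren.peekFirst(); // may be null even though ZK has elements
  }

  private List<String> fetchFromZk() {
    return List.of(); // placeholder for the real getChildren() call against ZK
  }
}
{code}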

So this is why the test fails (a simulated walk-through follows the steps):
- overseer.queue <- set a replica down
- overseer runs the command successfully
- overseer.queue <- set a replica active
- overseer delays this command ( overseer.workqueue <- set a replica active )
- touch /clusterstate.json to change its version
- overseer.queue <- some ZkWriteCommand, let's call this one ZK1
- overseer changes the clusterstate to set the replica active
- overseer hits a BadVersionException
- overseer fetches the last element from overseer.workqueue. Here is where the 
problem happens: overseer.workqueue.peek() returns empty because the watcher 
has not fired.
- overseer processes ZK1, which succeeds -> overseer.workqueue is emptied.
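
A self-contained simulation of that sequence, using the same simplified 
watcher-gated peek as the sketch above; the queue and flag names are 
illustrative, not Overseer code.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative simulation of the failing sequence; not the real Overseer code.
public class OverseerRaceSketch {
  public static void main(String[] args) {
    Deque<String> zkWorkQueue = new ArrayDeque<>(); // stands in for overseer.workqueue in ZK
    Deque<String> inMemory = new ArrayDeque<>();    // overseer's cached view of that queue
    boolean watcherFired = false;                   // no watcher event is delivered in the failing run

    // Overseer delays the command: overseer.workqueue <- set a replica active
    zkWorkQueue.addLast("set replica active");

    // After the BadVersionException the overseer peeks the work queue again,
    // but the watcher-gated peek never goes back to ZK.
    if (inMemory.isEmpty() && watcherFired) {
      inMemory.addAll(zkWorkQueue);
    }
    String last = inMemory.peekFirst();

    // Prints "peek() returned null, but ZK still holds: set replica active".
    // The overseer therefore moves on, processes ZK1 and empties the work queue,
    // losing the delayed "set replica active" command.
    System.out.println("peek() returned " + last
        + ", but ZK still holds: " + zkWorkQueue.peekFirst());
  }
}
{code}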

> Solr 6.x leaking one SolrZkClient instance per second
> -----------------------------------------------------
>
>                 Key: SOLR-10420
>                 URL: https://issues.apache.org/jira/browse/SOLR-10420
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.5.2, 6.4.2, 6.5
>            Reporter: Markus Jelsma
>         Attachments: OverseerTest.106.stdout, OverseerTest.119.stdout, 
> OverseerTest.80.stdout, OverseerTest.DEBUG.43.stdout, 
> OverseerTest.DEBUG.48.stdout, OverseerTest.DEBUG.58.stdout, SOLR-10420.patch, 
> SOLR-10420.patch, SOLR-10420.patch
>
>
> One of our nodes went berserk after a restart, Solr went completely nuts! 
> So I opened VisualVM to keep an eye on it and spotted a different problem 
> that occurs in all our Solr 6.4.2 and 6.5.0 nodes.
> It appears Solr is leaking one SolrZkClient instance per second via 
> DistributedQueue$ChildWatcher. That one per second is quite accurate for all 
> nodes; there are about as many instances as there are seconds since Solr 
> started. Although VisualVM's instance count includes 
> objects-to-be-collected, the instance count does not drop after a forced 
> garbage collection round.
> It doesn't matter how many cores or collections the nodes carry or how heavy 
> the traffic is.


