[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974402#comment-15974402 ]

Markus Jelsma commented on SOLR-10420:
--------------------------------------

Thanks! Great work!

> Solr 6.x leaking one SolrZkClient instance per second
> -----------------------------------------------------
>
>                 Key: SOLR-10420
>                 URL: https://issues.apache.org/jira/browse/SOLR-10420
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 5.5.2, 6.4.2, 6.5
>            Reporter: Markus Jelsma
>            Assignee: Scott Blum
>             Fix For: 5.5.5, 5.6, 6.5.1, 6.6, master (7.0)
>
>         Attachments: OverseerTest.106.stdout, OverseerTest.119.stdout,
> OverseerTest.80.stdout, OverseerTest.DEBUG.43.stdout,
> OverseerTest.DEBUG.48.stdout, OverseerTest.DEBUG.58.stdout,
> SOLR-10420-dragonsinth.patch, SOLR-10420.patch, SOLR-10420.patch,
> SOLR-10420.patch, SOLR-10420.patch, SOLR-10420.patch
>
>
> One of our nodes went berserk after a restart; Solr went completely nuts!
> So I opened VisualVM to keep an eye on it and spotted a different problem
> that occurs on all our Solr 6.4.2 and 6.5.0 nodes.
> It appears Solr is leaking one SolrZkClient instance per second via
> DistributedQueue$ChildWatcher. The rate of one per second is quite accurate
> for all nodes: there are about as many instances as there are seconds
> since Solr started. I know VisualVM's instance count includes objects that
> are yet to be collected, but the instance count does not drop after a
> forced garbage collection round.
> It doesn't matter how many cores or collections the nodes carry or how heavy
> the traffic is.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
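To make the reported symptom concrete, here is a minimal sketch of how a queue that registers a fresh one-shot watcher on every poll can accumulate watcher objects when the watched node never changes. It is not Solr's DistributedQueue code. `FakeZk`, `LeakyQueue`, `ChildWatcher` and `WatcherLeakDemo` are hypothetical names, `FakeZk` is a stand-in that only records watchers, and the one-poll-per-second rate is assumed from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a ZooKeeper client: it only records registered watchers. */
class FakeZk {
    final List<Runnable> watchers = new ArrayList<>();

    /** Returns the (empty) child list and registers a one-shot watcher for changes. */
    List<String> getChildren(Runnable watcher) {
        watchers.add(watcher);
        return new ArrayList<>();
    }

    /** Simulates a change under the watched node: pending watchers fire once and are dropped. */
    void fireChildrenChanged() {
        List<Runnable> pending = new ArrayList<>(watchers);
        watchers.clear();
        pending.forEach(Runnable::run);
    }
}

/** Hypothetical watcher; a real one would invalidate the queue's cached children. */
class ChildWatcher implements Runnable {
    @Override
    public void run() {
        // nothing to do in this demo
    }
}

/** Illustrative queue with the suspected flaw: every poll creates and registers a new watcher. */
class LeakyQueue {
    private final FakeZk zk;

    LeakyQueue(FakeZk zk) {
        this.zk = zk;
    }

    String peek() {
        // A new ChildWatcher is registered even if an earlier one is still pending.
        List<String> children = zk.getChildren(new ChildWatcher());
        return children.isEmpty() ? null : children.get(0);
    }
}

class WatcherLeakDemo {
    /** Polls an idle queue once per simulated second and returns the pending watcher count. */
    static int pendingAfterPolls(int seconds) {
        FakeZk zk = new FakeZk();
        LeakyQueue queue = new LeakyQueue(zk);
        for (int i = 0; i < seconds; i++) {
            queue.peek();
        }
        return zk.watchers.size();
    }

    public static void main(String[] args) {
        System.out.println(pendingAfterPolls(60)); // prints 60
    }
}
```

Because the watched node never changes in this sketch, no watcher ever fires, so the count grows by one per poll regardless of load. That matches the shape of the report, though the actual cause is whatever the committed patch addresses.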
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973453#comment-15973453 ]

ASF subversion and git services commented on SOLR-10420:
--------------------------------------------------------

Commit 89beee8d61346d50dbbf02f0cc9cfc5032e46eee in lucene-solr's branch refs/heads/branch_5_5 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=89beee8 ]

SOLR-10420: fix watcher leak in DistributedQueue
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973382#comment-15973382 ]

ASF subversion and git services commented on SOLR-10420:
--------------------------------------------------------

Commit e3beb61a72efbce37710ce3cc48b24093070d052 in lucene-solr's branch refs/heads/branch_6_5 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3beb61 ]

SOLR-10420: fix watcher leak in DistributedQueue
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973354#comment-15973354 ]

ASF subversion and git services commented on SOLR-10420:
--------------------------------------------------------

Commit 42d08dd28c6609a2c70a691e6a88725c9aa31377 in lucene-solr's branch refs/heads/branch_5x from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=42d08dd ]

SOLR-10420: fix watcher leak in DistributedQueue
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973351#comment-15973351 ]

ASF subversion and git services commented on SOLR-10420:
--------------------------------------------------------

Commit ae55dfc10fd3843d35df9096b7626aad36735670 in lucene-solr's branch refs/heads/branch_6x from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ae55dfc ]

SOLR-10420: fix watcher leak in DistributedQueue
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973346#comment-15973346 ]

Scott Blum commented on SOLR-10420:
-----------------------------------

Got it. I do think it would be a mistake. In that case, after I've committed to 5x and 6x, I'll also commit to 6_5 and 5_5.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973331#comment-15973331 ]

Ishan Chattopadhyaya commented on SOLR-10420:
---------------------------------------------

I think you should do master, branch_6x and branch_6_5. branch_5x is optional.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973330#comment-15973330 ]

Steve Rowe commented on SOLR-10420:
-----------------------------------

bq. I think at a minimum you should also commit to branch_6_5, since it's a known release, since you'll be better equipped to handle conflicts if there are any. I noticed you set fixVersion to releases on branches you won't be committing to: 6.4.3, 5.5.5. I don't think you should do that. If you're going to commit to branch_5x, then the fixVersion would be 5.6, since that's the next release on that branch. Unless you commit to branch_6_4, you shouldn't include 6.4.3 as a fixVersion.

More generally, if you think it's essential that any X.Y.Z release includes this fix, i.e. that it would be a mistake to release without it, then you should commit to the branch from which that release will be made. Otherwise you and others may/will forget to backport when such a release materializes.
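The rule in this comment (a fixVersion should only be listed if the fix was committed to the branch that release is cut from) can be sketched as a small lookup. This is an illustration only, not project tooling: the class and method names are hypothetical, and the branch-to-release table is simply read off the versions and branches mentioned in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: derives the fixVersion list from the branches actually committed to. */
class FixVersions {
    /** Next release expected from each branch, as named in this thread (an assumption of the sketch). */
    private static final Map<String, String> NEXT_RELEASE = new HashMap<>();

    static {
        NEXT_RELEASE.put("master", "master (7.0)");
        NEXT_RELEASE.put("branch_6x", "6.6");
        NEXT_RELEASE.put("branch_6_5", "6.5.1");
        NEXT_RELEASE.put("branch_6_4", "6.4.3");
        NEXT_RELEASE.put("branch_5x", "5.6");
        NEXT_RELEASE.put("branch_5_5", "5.5.5");
    }

    /** Returns the sorted fixVersions implied by the given committed branches. */
    static Set<String> forCommittedBranches(List<String> branches) {
        Set<String> versions = new TreeSet<>();
        for (String branch : branches) {
            String version = NEXT_RELEASE.get(branch);
            if (version == null) {
                throw new IllegalArgumentException("Unknown branch: " + branch);
            }
            versions.add(version);
        }
        return versions;
    }

    public static void main(String[] args) {
        // Committing only to master, branch_6x and branch_5x does not justify 5.5.5, 6.4.3 or 6.5.1.
        System.out.println(forCommittedBranches(Arrays.asList("master", "branch_6x", "branch_5x")));
        // prints [5.6, 6.6, master (7.0)]
    }
}
```

Adding `branch_6_5` and `branch_5_5` to the input yields 5.5.5, 5.6, 6.5.1, 6.6 and master (7.0), with 6.4.3 absent because nothing was committed to `branch_6_4`.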
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973320#comment-15973320 ]

Scott Blum commented on SOLR-10420:
-----------------------------------

So my plan is to commit this to master, branch_6x, and branch_5x, and let the release managers pull it into the actual release branches. SG?
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973261#comment-15973261 ]

ASF subversion and git services commented on SOLR-10420:
--------------------------------------------------------

Commit 43c2b2320dcf344c42086ceb782e0fc53c439952 in lucene-solr's branch refs/heads/master from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=43c2b23 ]

SOLR-10420: fix watcher leak in DistributedQueue
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973246#comment-15973246 ]

Scott Blum commented on SOLR-10420:
-----------------------------------

Great! Thanks [~steve_rowe]! I'll get this committed.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973207#comment-15973207 ]

Steve Rowe commented on SOLR-10420:
-----------------------------------

bq. I'm running the full Solr core test suite a couple times now, I'll report back when it finishes (should be less than half an hour).

I ran the solr-core and solrj suites three times each with [~dragonsinth]'s latest patch on master, and there were zero failures.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973118#comment-15973118 ]

Steve Rowe commented on SOLR-10420:
-----------------------------------

I did 500 beasting iterations of OverseerTest using your latest patch, [~dragonsinth], with zero failures.

I'm running the full Solr core test suite a couple of times now; I'll report back when it finishes (should be less than half an hour).
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973093#comment-15973093 ]

Scott Blum commented on SOLR-10420:
-----------------------------------

Agreed. It passes for me. Anyone on this issue want to do any extensive testing before I commit? Otherwise I'll commit this today to master and then start backporting it to a number of branches.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973070#comment-15973070 ]

Cao Manh Dat commented on SOLR-10420:
-------------------------------------

[~dragonsinth] This issue is a blocker for Solr 6.5.1, so I think we (you) should commit soon.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971964#comment-15971964 ]

Cao Manh Dat commented on SOLR-10420:
-------------------------------------

[~dragonsinth] The patch LGTM!
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971259#comment-15971259 ] Scott Blum commented on SOLR-10420: --- [~caomanhdat] I didn't literally mean that we should bring back the isDirty bit. I meant that clearly the last time around, there was a hole in the design that led to this leak. I want to take the opportunity to re-look at the design again as a whole and make sure everything seems good, and we're not just putting a band-aid on it. You may have already done this, so just give me a little bit to catch up. :D
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970215#comment-15970215 ] Cao Manh Dat commented on SOLR-10420: - [~dragonsinth] As Steve said, Overseer.external.. test failed frequently before SOLR-9191 got committed. So I doubt that "isDirty" in retro-code will fix the problem.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970205#comment-15970205 ] Cao Manh Dat commented on SOLR-10420: - [~dragonsinth] That's correct. The last patch passes all the tests.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970030#comment-15970030 ] Scott Blum commented on SOLR-10420: ---
Let me try to unpack what you said:
1) We want a synchronous offer() -> peek() on the same thread to return the item offered without delay.
2) This works on master, but the original patch to fix the leak breaks #1.
Is that correct? Let me look at this on Monday with [~jhump]. I'm pretty sure there's a simplification to be made in DQ with how we're handling the watcher and dirty tracking. There used to be an explicit "isDirty" bit that we traded out for watcher nullability, which in retrospect I'm not sure was the best choice.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969966#comment-15969966 ] Cao Manh Dat commented on SOLR-10420: - [~dragonsinth] Can you review the patch?
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969843#comment-15969843 ] Cao Manh Dat commented on SOLR-10420: -
I'm thinking about two different ways to solve this problem:
1. DQ will set lastWatcher = null when {{DQ.offer(byte[] data)}} runs successfully. Because overseer.workqueue is offered to locally by the overseer, that will fix the problem. But we changed {{DQ.peek()}} from always returning a new item when ZK contains one to possibly not returning it (strictly speaking a false negative), so this may affect other parts as well.
2. Each time DQ.peek() is called, we look at the ZK nodes without using the watcher.
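The trade-off between these options can be sketched with a toy model. This is only an illustration under stated assumptions: `SketchQueue`, its `Strategy` enum, and the `deliverEvent()` step are hypothetical names invented here, the "store" is an in-memory list standing in for the ZK child nodes, and only local offers are modelled. It is not Solr's `DistributedQueue`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a ZK-backed queue; none of these names come from Solr. */
class SketchQueue {
    enum Strategy { WATCHER_ONLY, RESET_ON_OFFER, ALWAYS_READ }

    private final List<String> store = new ArrayList<>(); // stands in for the ZK child nodes
    private final Strategy strategy;
    private boolean watcherSet = false;  // stands in for lastWatcher != null
    private boolean eventQueued = false; // a change happened, notification not delivered yet
    private String cachedHead = null;    // stands in for the in-memory children
    int storeReads = 0;                  // how often peek() had to go to the store

    SketchQueue(Strategy strategy) { this.strategy = strategy; }

    /** Local offer: the item is written at once, the watcher notification arrives later. */
    void offer(String item) {
        store.add(item);
        if (watcherSet) eventQueued = true;
        if (strategy == Strategy.RESET_ON_OFFER) watcherSet = false; // option 1
    }

    /** Simulates the asynchronous delivery of a pending watcher notification. */
    void deliverEvent() {
        if (eventQueued) { eventQueued = false; watcherSet = false; }
    }

    String peek() {
        if (strategy != Strategy.ALWAYS_READ) {
            if (cachedHead != null) return cachedHead;
            if (watcherSet) return null; // trust the watcher: nothing changed as far as we know
        }
        storeReads++; // option 2 always ends up here
        String head = store.isEmpty() ? null : store.get(0);
        if (strategy != Strategy.ALWAYS_READ) { watcherSet = true; cachedHead = head; }
        return head;
    }

    /** peek() on an empty queue, offer locally, then peek() again before any notification. */
    static String offerThenPeek(Strategy s) {
        SketchQueue q = new SketchQueue(s);
        q.peek();
        q.offer("state: active");
        return q.peek();
    }

    /** Number of store reads caused by n peeks on a queue that stays empty. */
    static int readsForIdlePeeks(Strategy s, int n) {
        SketchQueue q = new SketchQueue(s);
        for (int i = 0; i < n; i++) q.peek();
        return q.storeReads;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": offer->peek = " + offerThenPeek(s)
                    + ", reads for 10 idle peeks = " + readsForIdlePeeks(s, 10));
        }
    }
}
```

In this model the watcher-only variant misses the locally offered item until `deliverEvent()` runs, option 1 sees it immediately at the cost of one extra read per offer, and option 2 always sees it but reads the store on every call.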
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969839#comment-15969839 ] Cao Manh Dat commented on SOLR-10420: -
I think I found the reason for the test failure. Some things to note:
- In the current DQ version, each time we peek(), if the in-memory queue is empty, we actually look at ZK to get new elements ( the watcher is useless in this scenario )
- With the patch, each time we peek(), if the in-memory queue is empty, we only look at the ZK nodes when the watcher tells us that our queue has changed.
So this is why the test fails:
- overseer.queue <- set a replica down
- overseer runs the command successfully
- overseer.queue <- set a replica active
- overseer delays this command ( overseer.workqueue <- set a replica active )
- touch /clusterstate.json to change its version
- overseer.queue <- some ZKWriteCommand, let's call this one ZK1
- overseer changes the clusterstate to set the replica active
- overseer hits a bad version exception
- overseer fetches the last element from overseer.workqueue. Here is where the problem happens: overseer.workqueue.peek() returns empty because the watcher has not fired.
- overseer processes ZK1, it succeeds -> overseer.workqueue is emptied.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968869#comment-15968869 ] Cao Manh Dat commented on SOLR-10420: - It actually fixes the problem even without reusing the same object, but it makes the OverseerTest fail.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968618#comment-15968618 ] Scott Blum commented on SOLR-10420: --- Fix LGTM. Is the actual fix this?
```
// we're not in a dirty state, and we do not have in-memory children
if (lastWatcher != null) return null;
```
I.e., if you just do that, would that fix the leak even without reusing the same object?
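To illustrate the question, here is a minimal sketch of how such a guard could stop watchers from piling up. Everything in it is hypothetical: `FakeZk` and `QueuePoller` are stand-ins invented for illustration, not Solr or ZooKeeper classes, and the assumption that every registered watcher object is retained until it fires is part of the model rather than a statement about ZooKeeper's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a ZK client: it keeps every registered watcher until it fires. */
class FakeZk {
    final List<Runnable> pendingWatchers = new ArrayList<>();

    /** Lists the children (always empty in this model) and registers a one-shot watcher. */
    List<String> getChildren(Runnable watcher) {
        pendingWatchers.add(watcher);
        return new ArrayList<>();
    }
}

/** Simplified poller for illustration only; this is not Solr's DistributedQueue. */
class QueuePoller {
    private final FakeZk zk;
    private final boolean guard;
    private Runnable lastWatcher; // non-null while a watcher is outstanding

    QueuePoller(FakeZk zk, boolean guard) {
        this.zk = zk;
        this.guard = guard;
    }

    String peek() {
        // we're not in a dirty state, and we do not have in-memory children
        if (guard && lastWatcher != null) return null;

        // Otherwise create a fresh watcher and go to "ZK" again.
        Runnable watcher = () -> lastWatcher = null; // firing would mark the queue dirty
        lastWatcher = watcher;
        zk.getChildren(watcher);
        return null; // the queue is empty in this model
    }
}

class WatcherLeakDemo {
    /** Polls an empty queue repeatedly and reports how many watchers are still pending. */
    static int pendingAfterPolls(boolean guard, int polls) {
        FakeZk zk = new FakeZk();
        QueuePoller poller = new QueuePoller(zk, guard);
        for (int i = 0; i < polls; i++) {
            poller.peek();
        }
        return zk.pendingWatchers.size();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + pendingAfterPolls(false, 60));
        System.out.println("with guard:    " + pendingAfterPolls(true, 60));
    }
}
```

With 60 polls of an empty queue, the unguarded variant leaves 60 pending watchers in this model and the guarded one leaves 1, which is the effect the question is about; whether the guard is enough in the real code is what the rest of the thread discusses.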
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967470#comment-15967470 ] Steve Rowe commented on SOLR-10420: --- bq. Next I'll try the patch with the OverseerTest changes. I got 4 failures from 100 beasting iterations using this version of the patch. The failures looked the same as the previously posted logs, so I won't add them. Next I'll try beasting the latest patch again, this time with DEBUG level on {{org.apache.solr.common.cloud}}.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967158#comment-15967158 ] Ishan Chattopadhyaya commented on SOLR-10420: - Ran the test with original patch as well as 15s timeout patch 500 times each. I saw no failures. I can run this on better hardware early next week (my AMD Ryzen is arriving soon!).
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965790#comment-15965790 ] Markus Jelsma commented on SOLR-10420: -- This patch appears to solve the problem.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955575#comment-15955575 ] Markus Jelsma commented on SOLR-10420: -- So it seems. Forced GC does not remove the object instances in >= 6.1.0. In 6.0.x, regular GC and forced GC do remove the instances from the object count. I think almost everyone should be able to see it for themselves: almost all our Solr instances show this problem immediately after restart, though some don't on some occasions. Although they don't consume a lot of bytes, the problem appears to cause more CPU time being used up. Filtering the memory sampler for org.apache.solr.common reveals it right away.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955538#comment-15955538 ] Walter Underwood commented on SOLR-10420: - To be clear, these are uncollectable objects?
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955293#comment-15955293 ] Markus Jelsma commented on SOLR-10420: -- To note another oddity, some nodes of our regular search cluster (6.5.0) do not show increased counts. Some nodes with other roles (but running Solr) show the problem immediately after each restart every time i restarted them today. So it could be 6.0.1 and 6.0.0 also show the problem, although they didn't when i just tested them.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955287#comment-15955287 ] Markus Jelsma commented on SOLR-10420: -- Well, actually DistributedQueue$ChildWatcher is being leaked as well, so the leaking of SolrZkClient could be a consequence of that.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955283#comment-15955283 ] Scott Blum commented on SOLR-10420: --- Hard to see how the problem could be localized to DistributedQueue$ChildWatcher.. it doesn't create any ZkClients, it's passed in from the outside.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955282#comment-15955282 ] Markus Jelsma commented on SOLR-10420: -- Ah, I found it: the problem first appeared in 6.1.0. Versions 6.0.0 and 6.0.1 do not show this problem; there the instances are collected by GC.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955259#comment-15955259 ] Ishan Chattopadhyaya commented on SOLR-10420: - The ant resolve step can hang because of lock files. You could try this: {{find ~ -name "*lck" | xargs rm}}.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955253#comment-15955253 ] Markus Jelsma commented on SOLR-10420: -- I only have 6.5.0 and a not-yet-upgraded 6.4.2; both suffer from the same problem. But I just built 6.3.0 and ran it in cloud mode using the built-in ZooKeeper, without registering a collection or core. After two minutes I had ~120 client objects, and now I have more. 6.0.0 doesn't show increasing instance counts. I can't test 6.1 and 6.2 because ant keeps hanging on resolve for whatever reason.
[jira] [Commented] (SOLR-10420) Solr 6.x leaking one SolrZkClient instance per second
[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955222#comment-15955222 ] Ishan Chattopadhyaya commented on SOLR-10420: - Nothing has changed in that code in the last few releases. Do you know whether this worked fine in an earlier 6.x release? FYI, [~dragonsinth] and [~shalinmangar] <-- experts in that code.