[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134957#comment-16134957 ]

ASF subversion and git services commented on SOLR-10983:

Commit f031a85f50902cfc0b54422b35f60effb7353b05 in lucene-solr's branch refs/heads/branch_6_6 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f031a85 ]

SOLR-10983: Fix DOWNNODE -> queue-work explosion

> Fix DOWNNODE -> queue-work explosion
>
> Key: SOLR-10983
> URL: https://issues.apache.org/jira/browse/SOLR-10983
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Scott Blum
> Assignee: Scott Blum
> Fix For: 7.0, 6.6.1, master (8.0)
> Attachments: SOLR-10983.patch
>
> Every DOWNNODE command enqueues N copies of itself into queue-work, where N
> is the number of collections affected by the DOWNNODE.
> This rarely matters in practice, because queue-work gets immediately dumped;
> however, if anything throws an exception (such as ZK bad version), we don't
> clear queue-work. Then the next time through the loop we run the expensive
> DOWNNODE command potentially hundreds of times.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
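The failure mode in the issue description can be illustrated with a toy model. This is only a sketch under stated assumptions, not Solr's Overseer code: `run_downnode`, `copies_left_after_failure`, and the `dedupe` flag are hypothetical names, and the "record the message at most once" behaviour is one possible way to avoid the duplication, not necessarily what the attached patch does.

```python
from collections import deque


def run_downnode(affected_collections, work_queue, message, dedupe):
    """Simulate processing one DOWNNODE message (toy model, not Solr code).

    Each affected collection yields one write command, and every enqueue of a
    write command fires a callback that records the message in work_queue.
    """
    recorded = False
    for _collection in affected_collections:
        # The callback fires once per enqueued write command.
        if dedupe and recorded:
            # Hypothetical guard: record a given message at most once.
            continue
        work_queue.append(message)
        recorded = True


def copies_left_after_failure(num_collections, dedupe):
    """Return how many copies of the message remain in queue-work when an
    exception prevents the queue from being cleared."""
    work_queue = deque()
    collections = [f"collection{i}" for i in range(num_collections)]
    run_downnode(collections, work_queue, "DOWNNODE node1", dedupe)
    # Simulated exception (for example a ZK bad version): the step that would
    # normally clear queue-work never runs, so every copy is replayed later.
    return len(work_queue)
```

With 300 affected collections, `copies_left_after_failure(300, dedupe=False)` returns 300, meaning the expensive command would be replayed 300 times on the next pass, while `copies_left_after_failure(300, dedupe=True)` returns 1.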
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117562#comment-16117562 ]

Erick Erickson commented on SOLR-10983:

I backported this to 6x (future 6.7) as I really expect there to be a final release of the 6x code line and didn't want this to be omitted. No harm if there's _not_ a 6.7.

> Fix For: 7.0, master (8.0), 7.1
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117560#comment-16117560 ]

ASF subversion and git services commented on SOLR-10983:

Commit d704796a785aa0d8e455661e519bb2f0c67b7311 in lucene-solr's branch refs/heads/branch_6x from Erick
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d704796 ]

SOLR-10983: Fix DOWNNODE -> queue-work explosion, backporting to 6x as per the comments in the JIRA
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075666#comment-16075666 ]

Scott Blum commented on SOLR-10983:

BTW: this issue most likely affects all 6.x releases (and even some late 5.x), so it should be considered if we do any 6.x point releases later.
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075663#comment-16075663 ]

ASF subversion and git services commented on SOLR-10983:

Commit 17245c2e5a93bca59572c09af78a6ad6045e75eb in lucene-solr's branch refs/heads/branch_7x from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=17245c2 ]

SOLR-10983: Fix DOWNNODE -> queue-work explosion
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075664#comment-16075664 ]

ASF subversion and git services commented on SOLR-10983:

Commit 51638c09bf4f5457650ab40c60b5f98512f9ca1d in lucene-solr's branch refs/heads/branch_7_0 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=51638c0 ]

SOLR-10983: Fix DOWNNODE -> queue-work explosion
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075662#comment-16075662 ]

ASF subversion and git services commented on SOLR-10983:

Commit 380eed838d6646ec02592a9d2e6649e6aa1b5d9b in lucene-solr's branch refs/heads/master from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=380eed8 ]

SOLR-10983: Fix DOWNNODE -> queue-work explosion
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074225#comment-16074225 ]

Scott Blum commented on SOLR-10983:

Thanks! Will do
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074176#comment-16074176 ]

Shalin Shekhar Mangar commented on SOLR-10983:

On second thought, creating a batch enqueue command is not so straightforward, and the callback is called once per enqueue as per the contract of ZkWriteCallback, so it is technically not a bug. So I am fine with your solution as it exists.

+1 to commit. Please make sure it is backported to branch_7x and branch_7_0 so that it makes it into the 7.0 release.
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074173#comment-16074173 ]

Shalin Shekhar Mangar commented on SOLR-10983:

Nice catch! Your patch solves another problem -- today if an exception happens, we run through items in the work-queue and the last item from state-update-queue (the one during which the exception happened), so we run the same item twice.

Considering that DOWNNODE is the only command that enqueues multiple ZkWriteCommands, I think we should add a method to ZkStateWriter which calls enqueue only once for the entire batch. That and your patch solve all problems nicely, i.e.

1. DOWNNODE creating multiple work queue items
2. Exceptions not clearing work queue
3. Overseer executing same item twice from work queue and state update queue on an exception
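The batch-enqueue idea proposed above can be sketched as follows. This is a minimal illustration under stated assumptions: `ToyStateWriter`, `enqueue_batch`, and `count_callbacks` are hypothetical names invented here, not methods of Solr's ZkStateWriter, and the sketch ignores flushing and ZooKeeper entirely. It only contrasts a callback fired once per enqueued command with a callback fired once per batch.

```python
class ToyStateWriter:
    """Toy stand-in for a state writer; not Solr's ZkStateWriter."""

    def __init__(self):
        self.pending = []

    def enqueue(self, command, on_enqueue):
        # One callback invocation per enqueued command.
        self.pending.append(command)
        on_enqueue()

    def enqueue_batch(self, commands, on_enqueue):
        # Hypothetical batch variant: the callback fires once for the batch.
        self.pending.extend(commands)
        on_enqueue()


def count_callbacks(num_commands, batched):
    """Return (callback invocations, pending commands) for one message that
    produces num_commands write commands."""
    writer = ToyStateWriter()
    calls = []
    commands = [f"write-{i}" for i in range(num_commands)]
    if batched:
        writer.enqueue_batch(commands, lambda: calls.append(1))
    else:
        for command in commands:
            writer.enqueue(command, lambda: calls.append(1))
    return len(calls), len(writer.pending)
```

For five write commands, `count_callbacks(5, batched=False)` returns `(5, 5)` and `count_callbacks(5, batched=True)` returns `(1, 5)`: the same commands are queued, but the work-queue bookkeeping would run only once.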
[jira] [Commented] (SOLR-10983) Fix DOWNNODE -> queue-work explosion
[ https://issues.apache.org/jira/browse/SOLR-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069503#comment-16069503 ]

Scott Blum commented on SOLR-10983:

[~shalinmangar] [~jhump]