[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-09 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284975#comment-16284975
 ] 

Noble Paul commented on SOLR-11423:
---

Sorry for the late response
+1

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
> Fix For: 7.2, master (8.0)
>
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284120#comment-16284120
 ] 

ASF subversion and git services commented on SOLR-11423:


Commit 3a7f1071644ffe11ee74c96cfd4946204b6544b5 in lucene-solr's branch 
refs/heads/master from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3a7f107 ]

SOLR-11423: fix solr/CHANGES.txt, fixed in 7.2 not 7.1


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284118#comment-16284118
 ] 

ASF subversion and git services commented on SOLR-11423:


Commit 576b0b5d659527a61ebdd80347c6fa14dc0a086f in lucene-solr's branch 
refs/heads/branch_7_2 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=576b0b5 ]

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients respect


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284072#comment-16284072
 ] 

ASF subversion and git services commented on SOLR-11423:


Commit 311105f1b0dad7f20ff6cd55be1d2eb9cd4246d6 in lucene-solr's branch 
refs/heads/branch_7x from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=311105f ]

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients respect


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284021#comment-16284021
 ] 

Scott Blum commented on SOLR-11423:
---

I'm fixing solr/CHANGES.txt while cherry-picking into 7x and 7_2.  I'll push a 
change to master to move it when that's done.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283981#comment-16283981
 ] 

Tomás Fernández Löbbe commented on SOLR-11423:
--

bq. Sounds good to me. So backport to branch_7_2 and branch_7x?
+1. And lets fix CHANGES.txt

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283972#comment-16283972
 ] 

Scott Blum commented on SOLR-11423:
---

Sounds good to me.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283926#comment-16283926
 ] 

Varun Thacker commented on SOLR-11423:
--

+1 to what Tomas suggested

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283908#comment-16283908
 ] 

Tomás Fernández Löbbe commented on SOLR-11423:
--

In my experience, 20k state updates in the queue is beyond the point of no 
return too. I think we should backport as it is now, and if needed in the 
future we can use a cluster property to address Scott's point. Lets get this to 
7.2

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-07 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282711#comment-16282711
 ] 

Scott Blum commented on SOLR-11423:
---

I didn't resolve due to Noble's #comment-16203208 but I have no objection to 
resolving.  I dropped it on master because I wasn't sure what branches we'd 
want to backport to.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282306#comment-16282306
 ] 

Adrien Grand commented on SOLR-11423:
-

The situation is quite weird as this change has been pushed to master only but 
has a CHANGES entry under 7.1.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282305#comment-16282305
 ] 

Adrien Grand commented on SOLR-11423:
-

[~dragonsinth] Is this issue good to resolve? I see commits have been pushed 
but it is still open?

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236706#comment-16236706
 ] 

ASF subversion and git services commented on SOLR-11423:


Commit 0637407ea4bf3a49ed5bdabadcfee650a8e0a200 in lucene-solr's branch 
refs/heads/master from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0637407 ]

SOLR-11423: fix typo


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>Priority: Major
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-11-02 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236685#comment-16236685
 ] 

Scott Blum commented on SOLR-11423:
---

Oh wow, nice catch.  How'd I mess that one up?

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>Priority: Major
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-11-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236399#comment-16236399
 ] 

Tomás Fernández Löbbe commented on SOLR-11423:
--

Thanks for adding this [~dragonsinth]. One question
{code:java}
// Allow this client to push up to 1% of the remaining queue capacity without 
rechecking.
offerPermits.set(maxQueueSize - stat.getNumChildren() / 100);
{code}
Don't you want {{offerPermits.set((maxQueueSize - stat.getNumChildren()) / 
100);}}, or maybe {{offerPermits.set(remainingCapacity / 100);}}

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>Priority: Major
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-16 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206976#comment-16206976
 ] 

Cao Manh Dat commented on SOLR-11423:
-

Should we modify the behavior here a little bit, instead of throw an 
IllegalStateException() (which can lead to many errors cause this is an 
unchecked exception) should we try first, if the queue is full, retry until 
timeout.
[~dragonsinth] I really want to hear about your cluster status after SOLR-11443 
get applied.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-14 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204874#comment-16204874
 ] 

Scott Blum commented on SOLR-11423:
---

Is it possible to pick a number related to ZK's inherent limits?  That's really 
the goal here, prevent Solr from destroying ZK.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-14 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204860#comment-16204860
 ] 

Noble Paul commented on SOLR-11423:
---

I understand that it's not fool proof. However, when a user has hit a limit and 
he has no way of changing it other than downgrading to an older solr is worse. 
If we are on customer support this is a nightmare

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-13 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204004#comment-16204004
 ] 

Scott Blum commented on SOLR-11423:
---

I'm happy to defer on this issue, but I just want to be clear that I actively 
dislike having a system property.  It feels like a not useful piece of config, 
and worse, what happens if you don't set the same cap on every node?  Remember 
this is enforced client side, not server side.  If you accidentally have a mix 
of nodes where half of them cap at 20k, and half of them cap at 40k, then the 
moment you get above 20k any badly behaving 40k nodes are going to starve out 
the 20k nodes.  It becomes unfair contention.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-13 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203208#comment-16203208
 ] 

Noble Paul commented on SOLR-11423:
---

there must be a system property to change the value before we can mark this as 
resolved

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-13 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203192#comment-16203192
 ] 

Cao Manh Dat commented on SOLR-11423:
-

[~dragonsinth] Should this ticket be backported to branch_7x?

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-06 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194900#comment-16194900
 ] 

Scott Blum commented on SOLR-11423:
---

Perhaps you're right.  In our cluster though, Overseer is not able to chew 
through 20k entries very fast; it takes a long time (many minutes) to get 
through that number of items.  Our current escape valve is a tiny separate tool 
that just literally just goes through the queue and deletes everything. :D

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193846#comment-16193846
 ] 

Noble Paul commented on SOLR-11423:
---

Waiting can help. If every node errors out immediately after the cut off the 
items in queue can get cleared immediately and there will be  starvation . If 
we wait and retry after sometime , we can ensure that the events are generated 
at a lower rate. It helps in cases where a GC pause caused a pile up. 

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193719#comment-16193719
 ] 

ASF GitHub Bot commented on SOLR-11423:
---

Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/257


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193440#comment-16193440
 ] 

ASF subversion and git services commented on SOLR-11423:


Commit 9419366a8e68a599efa2d8a230a0158d1763d10d in lucene-solr's branch 
refs/heads/master from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9419366 ]

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients respect


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-05 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193435#comment-16193435
 ] 

Scott Blum commented on SOLR-11423:
---

[~noble.paul] both good questions!

> This looks fine. I'm just worried about the extra cost of the extra cost of 
> {{Stat stat = zookeeper.exists(dir, null, true);}} for each call.

Indeed!  That's why in my second pass, I added `offerPermits`.  As long as the 
queue is mostly empty, each client will only bother checking the stats about 
every 200 queued entries.  Agreed on a "create sequential node if parent node 
has fewer then XXX entries".

> Another point to consider is , is the rest of Solr code designed to handle 
> this error? I guess, the thread should wait for a few seconds and retry if 
> the no: of items fell below the limit. This will dramatically reduce the no: 
> of other errors in the system.

I thought about this point quite a bit, and came down on the side of erroring 
immediately.  Again, I'm thinking of this mostly like an automatic emergency 
shutoff in a nuclear reactor: you hope you never need it.  The point being, if 
you're in a state where you have 20k items in the queue, you're already in a 
pathologically bad state.  I can't see how adding latency and hoping things get 
better will improve the situation vs. erroring out immediately.  I've never 
seen a solr cluster recover on its own once the queue got that high, it always 
required manual intervention.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192769#comment-16192769
 ] 

Noble Paul commented on SOLR-11423:
---

Another point to consider is , is the rest of Solr code designed to handle this 
error? I guess, the thread should wait for a few seconds and retry if the no: 
of items fell below the limit. This will dramatically reduce the no:of other 
errors in the system.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192660#comment-16192660
 ] 

Noble Paul commented on SOLR-11423:
---

This looks fine. I'm just worried about the extra cost of the extra cost of 
{{Stat stat = zookeeper.exists(dir, null, true);}} for each call.

I wish we ZK had a conditional set operation . 
Anyway +1

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-04 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191962#comment-16191962
 ] 

Scott Blum commented on SOLR-11423:
---

[~shalinmangar] [~noble.paul] either of you want to take a look at this change 
and +1 or -1?

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16189037#comment-16189037
 ] 

ASF GitHub Bot commented on SOLR-11423:
---

GitHub user dragonsinth opened a pull request:

https://github.com/apache/lucene-solr/pull/257

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients 
respect



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/lucene-solr jira/SOLR-11423

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #257


commit ef8e0934fb27530f0c9450b58872b2b11028f50a
Author: Scott Blum 
Date:   2017-10-02T20:50:57Z

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients 
respect




> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188949#comment-16188949
 ] 

ASF GitHub Bot commented on SOLR-11423:
---

Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/256


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1617#comment-1617
 ] 

Scott Blum commented on SOLR-11423:
---

Patch passes tests for me.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188867#comment-16188867
 ] 

Scott Blum commented on SOLR-11423:
---

[~noble.paul] I went with 20k.  We could make it configurable, but again, I 
intend this to be a general purpose safety valve to protect Zookeeper from 
exploding, which makes me think we could pick a universal value.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188866#comment-16188866
 ] 

ASF GitHub Bot commented on SOLR-11423:
---

GitHub user dragonsinth opened a pull request:

https://github.com/apache/lucene-solr/pull/256

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients 
respect



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/lucene-solr SOLR-11423

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #256


commit ef8e0934fb27530f0c9450b58872b2b11028f50a
Author: Scott Blum 
Date:   2017-10-02T20:50:57Z

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients 
respect




> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188807#comment-16188807
 ] 

ASF subversion and git services commented on SOLR-11423:


Commit ef8e0934fb27530f0c9450b58872b2b11028f50a in lucene-solr's branch 
refs/heads/SOLR-11423 from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef8e093 ]

SOLR-11423: Overseer queue needs a hard cap (maximum size) that clients respect


> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-02 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188803#comment-16188803
 ] 

Scott Blum commented on SOLR-11423:
---

[~erickerickson] I have backported changes to reduce overseer task counts.  
This isn't an issue with normal operation.  Think of this more as an automatic 
safety shutoff on a nuclear reactor.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-09-30 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187121#comment-16187121
 ] 

Erick Erickson commented on SOLR-11423:
---

What version of Solr? There has been some work done recently to reduce the 
number of tasks the Overseer needs to process and I'm wondering if your 
observations include those changes or not.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-09-30 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186952#comment-16186952
 ] 

Noble Paul commented on SOLR-11423:
---

It makes sense to do a hard cap. make it a configurable cluster property with a 
sensible default. I guess 10K is a pretty small limit 

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-09-29 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186282#comment-16186282
 ] 

Scott Blum commented on SOLR-11423:
---

CC: [~jhump] [~noble.paul] [~shalinmangar]

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue a item when a queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org