[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-05-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-7121:
--
Assignee: (was: Mark Miller)

 Solr nodes should go down based on configurable thresholds and not rely on 
 resource exhaustion
 --

 Key: SOLR-7121
 URL: https://issues.apache.org/jira/browse/SOLR-7121
 Project: Solr
  Issue Type: New Feature
Reporter: Sachin Goyal
 Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, 
 SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch


 Currently, there is no way to control when a Solr node goes down.
 If the server is suffering long GC pauses, running too many threads, or simply 
 receiving too many queries because of a bad load balancer, the cores on the 
 machine keep serving until they exhaust the machine's resources and everything 
 comes to a stall.
 Such a slowly dying core can affect other cores as well by taking a very long 
 time to serve their distributed queries.
 There should be a way to specify threshold values beyond which the 
 targeted core can detect its ill health and proactively go down to recover.
 When the load improves, the core should come back up automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-05-01 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-7121:
--
Attachment: SOLR-7121.patch







[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-04-30 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-7121:
--
Attachment: SOLR-7121.patch

Here is a patch file for the pull request with a bit of cleanup and updated to 
trunk.







[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-03-02 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated SOLR-7121:
---
Attachment: SOLR-7121.patch

Added tests for long-running queries and for the 95th/5MinRateRequest 
statistics as well.
A GC-time test would need to run for quite some time before it could detect 
the condition, so I have not added a test for that.
The remaining tests should still provide good coverage of the patch.
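
As a rough illustration of why a GC-time check is slow to exercise in a test, here is a minimal sketch that reads cumulative GC time from the standard JVM management beans. The class name `GcTimeProbe` and its methods are hypothetical and are not taken from the patch. The sketch only assumes that a GC threshold would be compared against time accumulated between two polls.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper, not part of the patch: tracks GC time between polls. */
final class GcTimeProbe {
    private long lastTotalMillis = totalGcMillis();

    /** Sums the cumulative collection time reported by every collector. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Returns the GC time accumulated since the previous call (or since construction). */
    long gcMillisSinceLastCall() {
        long now = totalGcMillis();
        long delta = now - lastTotalMillis;
        lastTotalMillis = now;
        return delta;
    }
}
```

Because the counter only grows when collections actually happen, a test would have to generate enough garbage to push the delta past a threshold, which is consistent with the remark above that such a test would need to run for a while.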







[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-02-27 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated SOLR-7121:
---
Attachment: SOLR-7121.patch

The latest patch includes a test case for a core going down when its configured 
thread limit is exceeded.
The core is automatically brought up by the Health-Poller when the number of 
threads drops back below that threshold.

I will try to include a test for long-running-queries as well in the next few 
days but that should be independent of this patch's code-review.

[~otis], great suggestion. I will surely add these metrics to JMX but can we 
handle that in a follow-up ticket to this one? Let me know.
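
The thread-count behaviour described in this comment can be sketched roughly as follows. This is not the patch's Health-Poller. The class `CoreHealthPoller`, its constructor, and the `poll()` method are hypothetical names, and the thread count comes from a caller-supplied function so that the sketch stays self-contained.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of a poller that takes a core down and brings it back. */
final class CoreHealthPoller {
    private final IntSupplier threadCount; // assumed source of the core's current thread count
    private final int maxThreads;          // configured thread threshold
    private boolean coreUp = true;

    CoreHealthPoller(IntSupplier threadCount, int maxThreads) {
        this.threadCount = threadCount;
        this.maxThreads = maxThreads;
    }

    /** Runs one polling pass and returns whether the core is up afterwards. */
    boolean poll() {
        int threads = threadCount.getAsInt();
        if (coreUp && threads > maxThreads) {
            coreUp = false; // threshold exceeded: go down proactively
        } else if (!coreUp && threads < maxThreads) {
            coreUp = true;  // count is back below the threshold: come up again
        }
        return coreUp;
    }
}
```

In a running server, a pass like `poll()` would presumably be invoked periodically, for example through `ScheduledExecutorService.scheduleWithFixedDelay`. Keeping the pass itself free of timing makes the down and up transitions easy to test deterministically.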







[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-02-20 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated SOLR-7121:
---
Attachment: SOLR-7121.patch

[~elyograg], [~markrmil...@gmail.com],
The latest patch has only a single thread per core.
Will add testcase soon.







[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-02-19 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated SOLR-7121:
---
Attachment: SOLR-7121.patch

[~elyograg], here is another patch which removes System.currentTimeMillis().

Most of the important values are already in the configuration and turned off by 
default.
{code:xml}
  <coreDownThresholds name="thresholds1">

    <bool name="goDownIfHighLoad">false</bool>

    <str name="coreNameExpression">abc.*</str>

    <int name="coreLimitMaxThreads">45</int>

    <int name="coreLimitMaxGcMillis">1</int>

    <!-- These 3 options must be specified together and are used as an AND condition -->
    <int name="coreLimitMaxLongQueries">100</int>
    <int name="coreLimitLongQueryTime">100</int>
    <int name="coreLimitMaxLongQueriesInterval">1000</int>

    <!-- These 2 options must be specified together and are used as an AND condition -->
    <int name="coreLimitMax95thPcSelectTime">-1</int>
    <int name="coreLimitMax5MinSelectRate">-1</int>
  </coreDownThresholds>
{code}
Only a few options remain hard-coded, as I felt it would be best to leave 
those out of the configuration. I will wait for the complete review comments on 
this patch before converting them to configuration as well.
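
To make the AND semantics in the configuration above concrete, here is a minimal sketch of how such thresholds might be evaluated. It is not the patch's code. The class `CoreDownThresholds`, the nested `Observed` type, and `shouldGoDown` are hypothetical. The sketch also assumes that a negative value disables a limit and that independent limits combine with OR, neither of which the comment states. The GC and long-query limits are omitted for brevity.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for a subset of the thresholds shown in the configuration. */
final class CoreDownThresholds {
    // Field names mirror the config keys; defaults copy the values shown above.
    boolean goDownIfHighLoad = false;
    Pattern coreNameExpression = Pattern.compile("abc.*");
    int coreLimitMaxThreads = 45;
    int coreLimitMax95thPcSelectTime = -1; // assumption: negative means "not set"
    int coreLimitMax5MinSelectRate = -1;   // assumption: negative means "not set"

    /** Observed statistics for one core; the field names are illustrative. */
    static final class Observed {
        int threads;
        double p95SelectTime;
        double fiveMinSelectRate;
    }

    /** Returns true when the named core breaches at least one configured limit. */
    boolean shouldGoDown(String coreName, Observed seen) {
        if (!goDownIfHighLoad || !coreNameExpression.matcher(coreName).matches()) {
            return false; // feature switched off, or the core is not targeted
        }
        boolean threadsBreached =
            coreLimitMaxThreads >= 0 && seen.threads > coreLimitMaxThreads;
        // The two select limits act as a pair: both must be set and both exceeded.
        boolean pairConfigured =
            coreLimitMax95thPcSelectTime >= 0 && coreLimitMax5MinSelectRate >= 0;
        boolean pairBreached = pairConfigured
            && seen.p95SelectTime > coreLimitMax95thPcSelectTime
            && seen.fiveMinSelectRate > coreLimitMax5MinSelectRate;
        return threadsBreached || pairBreached;
    }
}
```

With the defaults copied from the configuration, `shouldGoDown` always returns false because `goDownIfHighLoad` is false, which matches the statement that the feature is turned off by default.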







[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion

2015-02-17 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated SOLR-7121:
---
Attachment: SOLR-7121.patch

Here is a patch which empowers the cores to monitor their own health, go down 
proactively if thresholds are breached and come up automatically when health 
improves.



