[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
[ https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-7121:
Assignee: (was: Mark Miller)

Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Key: SOLR-7121
URL: https://issues.apache.org/jira/browse/SOLR-7121
Project: Solr
Issue Type: New Feature
Reporter: Sachin Goyal
Attachments: SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch, SOLR-7121.patch

Currently, there is no way to control when a Solr node goes down. If the server is suffering long GC pauses, running too many threads, or simply receiving too many queries because of a misbehaving load balancer, the cores on the machine keep serving until they exhaust the machine's resources and everything stalls. Such a slowly dying core can also affect other cores by taking a very long time to serve their distributed queries.

There should be a way to specify threshold values beyond which the targeted core can detect its ill health and proactively go down to recover. When the load improves, the core should come back up automatically.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
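High GC pauses are one of the signals the description names. As a minimal sketch, assuming the JVM's standard `GarbageCollectorMXBean` counters are an acceptable data source, a poller could measure how many milliseconds were spent in GC since its previous check and compare that against a limit. The class `GcTimeSampler` and its methods are hypothetical (not part of Solr or the attached patches), and treating a negative limit as "disabled" is an assumption modeled on the `-1` values in the configuration posted on this issue.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: reports GC time accumulated between successive polls. */
class GcTimeSampler {
    private long lastTotalMillis;

    GcTimeSampler() {
        this.lastTotalMillis = totalGcMillis();
    }

    /** Sums the cumulative collection time of all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            long t = bean.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Milliseconds spent in GC since the previous call (or since construction). */
    long gcMillisSinceLastPoll() {
        long now = totalGcMillis();
        long delta = now - lastTotalMillis;
        lastTotalMillis = now;
        return delta;
    }

    /** Assumption: a negative limit disables the check, like the -1 values in the posted config. */
    static boolean exceeds(long gcMillis, long limitMillis) {
        return limitMillis >= 0 && gcMillis > limitMillis;
    }
}
```

Because `getCollectionTime()` is cumulative, the delta between two polls is what corresponds to "GC time in the last interval". It is only an approximation of pause time: some collectors may report time spent in concurrent phases that did not stop application threads.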
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Mark Miller updated SOLR-7121:
Attachment: SOLR-7121.patch
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Mark Miller updated SOLR-7121:
Attachment: SOLR-7121.patch

Here is a patch file for the pull request, with a bit of cleanup and updated to trunk.
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Sachin Goyal updated SOLR-7121:
Attachment: SOLR-7121.patch

Added tests for long-running queries and for the 95th-percentile/5-minute-rate request statistics as well. The GC-time test would need to run for quite some time before it could detect a breach, so I am not adding a test for that. The remaining tests should still exercise the patch well.
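The long-running-query check in the posted configuration combines three settings (`coreLimitMaxLongQueries`, `coreLimitLongQueryTime`, `coreLimitMaxLongQueriesInterval`) as an AND condition. A minimal sketch of one way to evaluate it, assuming it means "more than N queries slower than T ms completed within the last I ms" and that all values are milliseconds: keep a deque of completion times for slow queries and evict entries that fall out of the window. `LongQueryWindow` is a hypothetical name, not the patch's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window counter: flags a breach when more than maxLongQueries
 * queries slower than longQueryMillis finished within the last intervalMillis.
 */
class LongQueryWindow {
    private final int maxLongQueries;
    private final long longQueryMillis;
    private final long intervalMillis;
    // Completion timestamps (ms) of recent long queries, oldest first.
    private final Deque<Long> longQueryTimes = new ArrayDeque<>();

    LongQueryWindow(int maxLongQueries, long longQueryMillis, long intervalMillis) {
        this.maxLongQueries = maxLongQueries;
        this.longQueryMillis = longQueryMillis;
        this.intervalMillis = intervalMillis;
    }

    /**
     * Records one finished query and returns true if the long-query limit is now breached.
     * nowMillis should come from a monotonic clock, e.g. System.nanoTime() / 1_000_000.
     */
    synchronized boolean record(long nowMillis, long elapsedMillis) {
        if (elapsedMillis > longQueryMillis) {
            longQueryTimes.addLast(nowMillis);
        }
        evictAtOrBefore(nowMillis - intervalMillis);
        return longQueryTimes.size() > maxLongQueries;
    }

    /** Drops long queries that completed at or before the start of the current window. */
    private void evictAtOrBefore(long cutoffMillis) {
        while (!longQueryTimes.isEmpty() && longQueryTimes.peekFirst() <= cutoffMillis) {
            longQueryTimes.pollFirst();
        }
    }
}
```

Passing the clock value in explicitly keeps the check deterministic in unit tests, and the deque's size is bounded by the number of slow queries in a single window.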
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Sachin Goyal updated SOLR-7121:
Attachment: SOLR-7121.patch

The latest patch includes a test case for a core going down when its configured thread limit is exceeded. The Health-Poller automatically brings the core back up once the number of threads falls below that threshold. I will try to include a test for long-running queries as well in the next few days, but that should be independent of this patch's code review.

[~otis], great suggestion. I will surely add these metrics to JMX, but can we handle that in a follow-up ticket to this one? Let me know.
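The Health-Poller behavior described here (go down when the thread limit is exceeded, come back up once the count falls below it) can be sketched as a small state machine running on one single-threaded scheduler per core. This is a minimal sketch, not the patch's code: `ThreadHealthPoller` is hypothetical, it uses the JVM-wide live thread count from `ThreadMXBean.getThreadCount()` as an assumed stand-in for whatever count the patch actually checks, and the points where Solr would mark the core down or up are left as comments.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical per-core poller: down while threads > maxThreads, up again once below it. */
class ThreadHealthPoller implements AutoCloseable {
    private final int maxThreads;
    private final IntSupplier threadCount;
    private final ScheduledExecutorService scheduler;
    private boolean down = false; // guarded by "this"

    ThreadHealthPoller(String coreName, int maxThreads, IntSupplier threadCount) {
        this.maxThreads = maxThreads;
        this.threadCount = threadCount;
        // One single-threaded daemon scheduler per core, named after the core.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "health-poller-" + coreName);
            t.setDaemon(true);
            return t;
        });
    }

    /** Assumption: uses the JVM-wide live thread count as the metric. */
    static ThreadHealthPoller forJvmThreads(String coreName, int maxThreads) {
        return new ThreadHealthPoller(coreName, maxThreads,
                () -> ManagementFactory.getThreadMXBean().getThreadCount());
    }

    /** Runs poll() every periodMillis on this core's scheduler thread. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::poll, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** One health check; returns true if the core is considered down after it. */
    synchronized boolean poll() {
        int threads = threadCount.getAsInt();
        if (!down && threads > maxThreads) {
            down = true;  // hypothetical hook: mark the core down here
        } else if (down && threads < maxThreads) {
            down = false; // hypothetical hook: bring the core back up here
        }
        return down;
    }

    synchronized boolean isDown() {
        return down;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this rule the core stays in its current state while the count sits exactly at the limit; a separate, lower recovery threshold would be needed to damp flapping when the count oscillates around the limit.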
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Sachin Goyal updated SOLR-7121:
Attachment: SOLR-7121.patch

[~elyograg], [~markrmil...@gmail.com], the latest patch uses only a single thread per core. I will add a test case soon.
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Sachin Goyal updated SOLR-7121:
Attachment: SOLR-7121.patch

[~elyograg], here is another patch, which removes System.currentTimeMillis(). Most of the important values are already in the configuration and turned off by default.

{code:xml}
<coreDownThresholds name="thresholds1">
  <bool name="goDownIfHighLoad">false</bool>
  <str name="coreNameExpression">abc.*</str>
  <int name="coreLimitMaxThreads">45</int>
  <int name="coreLimitMaxGcMillis">1</int>
  <!-- These 3 options must be specified together and are used as an AND condition -->
  <int name="coreLimitMaxLongQueries">100</int>
  <int name="coreLimitLongQueryTime">100</int>
  <int name="coreLimitMaxLongQueriesInterval">1000</int>
  <!-- These 2 options must be specified together and are used as an AND condition -->
  <int name="coreLimitMax95thPcSelectTime">-1</int>
  <int name="coreLimitMax5MinSelectRate">-1</int>
</coreDownThresholds>
{code}

Only a few options remain hard-coded, as I felt it would be best to leave those out of the configuration. I will wait for complete review comments on this patch before converting them to configuration as well.
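The `coreDownThresholds` block above suggests an evaluation order: a master switch (`goDownIfHighLoad`), a core-name regex (`coreNameExpression`), individual limits, and groups whose options are combined with AND. A minimal sketch of that evaluation, under several assumptions the configuration alone does not settle: a negative limit disables its check, the regex must match the whole core name, the three long-query options are pre-evaluated elsewhere into a single flag, and a breach in any enabled group is enough to take the core down (OR across groups). `CoreDownThresholds` and `Sample` are hypothetical classes, not Solr APIs.

```java
import java.util.regex.Pattern;

/** Hypothetical in-memory form of one coreDownThresholds block; a negative limit disables a check. */
class CoreDownThresholds {
    /** Metrics sampled for one core during a health check (hypothetical shape). */
    static final class Sample {
        final int threads;
        final long gcMillis;
        final boolean longQueryLimitBreached; // e.g. from a sliding-window counter
        final double p95SelectMillis;
        final double fiveMinSelectRate;

        Sample(int threads, long gcMillis, boolean longQueryLimitBreached,
               double p95SelectMillis, double fiveMinSelectRate) {
            this.threads = threads;
            this.gcMillis = gcMillis;
            this.longQueryLimitBreached = longQueryLimitBreached;
            this.p95SelectMillis = p95SelectMillis;
            this.fiveMinSelectRate = fiveMinSelectRate;
        }
    }

    private final boolean goDownIfHighLoad;
    private final Pattern coreNameExpression;
    private final int maxThreads;
    private final long maxGcMillis;
    private final double max95thPcSelectTime;
    private final double max5MinSelectRate;

    CoreDownThresholds(boolean goDownIfHighLoad, String coreNameExpression, int maxThreads,
                       long maxGcMillis, double max95thPcSelectTime, double max5MinSelectRate) {
        this.goDownIfHighLoad = goDownIfHighLoad;
        this.coreNameExpression = Pattern.compile(coreNameExpression);
        this.maxThreads = maxThreads;
        this.maxGcMillis = maxGcMillis;
        this.max95thPcSelectTime = max95thPcSelectTime;
        this.max5MinSelectRate = max5MinSelectRate;
    }

    /** A limit below zero (e.g. -1) means "check disabled". */
    private static boolean over(double value, double limit) {
        return limit >= 0 && value > limit;
    }

    /** True if this block applies to the core and any enabled check is breached. */
    boolean shouldGoDown(String coreName, Sample s) {
        if (!goDownIfHighLoad || !coreNameExpression.matcher(coreName).matches()) {
            return false;
        }
        boolean threadsBreached = over(s.threads, maxThreads);
        boolean gcBreached = over(s.gcMillis, maxGcMillis);
        // 95th-percentile select time and 5-minute select rate must BOTH be exceeded (AND).
        boolean slowUnderLoad = over(s.p95SelectMillis, max95thPcSelectTime)
                && over(s.fiveMinSelectRate, max5MinSelectRate);
        return threadsBreached || gcBreached || s.longQueryLimitBreached || slowUnderLoad;
    }
}
```

Requiring both latency and rate to be high means a core is not taken down merely for being slow while idle (for example, on a cold cache), only when it is slow under real load.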
[jira] [Updated] (SOLR-7121) Solr nodes should go down based on configurable thresholds and not rely on resource exhaustion
Sachin Goyal updated SOLR-7121:
Attachment: SOLR-7121.patch

Here is a patch that empowers the cores to monitor their own health, go down proactively if thresholds are breached, and come back up automatically when their health improves.