[jira] [Updated] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

Anshum Gupta (JIRA) Wed, 10 Sep 2014 00:10:52 -0700

     [ https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Anshum Gupta updated SOLR-5986:
-------------------------------
    Attachment: SOLR-5986.patch

New patch with a slightly different approach.
* Renamed ExitObject to QueryTimeout, since that is all it actually is.
* Instead of setting/resetting it inside the SolrIndexSearcher methods, I'm 
now setting and resetting the QueryTimeout at the Handler level. I've made 
MLTHandler and SearchHandler work with this, as I can't think of any other 
handler that would be affected.
* As we'd now be setting the timeOut at a more global level, we can work 
towards (in another JIRA) using this value for different components and 
different stages, e.g. TimeLimitingCollector.
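
For illustration only, the handler-level set/reset described above could look 
roughly like the sketch below. It is not the code in the attached patch: the 
class name QueryTimeoutSketch and the methods set, reset, shouldExit and 
handleRequest are hypothetical, and storing the deadline in a ThreadLocal is 
an assumption made for the example.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a per-thread query timeout; not the patch itself. */
public final class QueryTimeoutSketch {
    // Per-thread deadline in nanoseconds; null means no timeout is active.
    private static final ThreadLocal<Long> DEADLINE_NANOS = new ThreadLocal<>();

    private QueryTimeoutSketch() {}

    /** Would be called by the handler before it starts processing a request. */
    public static void set(long timeAllowedMillis) {
        DEADLINE_NANOS.set(System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeAllowedMillis));
    }

    /** Would be called in a finally block so pooled threads start clean. */
    public static void reset() {
        DEADLINE_NANOS.remove();
    }

    /** Long-running loops (e.g. term enumeration) would poll this. */
    public static boolean shouldExit() {
        Long deadline = DEADLINE_NANOS.get();
        // Compare via subtraction so nanoTime wrap-around is handled.
        return deadline != null && System.nanoTime() - deadline >= 0;
    }

    /** Stand-in for a handler's request method (assumed shape). */
    public static void handleRequest(long timeAllowedMillis, Runnable work) {
        set(timeAllowedMillis);
        try {
            work.run(); // the work is expected to poll shouldExit()
        } finally {
            reset();    // always clear, even if the work throws
        }
    }
}
```

Setting the value once per request, rather than inside each searcher method, 
is what would let other components and stages read the same deadline later.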

> Don't allow runaway queries from harming Solr cluster health or search 
> performance
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-5986
>                 URL: https://issues.apache.org/jira/browse/SOLR-5986
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Steve Davids
>            Assignee: Anshum Gupta
>            Priority: Critical
>             Fix For: 4.10
>
>         Attachments: SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, 
> SOLR-5986.patch
>
>
> The intent of this ticket is to have all distributed search requests stop 
> wasting CPU cycles on requests that have already timed out or are so 
> complicated that they won't be able to execute. We have come across a case 
> where a nasty wildcard query within a proximity clause caused the cluster 
> to enumerate terms for hours even though the query timeout was set to 
> minutes. This caused a noticeable slowdown within the system, which made us 
> restart the replicas that happened to service that one request. In the 
> worst case, users with a relatively low zk timeout value will have nodes 
> start dropping from the cluster due to long GC pauses.
> [~amccurry] built a mechanism into Apache Blur to help with the issue in 
> BLUR-142 (see the commit comment for code, though look at the latest code on 
> trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running 
> by some heuristic (possibly estimated size of heap usage) or execute a 
> thread interrupt on all query threads once the time threshold is met. This 
> issue mirrors what others have discussed on the mailing list: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%[email protected]%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
