[ https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159197#comment-14159197 ]
Anshum Gupta commented on SOLR-5986:
------------------------------------

Here's what I meant by that statement: this test should either be removed or modified to ensure that timeAllowed is never hit during query expansion.

Here's more of the context. Until SOLR-5986 was committed, the timeAllowed parameter was only used during the collection stage. That stage also supported returning partial matches if some shards returned responses and didn't time out.

After this commit, the timeAllowed parameter could lead to early termination of a request well before the search actually happens, i.e. during query expansion. At this stage, partial results aren't returned.

The current test sends a request assuming that the timeout would happen *only* during the collection stage, leading to partial results being returned. But if the request times out during query expansion, no partial results are returned and the test fails.

I'll remove the partial results test. I'll also think about adding something to replace it (I certainly don't want coverage to go down, but this test isn't really a valid case anymore). Maybe add something that uses caching to avoid query expansion but times out during doc collection.
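To make the two timeout behaviours concrete, here is a minimal Java sketch. It is not the SOLR-5986 patch and uses no Solr or Lucene classes: TimeAllowedSketch, search, Result, ExpansionTimeoutException and tickingClock are all hypothetical names, and the "work" is just loop iterations measured against a fake clock. It only illustrates the distinction described above: running out of time during expansion aborts the request, while running out of time during collection returns what was gathered so far, flagged as partial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Illustrative only: none of these names are Solr or Lucene classes. */
final class TimeAllowedSketch {

    /** Thrown when the budget runs out before any document has been collected. */
    static final class ExpansionTimeoutException extends RuntimeException {
        ExpansionTimeoutException(String message) {
            super(message);
        }
    }

    /** Outcome of the collection stage; partial is true when the deadline cut it short. */
    static final class Result {
        final List<Integer> docs;
        final boolean partial;

        Result(List<Integer> docs, boolean partial) {
            this.docs = docs;
            this.partial = partial;
        }
    }

    /**
     * Runs a toy two-stage search against a caller-supplied millisecond clock.
     * expansionSteps and docCount stand in for real work; each step reads the clock once.
     */
    static Result search(int expansionSteps, int docCount, long timeAllowedMs, LongSupplier clockMs) {
        long deadline = clockMs.getAsLong() + timeAllowedMs;

        // Stage 1: query expansion. Nothing has been collected yet,
        // so a timeout here aborts the whole request.
        for (int step = 0; step < expansionSteps; step++) {
            if (clockMs.getAsLong() > deadline) {
                throw new ExpansionTimeoutException("timed out during query expansion at step " + step);
            }
        }

        // Stage 2: collection. A timeout here keeps whatever was gathered
        // and marks the result as partial.
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < docCount; doc++) {
            if (clockMs.getAsLong() > deadline) {
                return new Result(docs, true);
            }
            docs.add(doc);
        }
        return new Result(docs, false);
    }

    /** Deterministic clock that advances by one millisecond on every read. */
    static LongSupplier tickingClock() {
        long[] now = {0};
        return () -> now[0]++;
    }
}
```

With the ticking clock and a budget of 5 ms, search(0, 10, 5, tickingClock()) returns five documents flagged as partial, whereas search(10, 10, 5, tickingClock()) throws during expansion and returns nothing. Passing zero expansion steps stands in for the "use caching to avoid query expansion" idea for a replacement test.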
> Don't allow runaway queries from harming Solr cluster health or search performance
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-5986
>                 URL: https://issues.apache.org/jira/browse/SOLR-5986
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Steve Davids
>            Assignee: Anshum Gupta
>            Priority: Critical
>             Fix For: 5.0
>
>         Attachments: SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch,
> SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch,
> SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch,
> SOLR-5986.patch
>
>
> The intent of this ticket is to have all distributed search requests stop
> wasting CPU cycles on requests that have already timed out or are so
> complicated that they won't be able to execute. We have come across a case
> where a nasty wildcard query within a proximity clause was causing the
> cluster to enumerate terms for hours even though the query timeout was set to
> minutes. This caused a noticeable slowdown within the system which made us
> restart the replicas that happened to service that one request, the worst
> case scenario are users with a relatively low zk timeout value will have
> nodes start dropping from the cluster due to long GC pauses.
> [~amccurry] Built a mechanism into Apache Blur to help with the issue in
> BLUR-142 (see commit comment for code, though look at the latest code on the
> trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running
> by some heuristic (possibly estimated size of heap usage) or be able to
> execute a thread interrupt on all query threads once the time threshold is
> met.
> This issue mirrors what others have discussed on the mailing list:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E