[jira] [Commented] (SOLR-5986) Don't allow runaway queries from harming Solr cluster health or search performance

Hoss Man (JIRA) Fri, 03 Oct 2014 15:23:24 -0700

    [ https://issues.apache.org/jira/browse/SOLR-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158609#comment-14158609 ]


Hoss Man commented on SOLR-5986:
--------------------------------

FYI: this assertion (modified by r1627622/r1627635) has failed in jenkins 
several times since it was committed...

{noformat}
1498993     shalin       // test group query
1627635     anshum       // TODO: Remove this? This doesn't make any real sense now that timeAllowed might trigger early
1627635     anshum       //       termination of the request during Terms enumeration/Query expansion.
1627635     anshum       //       During such an exit, partial results isn't supported as it wouldn't make any sense.
1627635     anshum       // Increasing the timeAllowed from 1 to 100 for now.
1498993     shalin       queryPartialResults(upShards, upClients,
1498993     shalin           "q", "*:*",
1498993     shalin           "rows", 100,
1498993     shalin           "fl", "id," + i1,
1498993     shalin           "group", "true",
1498993     shalin           "group.query", t1 + ":kings OR " + t1 + ":eggs",
1498993     shalin           "group.limit", 10,
1498993     shalin           "sort", i1 + " asc, id asc",
1627635     anshum           CommonParams.TIME_ALLOWED, 100,
1498993     shalin           ShardParams.SHARDS_INFO, "true",
1498993     shalin           ShardParams.SHARDS_TOLERANT, "true");
{noformat}

example: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11221/
{noformat}
Error Message:
Request took too long during query expansion. Terminating request.

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Request took too long during query expansion. Terminating request.
        at __randomizedtesting.SeedInfo.seed([377AFD4F005F159A:B69C7357770075A6]:0)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:570)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:215)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
        at org.apache.solr.TestDistributedSearch.queryPartialResults(TestDistributedSearch.java:596)
        at org.apache.solr.TestDistributedSearch.doTest(TestDistributedSearch.java:499)
        at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:875)
{noformat}

I don't fully understand what anshum meant by this TODO, and I think he's 
offline for the next few days, so I went ahead and commented this assertion 
out with a link back to this jira for him to look at before resolving it.

> Don't allow runaway queries from harming Solr cluster health or search 
> performance
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-5986
>                 URL: https://issues.apache.org/jira/browse/SOLR-5986
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Steve Davids
>            Assignee: Anshum Gupta
>            Priority: Critical
>             Fix For: 5.0
>
>         Attachments: SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, 
> SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, 
> SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, SOLR-5986.patch, 
> SOLR-5986.patch
>
>
> The intent of this ticket is to have all distributed search requests stop 
> wasting CPU cycles on requests that have already timed out or are so 
> complicated that they won't be able to execute. We have come across a case 
> where a nasty wildcard query within a proximity clause was causing the 
> cluster to enumerate terms for hours even though the query timeout was set to 
> minutes. This caused a noticeable slowdown within the system, which made us 
> restart the replicas that happened to service that one request. In the worst 
> case, users with a relatively low zk timeout value will have nodes start 
> dropping from the cluster due to long GC pauses.
> [~amccurry] built a mechanism into Apache Blur to help with the issue in 
> BLUR-142 (see commit comment for code, though look at the latest code on the 
> trunk for newer bug fixes).
> Solr should be able to either prevent these problematic queries from running 
> by some heuristic (possibly estimated size of heap usage) or be able to 
> execute a thread interrupt on all query threads once the time threshold is 
> met. This issue mirrors what others have discussed on the mailing list: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E
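
The "thread interrupt once the time threshold is met" option from the issue
description could be sketched roughly as below. This is plain JDK code, not
the SOLR-5986 patch and not Solr's API: the class TimeBoundedQuery, the
methods runWithTimeout and enumerateTerms, and the fallback value are all
hypothetical names made up for illustration. It assumes the long-running work
cooperates by checking the thread's interrupt flag.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Solr.
class TimeBoundedQuery {

    /**
     * Runs the task on a worker thread. If it does not finish within
     * timeoutMillis, the worker is interrupted and fallback is returned.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // true = interrupt the running worker thread
            return fallback;
        } finally {
            pool.shutdownNow(); // release the worker thread in every case
        }
    }

    /**
     * Stand-in for an expensive term enumeration. It polls the interrupt
     * flag on every step so that a cancel actually stops the loop.
     */
    static long enumerateTerms(long limit) throws InterruptedException {
        long visited = 0;
        while (visited < limit) {
            if (Thread.interrupted()) {
                throw new InterruptedException("term enumeration interrupted");
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        // A small job that should finish well inside its time budget.
        Long quick = runWithTimeout(() -> enumerateTerms(1_000L), 5_000L, -1L);
        // A runaway job that is cut off after roughly 100 ms.
        Long runaway = runWithTimeout(() -> enumerateTerms(Long.MAX_VALUE), 100L, -1L);
        System.out.println("quick=" + quick + " runaway=" + runaway);
    }
}
```

An interrupt only helps if the expensive loop checks for it, which is why the
stand-in polls the flag on every step. The sketch returns a fallback value on
timeout purely to keep it short; judging by the error message and the TODO
quoted in the comment above, the committed change instead fails the request
when the limit is hit during query expansion.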



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
