[ https://issues.apache.org/jira/browse/SOLR-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442131#comment-17442131 ]
Dinesh Kumar Naik commented on SOLR-14298:
------------------------------------------

[~hossman] I completely agree with you. A match-all query against a shard holding over a billion documents can be very costly, even with rows=0 and distrib=false.

Here are some of the calls and their respective QTime values from one of my setups:

{code:java}
2021-11-09 21:15:53.853 WARN (qtp435914790-25301965) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226420962 status=0 QTime=15516
2021-11-10 00:45:16.816 WARN (qtp435914790-25341761) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226984671 status=0 QTime=15169
2021-11-10 00:45:30.772 WARN (qtp435914790-25339675) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226985078 status=0 QTime=15494
2021-11-10 00:45:34.244 WARN (qtp435914790-25334052) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226985527 status=0 QTime=15462
2021-11-10 00:46:19.480 WARN (qtp435914790-25340369) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226987732 status=0 QTime=14553
2021-11-10 18:03:49.885 WARN (qtp435914790-25486769) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228021741 status=0 QTime=16130
2021-11-10 18:04:14.511 WARN (qtp435914790-25523411) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228021262 status=0 QTime=16626
2021-11-10 18:04:23.904 WARN (qtp435914790-25454090) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020775 status=0 QTime=16556
2021-11-10 18:04:43.355 WARN (qtp435914790-25505322) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020627 status=0 QTime=17029
2021-11-10 18:04:49.181 WARN (qtp435914790-25509646) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020577 status=0 QTime=17242
2021-11-10 18:04:53.577 WARN (qtp435914790-25484919) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020577 status=0 QTime=19169
2021-11-10 18:05:06.366 WARN (qtp435914790-25523409) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228019443 status=0 QTime=17352
2021-11-10 18:05:07.594 WARN (qtp435914790-25527485) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228018445 status=0 QTime=17309
2021-11-10 18:05:07.908 WARN (qtp435914790-25496685) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115] webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228018445 status=0 QTime=17445
{code}

QTime ranges from roughly 14.5 to 19 seconds across these checks, which could be avoided as you suggested.

Here are my observations on the three approaches you suggested:

1. The *segmentTerminateEarly* option might not help us, because the default merge policy is *TieredMergePolicyFactory*, not a sorting merge policy. As per [https://solr.apache.org/guide/8_6/common-query-parameters.html#segmentterminateearly-parameter]:

If *segmentTerminateEarly* is set to true, and if [the mergePolicyFactory|https://solr.apache.org/guide/8_6/indexconfig-in-solrconfig.html#mergepolicyfactory] for this collection is a [SortingMergePolicyFactory|https://lucene.apache.org/solr/8_6_0/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html] which uses a sort option compatible with [the sort parameter|https://solr.apache.org/guide/8_6/common-query-parameters.html#sort-parameter] specified for this query, then Solr will be able to skip documents on a per-segment basis that are definitively not candidates for the current page of results.

2. Use of timeAllowed: a small timeAllowed value reduces QTime drastically, at the cost of returning partial results.

!image-2021-11-11-13-11-30-930.png|width=524,height=288!

3. Negated match-all query, i.e. q=-*:*. This seems to be the fastest option, and it would literally be a 1 character patch.

!image-2021-11-11-13-13-20-791.png|width=524,height=267!

Kindly let me know your thoughts, and then we can plan a patch accordingly!
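To make options 2 and 3 concrete, here is a minimal, stdlib-only Java sketch of the request parameters such a check might send. It is not the LBSolrClient code or the eventual patch: the class name ZombieCheckParams, its two methods, and the 500 ms timeAllowed value are hypothetical, and combining q=-*:* with timeAllowed is only one possible choice. The _zombieservercheck=true tracing parameter is the one floated in the issue description.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not part of SolrJ.
final class ZombieCheckParams {

    // Builds the parameters for a cheap liveness query against a single core.
    static Map<String, String> buildParams(long timeAllowedMs) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "-*:*");            // negated match-all: matches nothing, so no hits to count
        params.put("distrib", "false");     // stay on the core being checked
        params.put("rows", "0");            // no documents returned
        params.put("timeAllowed", Long.toString(timeAllowedMs)); // assumed safety net, value is arbitrary
        params.put("_zombieservercheck", "true"); // tracing param suggested in the issue description
        return params;
    }

    // URL-encodes the parameters into a query string (requires Java 10+).
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints the path and query string such a check could request.
        System.out.println("/select?" + toQueryString(buildParams(500)));
    }
}
```

With a tracing parameter in the request, these checks would also be easy to pick out of the server logs.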
> LBSolrClient.checkAZombieServer should be less stupid
> -----------------------------------------------------
>
> Key: SOLR-14298
> URL: https://issues.apache.org/jira/browse/SOLR-14298
> Project: Solr
> Issue Type: Bug
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: image-2021-11-11-13-11-30-930.png, image-2021-11-11-13-13-20-791.png
>
>
> LBSolrClient.checkAZombieServer() currently does /select query for {{\*:\*}}
> with distrib=false, rows=0, sort=\_docid\_ ... but this can still chew up a
> lot of time if the shard is big, and it's not self-evident wtf is going on in
> the server logs.
> At a minimum, these requests should include some sort of tracing param to
> identify the point of the query (ie: {{_zombieservercheck=true}}) and should
> probably be changed to hit something like the /ping handler, or the node
> status handler, or if it's important to folks that it do a "search" that
> actually uses the index searcher, then it should use options like
> timeAllowed / segmentTerminateEarly, and/or {{q=-\*:\*}} instead .. or maybe
> a cursorMark ... something to make it not have the overhead of counting all
> the hits.