[ 
https://issues.apache.org/jira/browse/SOLR-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442131#comment-17442131
 ] 

Noble Paul edited comment on SOLR-14298 at 9/7/23 6:29 AM:
-----------------------------------------------------------

[~hossman] I completely agree with you. Doing a match-all query with billions 
of documents per shard can be a very costly operation, even with rows=0 and 
distrib=false. 

Here are some of the calls and their respective QTime values from one of my setups: 

 
{code:java}
2021-11-09 21:15:53.853 WARN  (qtp435914790-25301965) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1226420962 status=0 QTime=15516
2021-11-10 00:45:16.816 WARN  (qtp435914790-25341761) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1226984671 status=0 QTime=15169
2021-11-10 00:45:30.772 WARN  (qtp435914790-25339675) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1226985078 status=0 QTime=15494
2021-11-10 00:45:34.244 WARN  (qtp435914790-25334052) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1226985527 status=0 QTime=15462
2021-11-10 00:46:19.480 WARN  (qtp435914790-25340369) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1226987732 status=0 QTime=14553
2021-11-10 18:03:49.885 WARN  (qtp435914790-25486769) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228021741 status=0 QTime=16130
2021-11-10 18:04:14.511 WARN  (qtp435914790-25523411) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228021262 status=0 QTime=16626
2021-11-10 18:04:23.904 WARN  (qtp435914790-25454090) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228020775 status=0 QTime=16556
2021-11-10 18:04:43.355 WARN  (qtp435914790-25505322) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228020627 status=0 QTime=17029
2021-11-10 18:04:49.181 WARN  (qtp435914790-25509646) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228020577 status=0 QTime=17242
2021-11-10 18:04:53.577 WARN  (qtp435914790-25484919) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228020577 status=0 QTime=19169
2021-11-10 18:05:06.366 WARN  (qtp435914790-25523409) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228019443 status=0 QTime=17352
2021-11-10 18:05:07.594 WARN  (qtp435914790-25527485) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228018445 status=0 QTime=17309
2021-11-10 18:05:07.908 WARN  (qtp435914790-25496685) 
x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: 
[Item_collection_shard15_replica_n115]  webapp=/solr path=/select 
params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} 
hits=1228018445 status=0 QTime=17445 {code}
 

 

The QTime is between roughly 14.5 and 19 seconds for all such checks, which 
can be avoided as you suggested. 
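For anyone who wants to pull the same numbers out of their own logs, here is a minimal sketch. It assumes each SlowRequest entry has been joined onto one line and ends with {{hits=... status=... QTime=...}} as in the entries above; the helper name {{qtimes_ms}} is mine and purely illustrative.

```python
import re

# Two of the SlowRequest entries quoted above, joined onto single lines.
LOG_LINES = [
    "2021-11-10 00:46:19.480 WARN  (qtp435914790-25340369) "
    "x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: "
    "[Item_collection_shard15_replica_n115]  webapp=/solr path=/select "
    "params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} "
    "hits=1226987732 status=0 QTime=14553",
    "2021-11-10 18:04:53.577 WARN  (qtp435914790-25484919) "
    "x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: "
    "[Item_collection_shard15_replica_n115]  webapp=/solr path=/select "
    "params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} "
    "hits=1228020577 status=0 QTime=19169",
]

# Matches the trailing "hits=... status=... QTime=..." part of an entry.
ENTRY = re.compile(r"hits=(?P<hits>\d+) status=(?P<status>\d+) QTime=(?P<qtime>\d+)")


def qtimes_ms(lines):
    """Return the QTime (milliseconds) of every line that carries one."""
    times = []
    for line in lines:
        match = ENTRY.search(line)
        if match:
            times.append(int(match.group("qtime")))
    return times


times = qtimes_ms(LOG_LINES)
print(min(times), max(times))  # 14553 19169
```

The two sample lines are the fastest and slowest entries from the log above, which is where the 14.5 to 19 second range comes from.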

Here are my observations on the 3 approaches you suggested: 

1. The *segmentTerminateEarly* option might not help us, as the default merge 
policy is *TieredMergePolicyFactory*. 

As per 
[https://solr.apache.org/guide/8_6/common-query-parameters.html#segmentterminateearly-parameter]

If *segmentTerminateEarly*   is set to true, and if [the 
mergePolicyFactory|https://solr.apache.org/guide/8_6/indexconfig-in-solrconfig.html#mergepolicyfactory]
 for this collection is a 
[SortingMergePolicyFactory|https://lucene.apache.org/solr/8_6_0/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html]
 which uses a sort option compatible with [the sort 
parameter|https://solr.apache.org/guide/8_6/common-query-parameters.html#sort-parameter]
 specified for this query, then Solr will be able to skip documents on a 
per-segment basis that are definitively not candidates for the current page of 
results.

2. Use of timeAllowed: Using a small timeAllowed value reduces the QTime 
drastically, but the response would then contain partial results.

!image-2021-11-11-13-11-30-930.png|width=524,height=288!

3. Negated match-all query, i.e. {{q=-\*:\*}} 

This seems to be the fastest option, and it would literally be a 1-character 
patch. 

!image-2021-11-11-13-13-20-791.png|width=524,height=267!
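To make options 2 and 3 concrete, here is a rough sketch of what the request parameters would look like for each variant. It only builds query strings and does not talk to Solr. The base parameters are copied from the SlowRequest lines above (minus {{wt}} and {{version}}, which the client adds); the {{_zombieservercheck}} tracing param is the one proposed in the issue description; the helper name {{check_params}} and the 100 ms timeAllowed value are assumptions of mine for illustration, not tested recommendations.

```python
from urllib.parse import urlencode

# Parameters of the current check, as seen in the SlowRequest lines above.
BASE_PARAMS = {"q": "*:*", "distrib": "false", "sort": "_docid_ asc", "rows": "0"}


def check_params(negate=False, time_allowed_ms=None, trace=False):
    """Build the query string for one variant of the zombie check (illustrative)."""
    params = dict(BASE_PARAMS)
    if negate:
        # Option 3: the one-character change, a negated match-all query.
        params["q"] = "-*:*"
    if time_allowed_ms is not None:
        # Option 2: timeAllowed is given in milliseconds.
        params["timeAllowed"] = str(time_allowed_ms)
    if trace:
        # Tracing param suggested in the issue description.
        params["_zombieservercheck"] = "true"
    # Keep '*' and ':' unescaped so the output reads like the log lines.
    return urlencode(params, safe="*:")


print(check_params())                     # q=*:*&distrib=false&sort=_docid_+asc&rows=0
print(check_params(negate=True))          # q=-*:*&distrib=false&sort=_docid_+asc&rows=0
print(check_params(time_allowed_ms=100))  # q=*:*&distrib=false&sort=_docid_+asc&rows=0&timeAllowed=100
```

The negated variant differs from the current request by a single character, which is why it looks like the smallest possible patch.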

Kindly let me know your thoughts and then we can plan for a patch accordingly!


was (Author: dineshnaik):
[Same text as the edited comment above; only the markup of the query in item 3 differed, which read q=-*:*]

> LBSolrClient.checkAZombieServer should be less stupid
> -----------------------------------------------------
>
>                 Key: SOLR-14298
>                 URL: https://issues.apache.org/jira/browse/SOLR-14298
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: image-2021-11-11-13-11-30-930.png, 
> image-2021-11-11-13-13-20-791.png
>
>
> LBSolrClient.checkAZombieServer() currently does a /select query for {{\*:\*}} 
> with distrib=false, rows=0, sort=\_docid\_ ... but this can still chew up a 
> lot of time if the shard is big, and it's not self-evident wtf is going on in 
> the server logs.
> At a minimum, these requests should include some sort of tracing param to 
> identify the point of the query (ie: {{_zombieservercheck=true}}) and should 
> probably be changed to hit something like the /ping handler, or the node 
> status handler, or if it's important to folks that it do a "search" that 
> actually uses the index searcher, then it should use options like 
> timeAllowed / segmentTerminateEarly, and/or {{q=-\*:\*}} instead .. or maybe 
> a cursorMark ... something to make it not have the overhead of counting all 
> the hits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
