[
https://issues.apache.org/jira/browse/SOLR-17926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023193#comment-18023193
]
Andrzej Bialecki commented on SOLR-17926:
-----------------------------------------
In the latest version I saw two unexpected NPEs, which may have been obvious in
hindsight: {{ResponseBuilder.getResponseDocs()}} and
{{ResponseBuilder.resultIds}} were not merely unpopulated but left as null,
either because sub-requests were skipped or because no sub-request returned any
results due to EDR exceptions.
I patched these two places so that they don't throw NPEs. However, this shows
that code that has been around for years can still malfunction under these
new improvements to query limits, which both skip sub-requests and throw
exceptions from many layers down.
I have in mind a test where we first collect all traces where EDR methods are
called and then in the next round randomly throw exceptions from some of them
to see if all edge-cases are handled well. This should be easy to implement
using something like {{CallerSpecificQueryLimit}}.
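
A rough sketch of the two-round idea, in plain Java and deliberately not tied to the real {{CallerSpecificQueryLimit}} API: round one records every call site that reaches a limit check, round two throws from a random subset of those sites. The names {{FaultInjectingLimit}}, {{SamplePipeline}}, {{check()}} and {{armRandomly()}} are hypothetical and exist only for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical stand-in for a query-limit check; this is not a Solr class. */
final class FaultInjectingLimit {
  private final Set<String> seenCallers = new LinkedHashSet<>();
  private final Set<String> failingCallers = new LinkedHashSet<>();
  private boolean injecting = false;

  /** Called by the code under test wherever a limit would be checked. */
  void check() {
    String caller = callerOf();
    if (!injecting) {
      seenCallers.add(caller); // round one: only record the call site
    } else if (failingCallers.contains(caller)) {
      throw new IllegalStateException("injected limit failure at " + caller);
    }
  }

  /** Starts round two: each recorded call site fails with the given probability. */
  void armRandomly(Random random, double probability) {
    failingCallers.clear();
    for (String caller : seenCallers) {
      if (random.nextDouble() < probability) {
        failingCallers.add(caller);
      }
    }
    injecting = true;
  }

  Set<String> seenCallers() {
    return Collections.unmodifiableSet(seenCallers);
  }

  Set<String> failingCallers() {
    return Collections.unmodifiableSet(failingCallers);
  }

  private static String callerOf() {
    // Frame 0 is callerOf(), frame 1 is check(), frame 2 is the code that called check().
    StackTraceElement[] stack = new Throwable().getStackTrace();
    if (stack.length == 0) {
      return "unknown";
    }
    StackTraceElement frame = stack[Math.min(2, stack.length - 1)];
    return frame.getClassName() + "." + frame.getMethodName();
  }
}

/** Hypothetical code under test with two distinct call sites. */
final class SamplePipeline {
  static void run(FaultInjectingLimit limit) {
    parse(limit);
    collect(limit);
  }

  static void parse(FaultInjectingLimit limit) {
    limit.check();
  }

  static void collect(FaultInjectingLimit limit) {
    limit.check();
  }
}
```

A test would run the request once to fill {{seenCallers()}}, call {{armRandomly()}} with a seeded {{Random}} so failures are reproducible, and then assert that every injected exception ends in a partial-results response rather than an NPE.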
> Discount timeAllowed for all types of queries
> ---------------------------------------------
>
> Key: SOLR-17926
> URL: https://issues.apache.org/jira/browse/SOLR-17926
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 9.9
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Labels: pull-request-available
> Attachments: SOLR-17926-using-NOW.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Spin-off from SOLR-17869.
> Currently only {{TopGroupsShardRequestFactory}} subtracts the time already
> spent on local request processing from {{timeAllowed}} before sending shard
> requests.
> This is inconsistent and likely not optimal. Since {{timeAllowed}} tracks
> wall-clock time, it makes sense to use the same starting point for all
> phases of distributed request processing and to terminate processing early when
> the allowed time runs out, measured from the original starting point.
> This is not the way it works now, though (except for this special case of
> grouping queries): the same time span is allocated to the query coordinator
> and to the shard requests where the processing starts later, which means that
> the coordinator may time out while waiting for responses even if all shard
> requests succeeded.
> [~dsmiley] suggested using {{SolrRequestInfo.getNOW()}} instead, as the
> absolute starting point for both local and distributed requests, and compare
> {{timeAllowed}} to that starting point. However, this relies on correct time
> sync between nodes.
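
The discounting described in the issue can be illustrated with a small standalone sketch. It assumes the coordinator knows when request processing started and subtracts the elapsed time from {{timeAllowed}} before handing the remainder to shard requests. {{TimeBudget}} and its methods are hypothetical names, not Solr classes, and the sketch uses a monotonic nanosecond clock on a single node, so it says nothing about the cross-node clock-sync concern of the {{getNOW()}} approach.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks one time budget from a single starting point. */
final class TimeBudget {
  private final long startNanos;
  private final long allowedNanos;

  TimeBudget(long startNanos, long timeAllowedMs) {
    this.startNanos = startNanos;
    this.allowedNanos = TimeUnit.MILLISECONDS.toNanos(timeAllowedMs);
  }

  /** Milliseconds still available at nowNanos, never negative. */
  long remainingMs(long nowNanos) {
    long elapsedNanos = nowNanos - startNanos;
    long remainingNanos = allowedNanos - elapsedNanos;
    return Math.max(0L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
  }

  /** True once the whole budget has been used up. */
  boolean isExpired(long nowNanos) {
    return nowNanos - startNanos >= allowedNanos;
  }
}
```

For example, with a budget of 1000 ms and 300 ms already spent on local processing, {{remainingMs()}} yields 700, which is the value a coordinator would pass on to shard requests instead of the original 1000.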