[
https://issues.apache.org/jira/browse/SOLR-17926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022573#comment-18022573
]
Chris M. Hostetter commented on SOLR-17926:
-------------------------------------------
I definitely prefer the approach in the PR over the use of "NOW" in the patch.
"NOW" really makes sense for ensuring that date rounding/arithmetic of values
_in the documents_ is treated consistently regardless of replica or query
stage _because we *expect* clock drift_ between the replicas - I don't think it
makes sense to try and use it to do "how much timeAllowed do i have left?" type
calculations (on the order of 10s of milliseconds) in replicas that didn't
generate the "NOW" value in the first place.
(We also document "NOW" in the ref-guide as a way for clients to specify the
frame of reference for requests that include date math – so anyone doing that
would get all sorts of really wonky timeAllowed results if we go that route)
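
To make the drift concern concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and are not taken from the patch or the PR; it only assumes that a replica would compute its remaining budget as timeAllowed minus (local clock minus "NOW").

```java
public class NowDriftSketch {

    /**
     * Remaining budget as a replica would compute it from a "NOW" value
     * that was generated on another node (hypothetical helper).
     */
    static long remainingUsingNow(long timeAllowedMs, long nowFromCoordinatorMs, long replicaClockMs) {
        return timeAllowedMs - (replicaClockMs - nowFromCoordinatorMs);
    }

    public static void main(String[] args) {
        long now = 1_000_000L;   // "NOW" stamped by the coordinator
        long timeAllowed = 50L;  // ms
        long trueElapsed = 10L;  // ms actually spent before the replica starts work

        // Replica clock in sync, 40ms ahead, and 40ms behind the coordinator.
        long inSync = remainingUsingNow(timeAllowed, now, now + trueElapsed);
        long ahead = remainingUsingNow(timeAllowed, now, now + trueElapsed + 40);
        long behind = remainingUsingNow(timeAllowed, now, now + trueElapsed - 40);

        System.out.println("in sync: " + inSync);
        System.out.println("ahead:   " + ahead);
        System.out.println("behind:  " + behind);
    }
}
```

With the true remaining budget at 40ms, a replica whose clock runs 40ms ahead sees no time left at all, while one running 40ms behind sees 80ms, which is more than the original timeAllowed.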
----
Two things about the PR that I'm confused by:
# I'm not really sure though that I understand the utility of
{{adjustShardRequestLimit}} adding {{USED_PARAM}} on sub-requests, instead of
just decrementing the value of {{timeAllowed}} on the sub-requests (like the
current grouping code does) ?
# I don't really understand the point of {{INFLIGHT_PARAM}}? Nothing in the
code sets it, which I guess is fine? – it looks like it's intended to just be a
way for external clients to override the implicit assumption that "2ms" isn't
enough (remaining) time to bother sending sub-requests – but the only code path
where {{req.getParams().getLong(INFLIGHT_PARAM, DEFAULT_INFLIGHT_MS)}} is
called is a conditional block in the constructor where we already know "{{//
this is a sub-request}}" ... which means {{adjustShardRequestLimit}} (which is
only ever going to get called in the original parent request) will only ever
use the {{DEFAULT_INFLIGHT_MS}} of 2ms ... right?
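
For reference, the simpler alternative raised in the first point (decrementing {{timeAllowed}} on the sub-request and skipping the sub-request when too little time remains) could look roughly like the sketch below. This is not the PR's code: the class name, the method name and the exact threshold comparison are assumptions, and only the 2ms default comes from the discussion above.

```java
import java.util.OptionalLong;

public class TimeAllowedDiscount {

    // The 2ms default mentioned above; treating it as a "not worth sending" threshold is an assumption.
    static final long DEFAULT_INFLIGHT_MS = 2L;

    /**
     * Hypothetical helper: returns the timeAllowed value to put on a sub-request,
     * or empty when the remaining budget is at or below the in-flight threshold.
     */
    static OptionalLong discountedTimeAllowed(long timeAllowedMs, long elapsedMs, long inflightMs) {
        long remaining = timeAllowedMs - elapsedMs;
        if (remaining <= inflightMs) {
            return OptionalLong.empty(); // not enough time left to bother sending
        }
        return OptionalLong.of(remaining);
    }

    public static void main(String[] args) {
        // 100ms allowed, 30ms already spent on the coordinator: send 70ms to the shards.
        System.out.println(discountedTimeAllowed(100L, 30L, DEFAULT_INFLIGHT_MS));
        // 100ms allowed, 99ms already spent: 1ms left is below the threshold, so skip.
        System.out.println(discountedTimeAllowed(100L, 99L, DEFAULT_INFLIGHT_MS));
    }
}
```

Because the elapsed time is measured entirely on the coordinator's own clock, this variant does not depend on clock sync between nodes.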
> Discount timeAllowed for all types of queries
> ---------------------------------------------
>
> Key: SOLR-17926
> URL: https://issues.apache.org/jira/browse/SOLR-17926
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 9.9
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Labels: pull-request-available
> Attachments: SOLR-17926-using-NOW.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Spin-off from SOLR-17869.
> Currently only {{TopGroupsShardRequestFactory}} subtracts the time already
> spent on local request processing from {{timeAllowed}} before sending shard
> requests.
> This is inconsistent and likely not optimal. Since {{timeAllowed}} tracks
> wall-clock time it makes sense to track the same starting point for all
> phases of distributed request processing and terminate processing early when
> the allowed time runs out, as compared to the original starting point.
> This is not the way it works now, though (except for this special case of
> grouping queries): the same time span is allocated to the query coordinator
> and to the shard requests where the processing starts later, which means that
> the coordinator may time out while waiting for responses even if all shard
> requests succeeded.
> [~dsmiley] suggested using {{SolrRequestInfo.getNOW()}} instead, as the
> absolute starting point for both local and distributed requests, and compare
> {{timeAllowed}} to that starting point. However, this relies on correct time
> sync between nodes.