[
https://issues.apache.org/jira/browse/SOLR-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris M. Hostetter updated SOLR-17182:
--------------------------------------
Attachment: gen-random-csv.pl
run-queries.sh
Status: Open (was: Open)
The original thinking in SOLR-16693 was basically:
* Lucene made some changes:
** ExitableDirectoryReader now has overhead even if the QueryTimeout it uses
is a No-Op
** IndexSearcher.searchWithTimeout feature that can use QueryTimeout
* Therefore Solr should make the following changes:
** Use IndexSearcher.searchWithTimeout from SolrIndexSearcher anytime there is
a QueryTimeout in the request
** Stop using ExitableDirectoryReader completely
*** As a hedge against breaking backcompat: add a system property (that we
don't plan to support for long) to trigger EDR use
*** if this sysprop is enabled, IndexSearcher.searchWithTimeout is probably
redundant so don't use it
My comment at the time, which was forked off into the creation of this jira
(SOLR-17182), and the specific implementation suggestion I made were:
* Focused on concerns about the "non-search" code paths in Solr that use
IndexReader (faceting, spellcheck, etc...) that would no longer respect
timeAllowed (now more broadly "QueryLimits")
* Assumed that if we added a way for these code-paths to get an EDR if and
only if the request used QueryLimits, we could still eliminate the system
property and its conditional logic
** Which implicitly builds off the assumption that
{{IndexSearcher.searchWithTimeout}} is an adequate solution for enforcing
{{QueryLimits}} in the "search" logic.
But as a result of some ad hoc testing I did after discovering SOLR-17831, I'm
no longer convinced that {{IndexSearcher.searchWithTimeout}} is (always) an
adequate solution for enforcing {{QueryLimits}} in the "search" logic.
With the fix for SOLR-17831 in place, using
{{solr.useExitableDirectoryReader=true}} results in "slow" or "expensive"
queries – w/o any complex features like faceting, spellcheck, etc... – being
terminated *much* more quickly/aggressively than
{{solr.useExitableDirectoryReader=false}} (and the implicit default
IndexSearcher.searchWithTimeout / TimeLimitingBulkScorer behavior).
(I haven't exhaustively tested this, but IIUC: when & where – and how rarely --
{{TimeLimitingBulkScorer}} checks the {{QueryTimeout}} means that it takes
longer and longer to notice the limit has been exceeded as the number of
documents matched and/or the limit itself increase?)
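As a rough illustration of that hypothesis, here is a toy model of a scorer
that only checks its limit between windows of documents. It is not Lucene's
code: the starting window of 100 documents, the 50% growth per window, the
fixed per-document cost, and the helper name {{noticed_at_us}} are all
assumptions chosen for illustration.

```shell
# Toy model only, NOT Lucene code. Window size, growth factor and per-doc
# cost are hypothetical values picked to illustrate the effect.
# Usage: noticed_at_us <limit_us> <cost_per_doc_us> <total_docs>
# Prints the simulated elapsed time (microseconds) at which the limit is
# noticed, or the total scan time if the scan finishes first.
noticed_at_us() {
  awk -v limit="$1" -v cost="$2" -v total="$3" 'BEGIN {
    window = 100; scored = 0; elapsed = 0
    while (scored < total) {
      if (elapsed > limit) break          # the only place the limit is checked
      n = (total - scored < window) ? total - scored : window
      elapsed += n * cost                 # score one window without checking
      scored += n
      window += int(window / 2)           # next window is 50% larger
    }
    print elapsed
  }'
}

# Compare how far past the limit the scan runs for a small and a large limit
noticed_at_us 5000 1 1000000
noticed_at_us 50000 1 1000000
```

In this model the overshoot past the limit is bounded by the cost of the most
recent window, so it grows as the windows grow, which is consistent with (but
does not prove) the behavior described above.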
----
I'm attaching some quick and dirty scripts I used to compare EDR to the default
behavior using an index of 1 million synthetic documents and 99 requests of
complex disjunctions of prefix queries (which was the simplest way I could
think of to force a complicated "search" w/o using features that would give EDR
an advantage – like function queries) using {{timeAllowed=50}}.
My notes are below, but the bottom line is:
* using EDR caused the mean QTime to be ~ "50ms" – regardless of the number of
segments
* the default behavior caused a mean QTime of ~ "85-150ms" depending on the
number of segments
I think for people who *really* care about having {{QueryLimits}} enforced, we
need a (long term supported) option for using {{ExitableDirectoryReader}}
_everywhere_ in Solr – including {{{}SolrIndexSearcher{}}}.
{noformat}
# Use techproducts example
#
./solr/packaging/build/dev/bin/solr -e techproducts --no-prompt
# build up 1 million random docs
#
./gen-random-csv.pl 1000000 200000 100w200000 100w2000000 100w2000000 \
  100w2000000 100w2000000 > data.csv
# index them, using commitWithin (more aggressively than we probably need to)
# to force multiple segments
# (in my case I wound up with 48 segments)
#
curl \
  'http://localhost:8983/solr/techproducts/update?commitWithin=1000&fieldnames=id,a_t,b_t,c_t,d_t,e_t,f_t' \
  --data-binary @data.csv -H 'Content-type:application/csv'
{noformat}
{noformat}
# now run a bunch of synthetic (non-cached) queries with a large number of
# disjunctions to see how timeAllowed behaves and how long it takes to time
# out by default ...
#
bash run-queries.sh > 48seg.timeallowed.default.txt
grep QTime 48seg.timeallowed.default.txt | perl -nle 'chomp; /.*?(\d+)/ or die;
$sum+=$1; END { print $sum }'
15542
{noformat}
{noformat}
# Re-start solr with EDR enabled and run the same test...
#
./solr/packaging/build/dev/bin/solr -e techproducts --no-prompt --jvm-opts \
  -Dsolr.useExitableDirectoryReader=true
bash run-queries.sh > 48seg.timeallowed.edr.txt
grep QTime 48seg.timeallowed.edr.txt | perl -nle 'chomp; /.*?(\d+)/ or die;
$sum+=$1; END { print $sum }'
4961
{noformat}
{noformat}
# Restart solr with solr.autoCommit.maxTime=-1 and reindex everything to try
# and minimize segment counts
#
# In my case, I got it down to 18 segments
#
./solr/packaging/build/dev/bin/solr -e techproducts --no-prompt --jvm-opts \
  -Dsolr.autoCommit.maxTime=-1
curl 'http://localhost:8983/solr/techproducts/update?commit=true' --data-binary \
  '<delete><query>*:*</query></delete>'
curl \
  'http://localhost:8983/solr/techproducts/update?commit=true&fieldnames=id,a_t,b_t,c_t,d_t,e_t,f_t' \
  --data-binary @data.csv -H 'Content-type:application/csv'
curl 'http://localhost:8983/solr/techproducts/update?commit=true&optimize=true'
{noformat}
{noformat}
# re-run same timeallowed queries (default behavior, 18 segments)
#
bash run-queries.sh > 18seg.timeallowed.default.txt
grep QTime 18seg.timeallowed.default.txt | perl -nle 'chomp; /.*?(\d+)/ or die;
$sum+=$1; END { print $sum }'
8505
{noformat}
{noformat}
# Re-start solr with EDR enabled, and run the same test again on this 18
# segment index...
#
bash run-queries.sh > 18seg.timeallowed.edr.txt
grep QTime 18seg.timeallowed.edr.txt | perl -nle 'chomp; /.*?(\d+)/ or die;
$sum+=$1; END { print $sum }'
4872
{noformat}
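For reference, dividing each QTime total above by the 99 requests gives the
per-request means. A minimal sketch, assuming awk is available;
{{mean_qtime}} is a hypothetical helper name, not part of the attached
scripts:

```shell
# Mean QTime per request, rounded to the nearest millisecond.
# Usage: mean_qtime <sum_of_qtimes_ms> <request_count>
mean_qtime() {
  awk -v sum="$1" -v n="$2" 'BEGIN { printf "%.0f\n", sum / n }'
}

mean_qtime 15542 99   # 48 segments, default behavior
mean_qtime 4961 99    # 48 segments, EDR
mean_qtime 8505 99    # 18 segments, default behavior
mean_qtime 4872 99    # 18 segments, EDR
```

That works out to roughly 157ms and 86ms for the default behavior (48 and 18
segments respectively) versus roughly 50ms and 49ms with EDR, against a
{{timeAllowed}} of 50.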
> Eliminate the need for 'solr.useExitableDirectoryReader' sysprop
> ----------------------------------------------------------------
>
> Key: SOLR-17182
> URL: https://issues.apache.org/jira/browse/SOLR-17182
> Project: Solr
> Issue Type: Sub-task
> Components: Query Limits
> Reporter: Chris M. Hostetter
> Assignee: Gus Heck
> Priority: Major
> Attachments: gen-random-csv.pl, run-queries.sh
>
>
> As the {{QueryLimit}} functionality in Solr gets beefed up, and supports
> multiple types of limits, it would be nice if we could find a way to
> eliminate the need for the {{solr.useExitableDirectoryReader}} sysprop, and
> instead just have codepaths that use the underlying IndexReader (like
> faceting, spellcheck, etc...) automatically get a reader that enforces the
> limits if/when limits are in use.