[
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-6349:
---------------------------
Attachment: make-data-and-queries.pl
SOLR-6349.patch
Well, after wrapping up SOLR-7171, i came back to this patch and started making
the following improvements...
{panel:title=changes in this patch}
* TestDistributedSearch
** beefed up inspection of shard responses now that SOLR-7171 is fixed
* StatsValuesFactory
** introduce new final booleans tracking which stats we want, to encourage JIT
optimization
** fix some inconsistencies in when/if various stats are returned to clients
depending on values seen (notably with Dates before the epoch)
* StatsComponentTest
** testFieldStatisticsResultsDateFieldAlwaysMissing
*** the set of stats that we've returned for Dates in the past has been
inconsistent depending on whether there were any "non-missing" docs ... updated
this test to expect everything
{panel}
...after making these changes, i went to re-run that mini-benchmark i wrote
before, and in looking at it realized i made a serious conceptual mistake...
{quote}
* user currently asks for stats on fields, only cares about 4 of 8 of them
...
...the sequence of stat field requests is identical between the 2 bash files,
but in one the URLs include localparams to only compute min/max/mean/stddev for
the field.
{quote}
...the problem being that because these are distributed requests, the new style
requests still require that sum, count & sumOfSquares be computed on every
shard (in addition to the min & max) ... the *final* responses to the query
client are smaller, but we're still doing virtually the same amount of
computation and each shard is returning virtually the same amount of data.
(the only calculation "skipped" is the "missing" stat, which is actually
irrelevant in this test because every doc has a value in every field)
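To illustrate why the shards cannot skip sum, count & sumOfSquares when the client only asks for mean and stddev, here is a minimal Python sketch of the merge step. It is not the StatsValuesFactory code: the names `ShardStats`, `shard_stats` and `merge` are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Partial stats a single shard would return (hypothetical structure)."""
    minimum: float
    maximum: float
    count: int
    total: float           # sum of all values seen on the shard
    sum_of_squares: float  # sum of value*value on the shard


def shard_stats(values):
    """Compute the per-shard partials for a non-empty list of values."""
    return ShardStats(
        minimum=min(values),
        maximum=max(values),
        count=len(values),
        total=sum(values),
        sum_of_squares=sum(v * v for v in values),
    )


def merge(shards):
    """Combine shard partials into the stats the client asked for.

    mean needs sum + count; stddev needs sum + count + sumOfSquares.
    None of these can be derived from per-shard means or stddevs alone
    without also knowing the counts, so the partials must be shipped.
    """
    count = sum(s.count for s in shards)
    total = sum(s.total for s in shards)
    sum_sq = sum(s.sum_of_squares for s in shards)
    mean = total / count
    if count > 1:
        # Sample variance (assumption), computed from the merged partials.
        variance = (count * sum_sq - total * total) / (count * (count - 1))
        stddev = math.sqrt(max(variance, 0.0))
    else:
        stddev = 0.0
    return {
        "min": min(s.minimum for s in shards),
        "max": max(s.maximum for s in shards),
        "mean": mean,
        "stddev": stddev,
    }
```

For example, `merge([shard_stats([1.0, 2.0, 3.0]), shard_stats([4.0, 5.0])])` gives the same mean and stddev as computing them over all five values at once, but only because each shard returned its sum, count and sumOfSquares.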
So i did a quick tweak to make-data-and-queries.pl so that the only stats
requested are "min" & "mean" -- so the shards only have to compute min, sum,
count. With that change, and the new patch, the numbers look much better...
{noformat}
         pre-patch       with patch      with patch
Run #    old style       old style       new style
         (all stats)     (all stats)     (2 stats, 3 deps)
1        135.7 sec       134.4 sec       109.9 sec
2        130.1 sec       132.4 sec       109.0 sec
3        132.3 sec       132.8 sec       108.2 sec
total    398.1 sec       399.6 sec       327.1 sec
                         00.3 % slower   17.8 % faster
{noformat}
...so these numbers are a lot more promising.
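The "2 stats, 3 deps" column can be read as a dependency expansion: the stats the client requests are mapped to the stats each shard has to compute. A rough Python sketch of that idea follows. The `SHARD_DEPS` table and `shard_level_stats` function are hypothetical names for illustration, not what the patch actually contains.

```python
# Hypothetical table: for each client-visible stat, the per-shard values
# needed to produce it in a distributed request.
SHARD_DEPS = {
    "min": {"min"},
    "max": {"max"},
    "sum": {"sum"},
    "count": {"count"},
    "missing": {"missing"},
    "sumOfSquares": {"sumOfSquares"},
    "mean": {"sum", "count"},
    "stddev": {"sum", "count", "sumOfSquares"},
}


def shard_level_stats(requested):
    """Return the set of stats every shard must compute for this request."""
    needed = set()
    for stat in requested:
        needed |= SHARD_DEPS[stat]
    return needed
```

Under these assumptions, asking for min/max/mean/stddev expands to five shard-level stats (min, max, sum, count, sumOfSquares), which is why the first benchmark saved almost nothing, while asking for just min & mean expands to only min, sum and count.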
this also makes me want to run some more perf tests, on more permutations of
stats. in particular i want to check non-cloud mode, to make sure we haven't
slowed that down.
> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
> Key: SOLR-6349
> URL: https://issues.apache.org/jira/browse/SOLR-6349
> Project: Solr
> Issue Type: Sub-task
> Reporter: Hoss Man
> Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch,
> SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch,
> SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch,
> SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch,
> SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl,
> make-data-and-queries.pl
>
>
> Stats component currently computes all stats (except for one) every time
> because they are relatively cheap, and in some cases dependent on each other
> for distrib computation -- but if we start layering stats on other things it
> becomes unnecessarily expensive to compute all the stats when they just want
> the "sum" (and it will definitely become excessively verbose in the
> responses).
> The plan here is to use local params to make this configurable. All of the
> existing stat options could be modeled as a simple boolean param, but future
> params (like percentiles) might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}