[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-6349:
---------------------------
    Attachment: make-data-and-queries.pl
                SOLR-6349.patch

Well, after wrapping up SOLR-7171, I came back to this patch and started making 
the following improvements...

{panel:title=changes in this patch}
* TestDistributedSearch
** beefed up inspection of shard responses now that SOLR-7171 is fixed
* StatsValuesFactory
** introduce new final booleans to track which stats were requested, to 
encourage JIT optimization
** fix some inconsistencies in whether/when various stats are returned to 
clients depending on the values seen (notably with Dates before the epoch)
* StatsComponentTest
** testFieldStatisticsResultsDateFieldAlwaysMissing
*** the set of stats we've returned for Dates has historically been 
inconsistent depending on whether there were any "non-missing" docs ... updated 
this test to expect every stat
{panel}
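The "final booleans" idea can be illustrated with a minimal sketch. This is not the actual StatsValuesFactory code: the class name `NumericStatsSketch`, its constructor argument, and the stat names it checks are assumptions made for illustration. Each flag is resolved once at construction and never reassigned, so the per-value branches in `accumulate` test a condition that is stable for the life of the object; how much the JIT actually benefits from that depends on the JVM.

```java
import java.util.Set;

/**
 * Hypothetical sketch: a per-field numeric stats accumulator whose set of
 * computed stats is fixed at construction time via final booleans.
 */
final class NumericStatsSketch {
    // Resolved once from the requested stats; never reassigned afterwards.
    private final boolean computeMin;
    private final boolean computeMax;
    private final boolean computeSum;
    private final boolean computeCount;
    private final boolean computeSumOfSquares;

    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double sum = 0.0;
    private long count = 0;
    private double sumOfSquares = 0.0;

    NumericStatsSketch(Set<String> shardStats) {
        this.computeMin = shardStats.contains("min");
        this.computeMax = shardStats.contains("max");
        this.computeSum = shardStats.contains("sum");
        this.computeCount = shardStats.contains("count");
        this.computeSumOfSquares = shardStats.contains("sumOfSquares");
    }

    /** Called once per value; each branch tests a flag that cannot change. */
    void accumulate(double v) {
        if (computeMin && v < min) min = v;
        if (computeMax && v > max) max = v;
        if (computeSum) sum += v;
        if (computeCount) count++;
        if (computeSumOfSquares) sumOfSquares += v * v;
    }

    double min() { return min; }
    double max() { return max; }
    double sum() { return sum; }
    long count() { return count; }
    double sumOfSquares() { return sumOfSquares; }

    public static void main(String[] args) {
        NumericStatsSketch s = new NumericStatsSketch(Set.of("min", "sum", "count"));
        for (double v : new double[] {4.0, 1.0, 7.0}) s.accumulate(v);
        // max and sumOfSquares were never requested, so they keep their initial values.
        System.out.println(s.min() + " " + s.sum() + " " + s.count()); // prints "1.0 12.0 3"
    }
}
```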

...after making these changes, I went to re-run the mini-benchmark I wrote 
earlier, and in looking at it realized I had made a serious conceptual mistake...

{quote}
* user currently asks for stats on fields, but only cares about 4 of the 8 stats
...
...the sequence of stat field requests is identical between the 2 bash files, 
but in one the URLs include localparams to only compute min/max/mean/stddev for 
the field.
{quote}

...the problem being that because these are distributed requests, the new-style 
requests still require every shard to compute sum, count & sumOfSquares (in 
addition to min & max), since the mean and stddev are derived from those values 
when the shard responses are merged ... the *final* responses to the client are 
smaller, but we're still doing virtually the same amount of math, and each shard 
is returning virtually the same amount of data. (The only calculation "skipped" 
is the "missing" stat, which is actually irrelevant in this test because every 
doc has a value in every field.)
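The expansion from client-requested stats to the stats each shard must compute can be sketched as below. The dependency table is an assumption written to match this comment (mean needs sum and count; stddev additionally needs sumOfSquares), and `StatDeps` / `shardStatsFor` are hypothetical names, not the patch's API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: expand client-requested stats into per-shard stats. */
final class StatDeps {
    // Assumed distributed dependencies: the coordinator derives mean and stddev
    // from shard totals, so shards return the inputs instead of the final stat.
    // Stats absent from this map depend only on themselves.
    private static final Map<String, Set<String>> DEPS = Map.of(
        "mean", Set.of("sum", "count"),
        "stddev", Set.of("sum", "count", "sumOfSquares"));

    static Set<String> shardStatsFor(Set<String> requested) {
        Set<String> needed = new TreeSet<>(); // sorted for stable output
        for (String stat : requested) {
            needed.addAll(DEPS.getOrDefault(stat, Set.of(stat)));
        }
        return needed;
    }

    public static void main(String[] args) {
        // prints "[count, max, min, sum, sumOfSquares]"
        System.out.println(shardStatsFor(Set.of("min", "max", "mean", "stddev")));
        // prints "[count, min, sum]"
        System.out.println(shardStatsFor(Set.of("min", "mean")));
    }
}
```

The first call mirrors the benchmark mistake: asking for 4 stats still costs 5 per-shard computations, leaving only "missing" out.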

So I did a quick tweak to make-data-and-queries.pl so that the only stats 
requested are "min" & "mean", which means the shards only have to compute min, 
sum & count. With that change, and the new patch, the numbers look much better...

{noformat}
        pre-patch        with patch       with patch 
Run #   old style        old style        new style
        (all stats)      (all stats)      (2 stats, 3 deps)
  1      135.7 sec        134.4 sec        109.9 sec
  2      130.1 sec        132.4 sec        109.0 sec
  3      132.3 sec        132.8 sec        108.2 sec
total    398.1 sec        399.6 sec        327.1 sec
vs pre-patch              0.4 % slower     17.8 % faster
{noformat}
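As a quick arithmetic check, the per-run times can be totalled and compared against the pre-patch total; this only recomputes figures from the table (the class name is arbitrary).

```java
import java.util.Locale;

/** Recomputes the benchmark totals and relative differences from the per-run times. */
final class BenchmarkSummary {
    static double total(double... runs) {
        double t = 0.0;
        for (double r : runs) t += r;
        return t;
    }

    /** Percentage change relative to the baseline; positive means slower. */
    static double pctChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double prePatch = total(135.7, 130.1, 132.3);
        double oldStyle = total(134.4, 132.4, 132.8);
        double newStyle = total(109.9, 109.0, 108.2);
        // prints "totals: 398.1 399.6 327.1"
        System.out.printf(Locale.ROOT, "totals: %.1f %.1f %.1f%n", prePatch, oldStyle, newStyle);
        // prints "old style: 0.38 % slower"
        System.out.printf(Locale.ROOT, "old style: %.2f %% slower%n", pctChange(prePatch, oldStyle));
        // prints "new style: 17.83 % faster"
        System.out.printf(Locale.ROOT, "new style: %.2f %% faster%n", -pctChange(prePatch, newStyle));
    }
}
```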

...so these numbers are a lot more promising.

This also makes me want to run more perf tests on more permutations of stats; 
in particular I want to check non-cloud mode to make sure we haven't slowed 
that down.


> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
>                 Key: SOLR-6349
>                 URL: https://issues.apache.org/jira/browse/SOLR-6349
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>         Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch, 
> SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch, 
> SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch, 
> SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, 
> SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl, 
> make-data-and-queries.pl
>
>
> Stats component currently computes all stats (except for one) every time 
> because they are relatively cheap, and in some cases dependent on each other 
> for distributed computation; but if we start layering stats on other things, 
> it becomes unnecessarily expensive to compute all the stats when users just 
> want the "sum" (and the responses will definitely become excessively verbose).
> The plan here is to use local params to make this configurable.  All of the 
> existing stat options could be modeled as a simple boolean param, but future 
> params (like percentiles) might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}
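To illustrate the proposed syntax, here is a toy parser that splits a `stats.field` value into its local params and field name. It is a hypothetical sketch, not Solr's real local-params parser: it handles only whitespace-separated `key=value` pairs with optional single-quoted values, and ignores escaping, braces inside quotes, and parameter dereferencing. Simple boolean stats and a complex value like `percentiles` come out of the same map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy parser for values like {!min=true percentiles='99,99.999'}price (hypothetical). */
final class StatsFieldSketch {
    record Parsed(Map<String, String> params, String field) {}

    static Parsed parse(String value) {
        if (!value.startsWith("{!")) return new Parsed(Map.of(), value);
        int close = value.indexOf('}');
        if (close < 0) throw new IllegalArgumentException("unterminated local params: " + value);
        String body = value.substring(2, close);
        Map<String, String> params = new LinkedHashMap<>();
        int i = 0;
        while (i < body.length()) {
            if (Character.isWhitespace(body.charAt(i))) { i++; continue; }
            int eq = body.indexOf('=', i);
            if (eq < 0) throw new IllegalArgumentException("expected key=value in: " + body);
            String key = body.substring(i, eq);
            int start = eq + 1;
            int end;
            String val;
            if (start < body.length() && body.charAt(start) == '\'') {
                // Quoted value: read up to the matching single quote.
                end = body.indexOf('\'', start + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated quote in: " + body);
                val = body.substring(start + 1, end);
                end++; // skip the closing quote
            } else {
                // Bare value: read up to the next whitespace.
                end = start;
                while (end < body.length() && !Character.isWhitespace(body.charAt(end))) end++;
                val = body.substring(start, end);
            }
            params.put(key, val);
            i = end;
        }
        return new Parsed(params, value.substring(close + 1));
    }

    public static void main(String[] args) {
        Parsed p = parse("{!min=true max=true percentiles='99,99.999'}price");
        boolean wantsMin = Boolean.parseBoolean(p.params().getOrDefault("min", "false"));
        System.out.println(p.field() + " " + wantsMin + " " + p.params().get("percentiles"));
        // prints "price true 99,99.999"
    }
}
```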



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
