[ 
https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482171#comment-13482171
 ] 

Shawn Heisey commented on SOLR-1972:
------------------------------------

After a lot of poking around for a way to bump the reservoir size, I 
finally came across Vitter's paper on reservoir sampling.  After even more 
poking around, I think I get it now.  Their small reservoir apparently really 
does give statistically meaningful results over millions or billions of total 
samples.  If it didn't give them numbers they could use, they would already 
have made it larger.
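
For readers who have not seen the technique, here is a minimal sketch of the
simplest variant discussed in Vitter's paper, usually called Algorithm R.  It
is an illustration only: the class name SimpleReservoir and its methods are
hypothetical and are not taken from Solr or from the library discussed above.
It is also not thread-safe.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not part of Solr or any metrics library.
final class SimpleReservoir {
    private final long[] values;  // fixed-size reservoir
    private final Random random;
    private long count = 0;       // total number of samples seen so far

    SimpleReservoir(int size, long seed) {
        this.values = new long[size];
        this.random = new Random(seed);
    }

    // Algorithm R: the first `size` samples fill the reservoir. After that,
    // the n-th sample overwrites a random slot with probability size / n,
    // so every sample seen so far is equally likely to be retained.
    void update(long value) {
        count++;
        if (count <= values.length) {
            values[(int) (count - 1)] = value;
        } else {
            // Uniform index in [0, count); only indexes below `size` replace.
            long slot = (long) (random.nextDouble() * count);
            if (slot < values.length) {
                values[(int) slot] = value;
            }
        }
    }

    long count() {
        return count;
    }

    // Copy of the samples currently held (at most `size` of them).
    long[] snapshot() {
        int n = (int) Math.min(count, values.length);
        return Arrays.copyOf(values, n);
    }
}
```

The point of the sketch is that memory stays fixed at the reservoir size no
matter how many samples arrive, while the retained values remain a uniform
random sample of everything seen.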

Do you think it's worthwhile to give people the ability to customize the 
percentile list -- turn some of the standard percentiles off, and/or add custom 
ones?  As soon as we conclude that including the full predefined set won't 
present a performance problem because it only gets calculated when the admin 
GUI is accessed, there'll be someone who has created hundreds of request 
handlers and polls the statistics for all of them once a minute.  I can also 
see someone wanting the 12th and 87th percentiles for some reason that 
neither of us can fathom but that makes perfect sense to them.
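
To make the idea of a customizable percentile list concrete, here is a rough
sketch that computes an arbitrary caller-supplied set of percentiles from a
sample array.  Everything in it is an assumption for illustration: the class
PercentileReport is hypothetical, the nearest-rank definition is just one
common way to define a percentile, and none of this reflects how the attached
patches actually work.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Solr.
final class PercentileReport {

    // Nearest-rank percentile over an already sorted array, p in (0, 100].
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("percentile out of range: " + p);
        }
        // 1-based rank of the smallest value covering p percent of the data.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Sorts a copy of the samples once, then evaluates each requested
    // percentile. The result preserves the order the caller asked for.
    static Map<Double, Long> compute(long[] samples, double... percentiles) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        Map<Double, Long> result = new LinkedHashMap<>();
        for (double p : percentiles) {
            result.put(p, percentile(sorted, p));
        }
        return result;
    }
}
```

A call such as PercentileReport.compute(samples, 12, 50, 87, 99) sorts once
and then does constant work per percentile, so the cost of a longer list is
dominated by the sort rather than by the number of percentiles requested.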

                
> Need additional query stats in admin interface - median, 95th and 99th 
> percentile
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1972
>                 URL: https://issues.apache.org/jira/browse/SOLR-1972
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Shawn Heisey
>            Priority: Minor
>         Attachments: elyograg-1972-3.2.patch, elyograg-1972-3.2.patch, 
> elyograg-1972-trunk.patch, elyograg-1972-trunk.patch, 
> SOLR-1972-branch3x-url_pattern.patch, SOLR-1972-branch4x.patch, 
> SOLR-1972-branch4x.patch, SOLR-1972_metrics.patch, SOLR-1972.patch, 
> SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972-url_pattern.patch
>
>
> I would like to see more detailed query statistics from the admin GUI.  This 
> is what you can get now:
> requests : 809
> errors : 0
> timeouts : 0
> totalTime : 70053
> avgTimePerRequest : 86.59209
> avgRequestsPerSecond : 0.8148785 
> I'd like to see more data on the time per request - median, 95th percentile, 
> 99th percentile, and any other statistical function that makes sense to 
> include.  In my environment, the first bunch of queries after startup tend to 
> take several seconds each.  I find that the average value tends to be useless 
> until it has several thousand queries under its belt and the caches are 
> thoroughly warmed.  The statistical functions I have mentioned would quickly 
> eliminate the influence of those initial slow queries.
> The system will have to store individual data about each query.  I don't know 
> if this is something Solr does already.  It would be nice to have a 
> configurable count of how many of the most recent data points are kept, to 
> control the amount of memory the feature uses.  The default value could be 
> something like 1024 or 4096.
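
The "configurable count of most recent data points" idea from the description
can be sketched as a fixed-capacity ring buffer.  This is only an
illustration under stated assumptions: the class RecentTimes is hypothetical,
it is not thread-safe, and it reports the lower median when the number of
retained points is even.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Solr.
final class RecentTimes {
    private final long[] buffer;  // holds the most recent `capacity` values
    private int next = 0;         // index the next value will be written to
    private int size = 0;         // number of valid entries, at most capacity

    RecentTimes(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.buffer = new long[capacity];
    }

    // Records one request time, overwriting the oldest value once full.
    void add(long elapsedMillis) {
        buffer[next] = elapsedMillis;
        next = (next + 1) % buffer.length;
        if (size < buffer.length) {
            size++;
        }
    }

    int size() {
        return size;
    }

    // Lower median of the retained values.
    long median() {
        if (size == 0) {
            throw new IllegalStateException("no data points recorded");
        }
        long[] copy = Arrays.copyOf(buffer, size);
        Arrays.sort(copy);
        return copy[(size - 1) / 2];
    }
}
```

With a capacity of 1024 or 4096, memory use is bounded by the array, and the
slow queries seen right after startup fall out of the window once enough
newer requests have been recorded.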
