Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-06-03 Thread Felix Schumacher

On 03.06.2017 at 10:55, Philippe Mouawad wrote:

> On Wed, May 31, 2017 at 2:54 PM, Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>
>> Philippe>- switch everywhere to R1 (also in commons-math)
>>
>> Can you please clarify why do you prefer R1?
>
> Because from what the reporter wrote, it looked good to me:
> "If the 90% percentile is 1200 ms then that means that 90% of tests take no
> more than 1200 ms"
>
> And there is another pragmatic point, it seems JOrphan implementation is
> R1 once we fix the issue.
>
>> I'm inclined to R8 (as it is recommended by R for sample quantile
>> calculation).
>>
>> 1) I think interpolation would reduce run-to-run variance.
>> 2) Interpolation-like estimation is easier to implement. For instance, if
>> HdrHistogram estimator is added, then its result would be closer to R8
>> rather than to R1.
>
> ok
>
>> I don't think "the result of 90% is one of the sample response times" is
>> important. The important stuff is how system under test behaves, and it is
>> not something tied to a single execution. What I mean is R8 kind of
>> computation should better approximate the true percentile value than R1
>> would, and it is the true value that is important to compare and report as
>> a test result.
>
> Will you submit a PR for that ? Before or after release of 3.3 ?

Is there any need to rush this before 3.3?

I still believe that there is no single exact definition of the median, but
there are many specialised definitions, like the R1 .. R8 mentioned by
Vladimir. So I think that whatever we choose now will be challenged by
someone in the future.


If we change the algorithm, we should document it well and try to make
it flexible enough that it can be configured to act as R1 .. R8 (if that is
possible without too much work).


If we don't change the algorithm, we should document the current state.

Felix


> Thanks
>
>> Vladimir








Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-06-03 Thread Vladimir Sitnikov
Philippe>"If the 90% percentile is 1200 ms then that means that 90% of
tests take no more than 1200 ms"

Well, I get your point. It makes sense to keep the old approach unless
there's some data that confirms some other approach is better.

Vladimir>What I mean is R8 kind of
Vladimir> computation should better approximate

Philippe>Will you submit a PR for that ? Before or after release of 3.3 ?

Well, I can definitely submit one, however I'm not that fond of making a
change for the sake of change.


Vladimir


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-06-03 Thread Philippe Mouawad
On Wed, May 31, 2017 at 2:54 PM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Philippe>- switch everywhere to R1 (also in commons-math)
>
> Can you please clarify why do you prefer R1?
>

Because from what the reporter wrote, it looked good to me:
"If the 90% percentile is 1200 ms then that means that 90% of tests take no
more than 1200 ms"

And there is another pragmatic point: it seems the JOrphan implementation is
R1 once we fix the issue.
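
A minimal sketch of the R1 reading quoted above, assuming R1 means the
inverse of the empirical distribution function (type 1 in R's quantile
documentation): the result is the smallest sample such that at least the
requested share of samples is no larger. The name quantile_r1 and the small
tolerance constant are made up for illustration; this is not JOrphan's code.

```python
import math


def quantile_r1(samples, p):
    """Type 1 quantile: inverse of the empirical CDF, no interpolation."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(samples)
    # Smallest 1-based rank k with k / n >= p. The tiny tolerance is an
    # assumption of this sketch to absorb floating-point noise in n * p.
    k = max(1, math.ceil(len(ordered) * p - 1e-9))
    return ordered[k - 1]


response_times_ms = [15, 20, 35, 40, 50]
p90 = quantile_r1(response_times_ms, 0.9)
share_at_or_below = sum(t <= p90 for t in response_times_ms) / len(response_times_ms)
print(p90, share_at_or_below)  # 50 1.0
```

With only five samples the 90% value is simply the maximum, which is the
kind of small-sample case the bug report is about.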



> I'm inclined to R8 (as it is recommended by R for sample quantile
> calculation).
>
> 1) I think interpolation would reduce run-to-run variance.
> 2) Interpolation-like estimation is easier to implement. For instance, if
> HdrHistogram estimator is added, then its result would be closer to R8
> rather than to R1.
>

ok


>
> I don't think "the result of 90% is one of the sample response times" is
> important. The important stuff is how system under test behaves, and it is
> not something tied to a single execution. What I mean is R8 kind of
> computation should better approximate the true percentile value than R1
> would, and it is the true value that is important to compare and report as
> a test result.
>

Will you submit a PR for that ? Before or after release of 3.3 ?
Thanks


>
> Vladimir
>



-- 
Regards.
Philippe Mouawad.


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-05-31 Thread Vladimir Sitnikov
Philippe>- switch everywhere to R1 (also in commons-math)

Can you please clarify why you prefer R1?

I'm inclined to R8 (as it is recommended in the R documentation for sample
quantile calculation).

1) I think interpolation would reduce run-to-run variance.
2) Interpolation-like estimation is easier to implement. For instance, if
an HdrHistogram estimator is added, then its result would be closer to R8
than to R1.

I don't think "the result of 90% is one of the sample response times" is
important. The important thing is how the system under test behaves, and that
is not tied to a single execution. What I mean is that an R8-style
computation should approximate the true percentile value better than R1
would, and it is the true value that is important to compare and report as
a test result.
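
One way to probe point 1 empirically is a small simulation. The sketch
below is only an illustration under loud assumptions: response times are
drawn from an exponential distribution with a 200 ms mean, each simulated
run has 50 samples, and the seed is arbitrary. The function names are
hypothetical and the formulas follow the type 1 and type 8 definitions in
R's quantile documentation. It prints the spread of the 90% estimate for
both methods and does not settle the question for real test data.

```python
import math
import random
import statistics


def quantile_r1(ordered, p):
    """Type 1: pick the sample at rank ceil(n * p), no interpolation."""
    k = max(1, math.ceil(len(ordered) * p - 1e-9))
    return ordered[k - 1]


def quantile_r8(ordered, p):
    """Type 8: interpolate at position (n + 1/3) * p + 1/3, clamped to [1, n]."""
    n = len(ordered)
    h = min(max((n + 1 / 3) * p + 1 / 3, 1.0), float(n))
    lower = math.floor(h)
    upper = min(lower + 1, n)
    return ordered[lower - 1] + (h - lower) * (ordered[upper - 1] - ordered[lower - 1])


def spread_of_p90(estimator, runs=2000, samples_per_run=50, seed=61078):
    """Standard deviation of the 90% estimate across simulated test runs."""
    rng = random.Random(seed)  # same seed, so both estimators see identical runs
    estimates = []
    for _ in range(runs):
        # Assumed workload: exponential response times with a 200 ms mean.
        ordered = sorted(rng.expovariate(1 / 200.0) for _ in range(samples_per_run))
        estimates.append(estimator(ordered, 0.9))
    return statistics.pstdev(estimates)


print("R1 spread of p90:", spread_of_p90(quantile_r1))
print("R8 spread of p90:", spread_of_p90(quantile_r8))
```

Changing the distribution or the number of samples per run may change the
picture, so the numbers are only meaningful for the assumed setup.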

Vladimir


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-05-31 Thread Antonio Gomes Rodrigues
Hi,

I haven't had time to read the posted links yet.

But I am OK with having a single way to calculate percentiles and documenting it.

Antonio

2017-05-28 11:51 GMT+02:00 Philippe Mouawad :

> Hello,
> After reading further on this topic and also reading the different
> comments, my position would be:
> - switch everywhere to R1 (also in commons-math)
> - use the PR from contributor for the median and jorphan computations
> - document the change and algo somewhere
>
> From my understanding, tests having large results should not be affected by
> change.
>
> This would at least make computations uniform until we decide what library
> to use.
>
> I need your go before going further.
>
> If we decide for the status quo then please comment on the respective bugs to
> explain to the reporter and contributor why we won't change anything.
>
> Regards
>
> On Tuesday, May 9, 2017, Felix Schumacher <felix.schumac...@internetallee.de>
> wrote:
>
> > On 09.05.2017 at 09:11, pmouawad wrote:
> >
> >> Github user pmouawad commented on the issue:
> >>
> >> https://github.com/apache/jmeter/pull/296
> >>
> >> Hello @abalanonline ,
> >> Thanks for your replies and explanations !
> >>
> >> I am not a math expert as you seem to be, so I have few questions
> >> you may be able to help on:
> >>
> >> 1. Thanks to your comment, I see default method is LEGACY, and the
> >> one you have created is R_1. Do you have some insights on the
> >> different method and their limits / use cases ?
> >>
> >> 2. Why does the "bug" you report affect all libraries I checked
> >> (HdrHistogram, https://github.com/tdunning/t-digest/ and JOrphan ) ?
> >> Can't it be due to a different method estimation algorithm ?
> >>
> >> Note I share your thoughts on using a dedicated library but
> >> commons-math may be overkill in terms of performance compared to
> >> HdrHistogram or t-digest.
> >>
> >
> > I have tried to do a bit of research on percentiles, quantiles and median.
> >
> > It looks to me, that those "points" are more like ranges, and there is no
> > exact value.
> >
> > R and numpy will interpolate the median and the percentiles/quantiles. The
> > statistics module
> > of python 3 has three different median implementations called median,
> > median_high and median_low,
> > that interpolate, give the highest possible median and the lowest.
> >
> > Wikipedia (the german one), gives a definition of an "Empirisches
> > Quantile" (empiric quantile),
> > where it settles on the lower border of the quantiles (and therefore the
> > median).
> >
> > I wonder if we should change our implementation at all.
> >
> > Felix
> >
> >
> >> Thanks
> >>
> >>
> >> ---
> >> If your project is set up for it, you can reply to this email and have
> >> your reply appear on GitHub as well. If your project does not have this
> >> feature enabled and wishes so, or if the feature is enabled but not
> >> working, please contact infrastructure at infrastruct...@apache.org or
> >> file a JIRA ticket with INFRA.
> >> ---
> >>
> >
>
> --
> Regards.
> Philippe Mouawad.
>


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-05-28 Thread Philippe Mouawad
Hello,
After reading further on this topic and also reading the different
comments, my position would be:
- switch everywhere to R1 (also in commons-math)
- use the PR from contributor for the median and jorphan computations
- document the change and algo somewhere

From my understanding, tests having large results should not be affected by
the change.

This would at least make computations uniform until we decide what library
to use.

I need your go before going further.

If we decide for the status quo then please comment on the respective bugs to
explain to the reporter and contributor why we won't change anything.

Regards

On Tuesday, May 9, 2017, Felix Schumacher 
wrote:

> On 09.05.2017 at 09:11, pmouawad wrote:
>
>> Github user pmouawad commented on the issue:
>>
>> https://github.com/apache/jmeter/pull/296
>>
>> Hello @abalanonline ,
>> Thanks for your replies and explanations !
>>
>> I am not a math expert as you seem to be, so I have few questions
>> you may be able to help on:
>>
>> 1. Thanks to your comment, I see default method is LEGACY, and the
>> one you have created is R_1. Do you have some insights on the
>> different method and their limits / use cases ?
>>
>> 2. Why does the "bug" you report affect all libraries I checked
>> (HdrHistogram, https://github.com/tdunning/t-digest/ and JOrphan ) ?
>> Can't it be due to a different method estimation algorithm ?
>>
>> Note I share your thoughts on using a dedicated library but
>> commons-math may be overkill in terms of performance compared to
>> HdrHistogram or t-digest.
>>
>
> I have tried to do a bit of research on percentiles, quantiles and median.
>
> It looks to me, that those "points" are more like ranges, and there is no
> exact value.
>
> R and numpy will interpolate the median and the percentiles/quantiles. The
> statistics module
> of python 3 has three different median implementations called median,
> median_high and median_low,
> that interpolate, give the highest possible median and the lowest.
>
> Wikipedia (the german one), gives a definition of an "Empirisches
> Quantile" (empiric quantile),
> where it settles on the lower border of the quantiles (and therefore the
> median).
>
> I wonder if we should change our implementation at all.
>
> Felix
>
>
>> Thanks
>>
>>
>>
>

-- 
Regards.
Philippe Mouawad.


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-05-09 Thread Philippe Mouawad
Hi Felix,
Thanks for this valuable information.

Maybe we should document which option JOrphan takes, if you know it.

On another note, do you agree we should make percentiles / median uniform
across JMeter ?
It seems we have at least these choices:

   - commons-math, which we already use in BackendListener and Web Report
   - https://github.com/HdrHistogram/HdrHistogram
   - https://github.com/tdunning/t-digest/

I think the last two pay more attention to performance and memory usage
than the first one.

Regards
Philippe


On Tue, May 9, 2017 at 9:21 AM, Felix Schumacher <
felix.schumac...@internetallee.de> wrote:

> On 09.05.2017 at 09:11, pmouawad wrote:
>
>> Github user pmouawad commented on the issue:
>>
>> https://github.com/apache/jmeter/pull/296
>>
>> Hello @abalanonline ,
>> Thanks for your replies and explanations !
>>
>> I am not a math expert as you seem to be, so I have few questions
>> you may be able to help on:
>>
>> 1. Thanks to your comment, I see default method is LEGACY, and the
>> one you have created is R_1. Do you have some insights on the
>> different method and their limits / use cases ?
>>
>> 2. Why does the "bug" you report affect all libraries I checked
>> (HdrHistogram, https://github.com/tdunning/t-digest/ and JOrphan ) ?
>> Can't it be due to a different method estimation algorithm ?
>>
>> Note I share your thoughts on using a dedicated library but
>> commons-math may be overkill in terms of performance compared to
>> HdrHistogram or t-digest.
>>
>
> I have tried to do a bit of research on percentiles, quantiles and median.
>
> It looks to me, that those "points" are more like ranges, and there is no
> exact value.
>
> R and numpy will interpolate the median and the percentiles/quantiles. The
> statistics module
> of python 3 has three different median implementations called median,
> median_high and median_low,
> that interpolate, give the highest possible median and the lowest.
>
> Wikipedia (the german one), gives a definition of an "Empirisches
> Quantile" (empiric quantile),
> where it settles on the lower border of the quantiles (and therefore the
> median).
>
> I wonder if we should change our implementation at all.
>
> Felix
>
>
>
>> Thanks
>>
>>
>>
>


-- 
Regards.
Philippe Mouawad.


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-05-09 Thread Vladimir Sitnikov
Felix>I wonder if we should change our implementation at all.

So do I.
I wish JMeter would just throw an error when a user tries to calculate the 90%
percentile out of 5 values =)

Philippe>Note I share your thoughts on using a dedicated library but
Philippe> commons-math may be overkill in terms of performance compared to
Philippe> HdrHistogram

I agree HdrHistogram might be the only way to compute high percentiles with
a sane amount of memory.

Felix>R and numpy will interpolate the median and the percentiles/quantiles.

Technically speaking, R has 9 types of quantile calculation:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html

There's a comment:

R.quantile.doc>Further details are provided in Hyndman and Fan (1996) who
recommended type 8. The default method is type 7, as used by S and by R <
2.0.0.

As far as I understand that, "type 8" is somewhat better, however R
defaults to type 7 for backward compatibility reasons.

Here's what R version 3.4.0 (2017-04-21) produces:

quantile(c(15, 20, 35, 40, 50), c(0.05, 0.3, 0.4, 0.5, 1.0))
  5%  30%  40%  50% 100%
  16   23   29   35   50

quantile(c(15, 20, 35, 40, 50), c(0.05, 0.3, 0.4, 0.5, 1.0), type=8)
  5%  30%  40%  50% 100%
15.0 19.7 27.0 35.0 50.0
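
To make the difference between the two R calls above concrete, here is a
rough Python sketch of the type 7 and type 8 position formulas as given in
the R quantile documentation: the position in the sorted sample is
(n - 1) * p + 1 for type 7 and (n + 1/3) * p + 1/3 for type 8, followed by
linear interpolation between the neighbouring samples. The function name
quantile_interpolated is hypothetical; this is neither R's nor JMeter's
code.

```python
import math


def quantile_interpolated(samples, p, r_type=8):
    """Interpolated sample quantile for R types 7 and 8 (sketch)."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    if r_type == 7:
        h = (n - 1) * p + 1
    elif r_type == 8:
        h = (n + 1 / 3) * p + 1 / 3
    else:
        raise ValueError("only types 7 and 8 are sketched here")
    # Positions outside [1, n] fall back to the smallest or largest sample.
    h = min(max(h, 1.0), float(n))
    lower = math.floor(h)
    upper = min(lower + 1, n)
    fraction = h - lower
    return ordered[lower - 1] + fraction * (ordered[upper - 1] - ordered[lower - 1])


data = [15, 20, 35, 40, 50]
probabilities = [0.05, 0.3, 0.4, 0.5, 1.0]
type7 = [round(quantile_interpolated(data, p, r_type=7), 1) for p in probabilities]
type8 = [round(quantile_interpolated(data, p, r_type=8), 1) for p in probabilities]
print(type7)
print(type8)
```

Rounded to one decimal, both lists should agree with the R output quoted
above.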


Vladimir


Re: [GitHub] jmeter issue #296: Bug 61078 - Percentile calculation error

2017-05-09 Thread Felix Schumacher

On 09.05.2017 at 09:11, pmouawad wrote:

> Github user pmouawad commented on the issue:
>
> https://github.com/apache/jmeter/pull/296
>
> Hello @abalanonline ,
> Thanks for your replies and explanations !
>
> I am not a math expert as you seem to be, so I have few questions
> you may be able to help on:
>
> 1. Thanks to your comment, I see default method is LEGACY, and the
> one you have created is R_1. Do you have some insights on the
> different method and their limits / use cases ?
>
> 2. Why does the "bug" you report affect all libraries I checked
> (HdrHistogram, https://github.com/tdunning/t-digest/ and JOrphan ) ?
> Can't it be due to a different method estimation algorithm ?
>
> Note I share your thoughts on using a dedicated library but
> commons-math may be overkill in terms of performance compared to
> HdrHistogram or t-digest.


I have tried to do a bit of research on percentiles, quantiles and
median.

It looks to me as if those "points" are more like ranges, and there is
no exact value.

R and numpy will interpolate the median and the percentiles/quantiles.
The statistics module of Python 3 has three different median
implementations, called median, median_high and median_low, which
respectively interpolate, give the highest possible median, and give
the lowest.
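
The three median variants can be tried directly with Python's standard
statistics module. The sample values below are made up for illustration;
with an even number of samples the three functions give different answers,
which shows the "range" character of the median.

```python
import statistics

# Even number of samples: any value between 20 and 35 splits the data in half.
response_times_ms = [15, 20, 35, 40]

interpolated = statistics.median(response_times_ms)
lowest = statistics.median_low(response_times_ms)
highest = statistics.median_high(response_times_ms)

print(interpolated)  # 27.5, the mean of the two middle samples
print(lowest)        # 20
print(highest)       # 35
```

With an odd number of samples all three functions return the same middle
value.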

Wikipedia (the German one) gives a definition of an "Empirisches
Quantil" (empirical quantile), where it settles on the lower border of
the quantiles (and therefore the median).


I wonder if we should change our implementation at all.

Felix



> Thanks

