[
https://issues.apache.org/jira/browse/MATH-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adam Stelmaszczyk updated MATH-1169:
------------------------------------
Description:
Percentile class is now using Hoare selection algorithm (aka
[Quickselect|http://en.wikipedia.org/wiki/Quickselect]) with median of 3 for
choosing a pivot and caching pivots. Quickselect expected runtime is about 3.4N
+ O(N).
The constant can be improved to 1.5N by different pivot strategy involving
sampling, yielding the [Floyd–Rivest
algorithm|http://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm], which
has average complexity of 1.5N + O(N) for median, with other position
statistics even faster.
For proof of concept I implemented Floyd–Rivest algorithm without caching
pivots following Wikipedia and benchmarked it with
[Caliper|https://code.google.com/p/caliper/].
Array size - runtime for Floyd-Rivest - runtime for current Percentile - %
faster
100000 - 6.907 - 7.083 - 2.5%
[link|https://microbenchmarks.appspot.com/runs/a0d947ee-57fc-4636-b687-b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
1000000 - 70.3 - 75.4 - 6.8%
[link|https://microbenchmarks.appspot.com/runs/77291c2e-6dbb-4666-bf37-c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
10000000 - 708 - 868 - 18.4%
[link|https://microbenchmarks.appspot.com/runs/c0f65e35-44c0-458e-b6e0-634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
In attachments:
{{Main.java}} should be run to repeat experiments, it needs Caliper JAR:
https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
Every call to evaluate() I'm filling the input double[] array with different
random numbers.
{{FloydRivest.java}} contatins implementation of algorithm basing on pseudocode
from its Wikipedia page.
{{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), so
it calls FloydRivest.select() instead of typical select(). pivotsHeap was
removed for simplicity (maybe it can improve efficiency even more).
was:
Percentile class is now using Hoare selection algorithm (aka
[Quickselect|http://en.wikipedia.org/wiki/Quickselect]) with median of 3 for
choosing a pivot and caching pivots. Quickselect expected runtime is about 3.4N
+ O(N).
The constant can be improved to 1.5N by different pivot strategy involving
sampling, yielding the [Floyd–Rivest
algorithm|http://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm], which
has average complexity of 1.5N + O(N) for median, with other position
statistics even faster.
For proof of concept I implemented Floyd–Rivest algorithm without caching
pivots following Wikipedia and benchmarked it with
[Caliper|https://code.google.com/p/caliper/].
Array size - runtime in seconds for Floyd-Rivest - runtime for current
Percentile - % faster
100000 - 6.907 - 7.083 - 2.5%
[link|https://microbenchmarks.appspot.com/runs/a0d947ee-57fc-4636-b687-b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
1000000 - 70.3 - 75.4 - 6.8%
[link|https://microbenchmarks.appspot.com/runs/77291c2e-6dbb-4666-bf37-c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
10000000 - 708 - 868 - 18.4%
[link|https://microbenchmarks.appspot.com/runs/c0f65e35-44c0-458e-b6e0-634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
In attachments:
{{Main.java}} should be run to repeat experiments, it needs Caliper JAR:
https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
Every call to evaluate() I'm filling the input double[] array with different
random numbers.
{{FloydRivest.java}} contatins implementation of algorithm basing on pseudocode
from its Wikipedia page.
{{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), so
it calls FloydRivest.select() instead of typical select(). pivotsHeap was
removed for simplicity (maybe it can improve efficiency even more).
> Faster Percentile
> -----------------
>
> Key: MATH-1169
> URL: https://issues.apache.org/jira/browse/MATH-1169
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Adam Stelmaszczyk
> Priority: Minor
> Attachments: FloydRivest.java, Main.java, PercentileFloydRivest.java
>
>
> Percentile class is now using Hoare selection algorithm (aka
> [Quickselect|http://en.wikipedia.org/wiki/Quickselect]) with median of 3 for
> choosing a pivot and caching pivots. Quickselect expected runtime is about
> 3.4N + O(N).
> The constant can be improved to 1.5N by different pivot strategy involving
> sampling, yielding the [Floyd–Rivest
> algorithm|http://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm], which
> has average complexity of 1.5N + O(N) for median, with other position
> statistics even faster.
> For proof of concept I implemented Floyd–Rivest algorithm without caching
> pivots following Wikipedia and benchmarked it with
> [Caliper|https://code.google.com/p/caliper/].
> Array size - runtime for Floyd-Rivest - runtime for current Percentile - %
> faster
> 100000 - 6.907 - 7.083 - 2.5%
> [link|https://microbenchmarks.appspot.com/runs/a0d947ee-57fc-4636-b687-b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
> 1000000 - 70.3 - 75.4 - 6.8%
> [link|https://microbenchmarks.appspot.com/runs/77291c2e-6dbb-4666-bf37-c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
> 10000000 - 708 - 868 - 18.4%
> [link|https://microbenchmarks.appspot.com/runs/c0f65e35-44c0-458e-b6e0-634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
> In attachments:
> {{Main.java}} should be run to repeat experiments, it needs Caliper JAR:
> https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
> Every call to evaluate() I'm filling the input double[] array with different
> random numbers.
> {{FloydRivest.java}} contatins implementation of algorithm basing on
> pseudocode from its Wikipedia page.
> {{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(),
> so it calls FloydRivest.select() instead of typical select(). pivotsHeap was
> removed for simplicity (maybe it can improve efficiency even more).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)