[ 
https://issues.apache.org/jira/browse/NUMBERS-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851712#comment-17851712
 ] 

Alex Herbert commented on NUMBERS-206:
--------------------------------------

h2. Single k selection

Note that development started with a method based on the Commons Math 
Percentile implementation of quickselect. This was changed to allow multiple 
indices to be passed in a single method call for the first-generation functions, 
which use a variety of partition algorithms.
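
For context, a minimal sketch of a single-pivot quickselect of the kind these 
methods build on (illustrative only; not the Commons Math or Commons Numbers 
source, and NaN/signed-zero handling is omitted):
{code:java}
// Illustrative single-pivot quickselect: after the call, a[k] holds the value
// it would hold if a[lo..hi] were fully sorted. NaN/-0.0 handling omitted.
static void quickselect(double[] a, int lo, int hi, int k) {
    while (lo < hi) {
        // Simple middle-element pivot; real implementations use median-of-3,
        // ninther or sampling strategies.
        int p = partition(a, lo, hi, lo + ((hi - lo) >>> 1));
        if (k < p) {
            hi = p - 1;
        } else if (k > p) {
            lo = p + 1;
        } else {
            return;
        }
    }
}

// Binary (two-way) partition: returns the final index of the pivot value.
static int partition(double[] a, int lo, int hi, int pivotIndex) {
    double pivot = a[pivotIndex];
    swap(a, pivotIndex, hi);
    int store = lo;
    for (int i = lo; i < hi; i++) {
        if (a[i] < pivot) {
            swap(a, i, store++);
        }
    }
    swap(a, store, hi);
    return store;
}

static void swap(double[] a, int i, int j) {
    double t = a[i];
    a[i] = a[j];
    a[j] = t;
}
{code}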

This was then updated in a second development generation to add more options 
for configuring the algorithm for multiple indices. Only one implementation 
was provided, using the SBM partition algorithm.

The introselect-based algorithms allow configuration of the single/dual-pivot 
partition algorithm and of the introspection. Typically introspection does not 
detect the need to switch to the alternative (stopper) algorithm. Here the 
stopper is the QuickselectAdaptive algorithm.
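
A hedged sketch of the introspection idea (the budget and the stand-in stopper 
below are illustrative; the actual criterion and the QuickselectAdaptive 
stopper differ in detail):
{code:java}
// Illustrative introselect: run quickselect but count partition iterations;
// if progress is too slow, switch to a worst-case safe "stopper" algorithm.
// partition(...) is as in the quickselect sketch above.
static void introselect(double[] a, int lo, int hi, int k) {
    // Budget proportional to log2 of the range length (placeholder value).
    int budget = 2 * (32 - Integer.numberOfLeadingZeros(hi - lo + 1));
    while (lo < hi) {
        if (budget-- == 0) {
            // Poor pivots detected: fall back to a stopper. A full sort is a
            // trivially correct stand-in here; the library instead switches
            // to the linear-time QuickselectAdaptive algorithm.
            java.util.Arrays.sort(a, lo, hi + 1);
            return;
        }
        int p = partition(a, lo, hi, lo + ((hi - lo) >>> 1));
        if (k < p) {
            hi = p - 1;
        } else if (k > p) {
            lo = p + 1;
        } else {
            return;
        }
    }
}
{code}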

Various linear select algorithms are provided, broadly grouped into a, b, or c 
based on their implementation complexity.

Note: The algorithms are configured using parameters harvested from the name. 
The default name is the base configuration for the method. This can be changed 
by appending parameters to the name, as is done for some introselect 
implementations that can use FR sub-sampling.
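
For example, a hypothetical illustration of harvesting one such parameter (the 
helper below is not the benchmark harness code; it assumes the _SU suffix 
encodes the FR sub-sampling cut-off, e.g. ISP_SU1200_PAIRED_KEYS):
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parsing of a sub-sampling cut-off from a benchmark name.
static int subSamplingSize(String name, int defaultSize) {
    Matcher m = Pattern.compile("_SU(\\d+)").matcher(name);
    return m.find() ? Integer.parseInt(m.group(1)) : defaultSize;
}
{code}
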
||Name||Generation||Description||
|SortJDK| |Sort the data using JDK's Arrays.sort.|
|SPH|0|Single-pivot using a heap to cache pivots. This is the method used in 
the Commons Math Percentile implementation. For a single index the cache is 
null and is ignored. Uses a single index input.|
|SP|1|Single-pivot using binary partitioning.|
|BM|1|Single-pivot using Bentley-McIlroy ternary partitioning. Main comparators 
are <=, >=.|
|SBM|1|Single-pivot using Sedgewick's Bentley-McIlroy ternary partitioning. 
Main comparators are <, >.|
|DP|1|Dual-pivot using Yaroslavskiy's original 1/div for pivot selection. The 
divisor (div) is initialised at 3 and is increased when the central partition 
is too large.|
|5DP|1|Dual-pivot using the 2nd and 4th positions from 5 sorted points as the 
pivots. The 5 points are sampled evenly from the range.|
|2SBM|2|2nd generation single-pivot using Sedgewick's Bentley-McIlroy ternary 
partitioning. This is the most consistently performing single pivot partition 
method.|
|ISP|3|Single-pivot introselect. Pivot chosen using the dynamic strategy of BM: 
above a range length of 40, the ninther strategy (a median of 3 medians-of-3 
from 9 points); otherwise a median-of-3 (see the sketch below the table).|
|IDP|3|Dual-pivot introselect. Pivots chosen as the 2nd and 4th of 5 sorted points.|
|LSP|a|Median-of-medians using medians of 5. Sample is placed on the left side 
of the data.|
|Linear BFPRT IM|b|Median-of-medians using medians of 5. Sample is placed in 
the centre of the data.|
|Linear BFPRTA|b|Median-of-medians using medians of 5. Sample is placed in the 
centre of the data. The pivot is chosen using an adaptive k.|
|Linear RS IM|b|Repeated-step. Sample is placed in the centre of the data.|
|Linear RSA|b|Repeated-step. Sample is placed in the centre of the data. The 
pivot is chosen using an adaptive k.|
|QA_CF16|c|QuickselectAdaptive original method. Uses adaptive k for all steps.|
|QA|c|QuickselectAdaptive uses the original far step method but updates 
adaption of k for the far step to reduce the probability of placing the target 
in the larger partition.|
|QA_CF8|c|QuickselectAdaptive using the far step 2 method. Uses adaptive k for 
all steps.|
|FR| |Single-pivot using Floyd-Rivest sub-sampling when the range is >600.|
|KFR| |Kiwiel's dual-pivot using Floyd-Rivest sub-sampling when the range is 
>600.|
|ISP_SU600_PAIRED_KEYS|3|Single-pivot introselect. Pivot chosen using the 
dynamic strategy of BM. Uses FR sub-sampling when the range >600.|
|ISP_SU1200_PAIRED_KEYS|3|Single-pivot introselect. Pivot chosen using the 
dynamic strategy of BM. Uses FR sub-sampling when the range >1200.|
|ISP_SU1200_PAIRED_KEYS_CF2|3|Single-pivot introselect. Pivot chosen using the 
dynamic strategy of BM. Uses FR sub-sampling with a random sample when the 
range >1200.|
|SELECT| |Commons Numbers arrays selection implementation. Uses FR sub-sampling 
and falls back to QuickselectAdaptive if FR fails to hit the expected margins.|

Note: The PAIRED_KEYS name is required to activate a method that uses FR 
sub-sampling. The default method uses a collection of keys with only 1 key. 
This method is for selecting multiple keys together and as such is not 
optimised for FR sub-sampling, which must target a single key.
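
The median-of-3 and ninther pivot strategies referenced in the table (e.g. for 
ISP) can be sketched as follows (illustrative index choices; not the library 
code):
{code:java}
// Illustrative median-of-3: returns the index, among i, j, k, whose value is
// the median of the three sampled values.
static int medianOf3(double[] a, int i, int j, int k) {
    double x = a[i];
    double y = a[j];
    double z = a[k];
    if (x < y) {
        if (y < z) {
            return j;               // x < y < z
        }
        return x < z ? k : i;       // median of x and z
    }
    if (x < z) {
        return i;                   // y <= x < z
    }
    return y < z ? k : j;           // median of y and z
}

// Ninther: median of three medians-of-3 from 9 roughly evenly spaced points.
static int ninther(double[] a, int lo, int hi) {
    int n = hi - lo + 1;
    int s = n / 8;                  // illustrative spacing of the sample points
    int m = lo + (n >>> 1);
    int p1 = medianOf3(a, lo, lo + s, lo + 2 * s);
    int p2 = medianOf3(a, m - s, m, m + s);
    int p3 = medianOf3(a, hi - 2 * s, hi - s, hi);
    return medianOf3(a, p1, p2, p3);
}
{code}
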
h2. Results

Warning: Results may vary by machine. The current code has settings that 
perform well over several machines. One key development goal was to avoid 
being obviously slow, rather than to achieve the optimum on any particular 
machine.

Tested on macOS 14.5; M2 Max CPU with 96GB RAM; with Eclipse Adoptium JDK 
21.0.3.

BM test data; range [50000, 50001]; n=984.

Selection of a single k in 10 samples. 10 uniformly spaced indices are 
generated in the range. Each index is one sample. The order of the 10 samples 
is permuted on each benchmark invocation.
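
A sketch of the sampling scheme under these assumptions (not the actual JMH 
benchmark source):
{code:java}
import java.util.Random;

// Illustrative generation of uniformly spaced indices whose order is shuffled
// on each invocation; each index is selected in its own copy of the data.
static int[] sampleIndices(int n, int samples, Random rng) {
    int[] k = new int[samples];
    for (int i = 0; i < samples; i++) {
        k[i] = (int) ((long) i * n / samples);  // uniformly spaced in [0, n)
    }
    // Fisher-Yates shuffle to permute the order of the samples.
    for (int i = samples - 1; i > 0; i--) {
        int j = rng.nextInt(i + 1);
        int t = k[i];
        k[i] = k[j];
        k[j] = t;
    }
    return k;
}
{code}
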
|*Name*|*Score*|*Error*|
|DP|3.12E+10|2.89E+09|
|SortJDK|6.22E+09|2.64E+08|
|LSP|3.99E+09|6.37E+07|
|Linear_BFPRT_IM|2.94E+09|1.39E+08|
|Linear_BFPRTA|2.45E+09|5.50E+07|
|Linear_RS_IM|2.42E+09|9.16E+07|
|Linear_RSA|2.27E+09|9.44E+07|
|SP|1.89E+09|5.30E+07|
|SPH|1.87E+09|5.66E+07|
|SBM|1.74E+09|1.43E+08|
|2SBM|1.70E+09|1.01E+08|
|BM|1.69E+09|7.11E+07|
|ISP|1.66E+09|1.38E+08|
|QA_CF16|1.62E+09|7.34E+07|
|QA|1.51E+09|5.45E+07|
|5DP|1.49E+09|2.76E+07|
|QA_CF8|1.47E+09|6.10E+07|
|IDP|1.47E+09|6.50E+07|
|ISP_SU600_PAIRED_KEYS|1.44E+09|5.43E+07|
|ISP_SU1200_PAIRED_KEYS|1.42E+09|6.25E+07|
|KFR|1.41E+09|5.10E+07|
|SELECT|1.40E+09|6.53E+07|
|FR|1.37E+09|1.29E+08|
|ISP_SU1200_PAIRED_KEYS_CF2|1.28E+09|1.31E+08|
h2. Observations

Sorting the entire data is slower than selection.

The DP method is slower than a sort. This is an example where the method has 
had quadratic quickselect performance on at least one of the data samples. It 
demonstrates that the method must be robust to all types of input data. 
Here are the two DP methods benchmarked on only random data:
{noformat}
DP      41881485.087 ± 2467338.158  ns/op
5DP     43475788.348 ± 2617480.169  ns/op{noformat}
Here the DP method is within error of the alternative 5DP (which only changes 
the pivot strategy).

The improved version of the linear select algorithm (with a central layout of 
the pivot sample) is faster than the version which places the sample on the 
left (LSP vs Linear_BFPRT_IM). Here the only difference is the sample layout; 
the computed pivot for the partition will be the same.

The adaptive positioning of k within the sample improves performance.

The repeated step algorithm is faster than median-of-medians.

The slowest quickselect is the original single-pivot quickselect adapted from 
Commons Math (SP; SPH). Switching to ternary partitioning methods (BM, SBM) 
improves performance. Note that the BM data creates many samples with a high 
number of repeat elements. Thus partitioning where values equal to the pivot 
are identified is an advantage.
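
For reference, a sketch of the three-way partitioning idea that groups values 
equal to the pivot (shown here in the simpler Dijkstra formulation; the 
Bentley-McIlroy and Sedgewick variants used by BM/SBM move equal values to the 
ends and swap them back, but the effect on equal keys is the same):
{code:java}
// Illustrative three-way partition: on return, a[lo..lt-1] < pivot,
// a[lt..gt] == pivot and a[gt+1..hi] > pivot. Returns {lt, gt}.
static int[] partition3(double[] a, int lo, int hi, double pivot) {
    int lt = lo;
    int gt = hi;
    int i = lo;
    while (i <= gt) {
        if (a[i] < pivot) {
            double t = a[lt]; a[lt++] = a[i]; a[i++] = t;
        } else if (a[i] > pivot) {
            double t = a[gt]; a[gt--] = a[i]; a[i] = t;
        } else {
            i++;
        }
    }
    return new int[] {lt, gt};
}
{code}
Selection then recurses only into the side containing k, and can return 
immediately when lt <= k <= gt, which is why identifying the equal-to-pivot 
block helps on data with many repeated elements.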

There is no difference between the first and second generation selection 
methods using the same partition algorithm (SBM vs 2SBM). However when 
introspection is added so the algorithm can detect poor quickselect performance 
(ISP) there is a small improvement in speed. This may be within the error 
margin, or could be an observation of introspection working.

The linear select algorithms are slower than quickselect with a simple pivot 
strategy. However the QuickselectAdaptive methods are faster. The original 
QA_CF16 method is slowest; improving the adaption of k for the far step (QA) 
and then changing to the far step 2 method (QA_CF8) each increase performance.

The dual-pivot introselect is faster on the BM data than the single-pivot 
introselect. However note that the difference is an average. On some data the 
ISP is faster; on others the IDP. The relative speed is also dependent on the 
location of k in the data, e.g. 10%, 25%, 50%. This benchmark uses a uniform 
selection of k and is thus representative of any k location.

The single-pivot methods are improved by using Floyd-Rivest sampling. The 
original FR method is very fast. This uses a sub-sampling size of 600. The ISP 
based methods use 1200. This value was arrived at by benchmarking on various 
data lengths. The sub-sampling size can be gradually increased until a 
noticeable difference in performance between the ISP and ISP+FR is observed. 
The higher value indicates that the pivot selection strategy of the ISP method 
(using the BM dynamic selection) is more robust than the original FR pivot 
selection (which simply uses the target k). Note that FR is faster than 
ISP_SU1200_PAIRED_KEYS although they perform similar pivot selection; the 
differences are the sub-sampling size, binary vs ternary partitioning, and 
introspection, any of which may affect the performance.
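
For reference, a sketch of the classic Floyd-Rivest recursion (constants per 
the published 1975 SELECT algorithm; the Commons Numbers code and the tuned 
ISP variants differ in the cut-off, the sampling and the partitioning details):
{code:java}
// Illustrative Floyd-Rivest selection: when the range is large, first recurse
// on a small window predicted to contain the k-th value, so the subsequent
// partition uses a pivot very close to the target. swap(...) is as in the
// quickselect sketch above; NaN handling is omitted.
static void frSelect(double[] a, int left, int right, int k) {
    while (right > left) {
        if (right - left > 600) {
            int n = right - left + 1;
            int i = k - left + 1;
            double z = Math.log(n);
            double s = 0.5 * Math.exp(2 * z / 3);
            double sd = 0.5 * Math.sqrt(z * s * (n - s) / n) * Math.signum(i - n / 2.0);
            int newLeft = Math.max(left, (int) (k - i * s / n + sd));
            int newRight = Math.min(right, (int) (k + (n - i) * s / n + sd));
            frSelect(a, newLeft, newRight, k);
        }
        // Partition around t = a[k], then narrow to the side containing k.
        double t = a[k];
        int i = left;
        int j = right;
        swap(a, left, k);
        if (a[right] > t) {
            swap(a, right, left);
        }
        while (i < j) {
            swap(a, i, j);
            i++;
            j--;
            while (a[i] < t) {
                i++;
            }
            while (a[j] > t) {
                j--;
            }
        }
        if (a[left] == t) {
            swap(a, left, j);
        } else {
            j++;
            swap(a, j, right);
        }
        if (j <= k) {
            left = j + 1;
        }
        if (k <= j) {
            right = j - 1;
        }
    }
}
{code}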

The KFR method is slightly slower than the original FR method. Adding random 
sub-sampling to FR increases performance to create the fastest result 
(ISP_SU1200_PAIRED_KEYS_CF2).

The SELECT method is as fast as the FR methods. It is not as fast as the FR 
method with random sub-sampling. However it will be stable for repeat input as 
it will not move data out of sorted order, and will have guaranteed linear 
performance on any input data as introspection will detect bad partitions and 
revert to QuickselectAdaptive.

The performance gain from using a FR-based algorithm can be observed using 
large random data, for example 50 million uniform random elements (10 repeats 
with a random k):
{noformat}
                    SP  4295643800.200 ±  988383992.147  ns/op
                QA_CF8  3088731608.400 ± 1060959744.803  ns/op
                    FR  1661633891.600 ± 1069784031.796  ns/op
                SELECT  1868087800.000 ± 1157148907.486  ns/op
ISP_SU1200_PAIRED_KEYS  1751146783.400 ± 1166364039.049  ns/op
{noformat}
Here the FR algorithm is fastest but the other FR-based methods are close given 
the error margins. The fastest QuickselectAdaptive configuration is 
approximately in between the FR methods and a standard quickselect 
implementation.
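
As context, a hedged usage sketch of the proposed Selection API quoted in the 
issue description below (only the quoted signatures are assumed; the final 
package location and the quantile helpers are illustrative):
{code:java}
// Hypothetical usage of the proposed Selection API; import of the (assumed)
// Commons Numbers arrays Selection class omitted.
static double medianInPlace(double[] data) {
    int k = data.length / 2;       // upper median for even lengths (illustrative)
    Selection.select(data, k);     // partially orders data so data[k] is correct
    return data[k];
}

static double[] quartilesInPlace(double[] data) {
    // Two indices selected in a single call, e.g. for quantile interpolation
    // of neighbouring values (see STATISTICS-85). Index choice is illustrative.
    int[] k = {data.length / 4, (3 * data.length) / 4};
    Selection.select(data, k);
    return new double[] {data[k[0]], data[k[1]]};
}
{code}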

 

> Selection API
> -------------
>
>                 Key: NUMBERS-206
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-206
>             Project: Commons Numbers
>          Issue Type: New Feature
>          Components: arrays
>            Reporter: Alex Herbert
>            Priority: Major
>
> Create a selection API to select the k-th largest element in an array. This 
> places at k the same value that would be at k in a fully sorted array.
> {code:java}
> public final class Selection {
>     public static void select(double[] a, int k);
>     public static void select(double[] a, int from, int to, int k);
>     public static void select(double[] a, int[] k);
>     public static void select(double[] a, int from, int to, int[] k);
>     // Extend to other primitive data types that are not easily sorted
>     // (e.g. long, float, int)
> }
> {code}
> Note: This API will support multiple points (int[] k) for use in quantile 
> estimation of array data by interpolation of neighbouring values (see 
> STATISTICS-85).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
