[ https://issues.apache.org/jira/browse/NUMBERS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019757#comment-17019757 ]

Alex Herbert commented on NUMBERS-142:
--------------------------------------

I have implemented various algorithms for extended precision and tested their 
speed and accuracy. Unsurprisingly, there is a trade-off: the faster methods 
are less accurate.
h2. Algorithms tested
 # Dekker's dot product as described in Dekker, 1971. 
 # Ogita's fast variant of dot2 (dot2s)
 # Ogita's fast variant of dot2 using a binary split of the double into high 
and low parts (dot2sMask)
 # Ogita's dotK for K in 3 to 7
 # Shewchuk's method for arbitrary precision floating-point numbers
 # BigDecimal
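For reference, a minimal sketch of the fused dot2s variant (Ogita et al.), using Dekker's two-product for the products and Knuth's two-sum for the accumulation. The class and method names are my own illustration, not the LinearCombination API:

```java
public final class Dot2sSketch {

    /** Dekker's split constant 2^27 + 1 for a 53-bit significand. */
    private static final double SPLIT = 0x1.0p27 + 1;

    /**
     * Dot product of x and y at twice the working precision (dot2s):
     * the products and the running sum use error-free transformations and
     * the round-off is accumulated in a second variable added at the end.
     */
    static double dot2s(double[] x, double[] y) {
        // p holds the standard-precision sum, s the accumulated round-off.
        double p = x[0] * y[0];
        double s = productLow(x[0], y[0], p);
        for (int i = 1; i < x.length; i++) {
            double h = x[i] * y[i];
            double r = productLow(x[i], y[i], h); // round-off of the product
            double sum = p + h;
            double z = sum - p;
            double q = (p - (sum - z)) + (h - z); // round-off of the sum (two-sum)
            p = sum;
            s += q + r;
        }
        return p + s;
    }

    /** Round-off of the product a * b, given hi = fl(a * b) (Dekker's two-product). */
    private static double productLow(double a, double b, double hi) {
        double ca = SPLIT * a;
        double ahi = ca - (ca - a);
        double alo = a - ahi;
        double cb = SPLIT * b;
        double bhi = cb - (cb - b);
        double blo = b - bhi;
        return ((ahi * bhi - hi) + ahi * blo + alo * bhi) + alo * blo;
    }
}
```

A standard-precision loop returns 0.0 for x = {1e16, 1, -1e16}, y = {1, 1, 1}; dot2s recovers the exact result 1.0.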

h2. Testing using a poorly conditioned dot-product

I implemented the method from Ogita et al to create a poorly conditioned dot 
product. This creates random numbers with a carefully chosen upper bound for 
half of the arrays x and y. The remaining half is then created to gradually 
bring the sum back towards zero. The final dot product of x and y is in the 
range [-1, 1] and the components may be up to 2^(b/2) in magnitude, where b is 
the base-2 logarithm of the condition number.
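The generator can be sketched roughly as below (after Ogita et al.'s generation algorithm). The names are my own, the exact partial dot product inside the loop is done naively with BigDecimal, and small lengths (n < 4) are not handled; this is test-data code, not a production implementation:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.SplittableRandom;

public final class GenDot {

    /** Exact dot product of x[0..m) and y[0..m) via BigDecimal. */
    static BigDecimal exactDot(double[] x, double[] y, int m) {
        BigDecimal s = BigDecimal.ZERO;
        for (int i = 0; i < m; i++) {
            s = s.add(new BigDecimal(x[i]).multiply(new BigDecimal(y[i])));
        }
        return s;
    }

    /** Uniform value in [-1, 1). */
    private static double u(SplittableRandom rng) {
        return 2 * rng.nextDouble() - 1;
    }

    /**
     * Fill x and y so that the dot product has a condition number of
     * roughly 2^b. Assumes x.length == y.length >= 4.
     */
    static void genDot(double[] x, double[] y, int b, SplittableRandom rng) {
        int n = x.length;
        int n2 = n / 2;
        // First half: random values with exponents spread over [0, b/2];
        // force the extreme exponents b/2 and 0 to occur.
        for (int i = 0; i < n2; i++) {
            int e = (i == 0) ? b / 2 : (i == n2 - 1) ? 0 : rng.nextInt(b / 2 + 1);
            x[i] = Math.scalb(u(rng), e);
            y[i] = Math.scalb(u(rng), e);
        }
        // Second half: choose y[i] so the exact partial sum is pulled back to
        // a small random target, with the target exponent decreasing
        // linearly from b/2 to 0.
        for (int i = n2; i < n; i++) {
            int e = (int) Math.round((b / 2.0) * (n - 1 - i) / (n - 1 - n2));
            x[i] = Math.scalb(u(rng), e);
            BigDecimal target = new BigDecimal(Math.scalb(u(rng), e));
            y[i] = target.subtract(exactDot(x, y, i))
                         .divide(new BigDecimal(x[i]), MathContext.DECIMAL128)
                         .doubleValue();
        }
    }
}
```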

The method uses a target condition number to set bounds when creating the data. 
The final condition number (used for analysis) is defined as:
{noformat}
d = exact(x, y)
c = 2 * |x|^T * |y| / |d|
{noformat}
This requires the exact dot product, which I compute using BigDecimal. The 
numerator is the dot product, in standard precision, of the absolute vectors. A 
higher condition number means the magnitudes of the product components are 
further from the magnitude of the dot sum and thus there is more cancellation 
during the summation.
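The definition above translates directly into code; this is a sketch with illustrative names, using BigDecimal only for the exact denominator:

```java
import java.math.BigDecimal;

public final class ConditionNumber {

    /**
     * Condition number of the dot product: 2 * |x|^T |y| / |x . y|,
     * with the denominator computed exactly using BigDecimal and the
     * numerator in standard precision.
     */
    static double of(double[] x, double[] y) {
        BigDecimal exact = BigDecimal.ZERO;
        double numerator = 0;
        for (int i = 0; i < x.length; i++) {
            // new BigDecimal(double) is exact: no rounding of the binary value.
            exact = exact.add(new BigDecimal(x[i]).multiply(new BigDecimal(y[i])));
            numerator += Math.abs(x[i]) * Math.abs(y[i]);
        }
        return 2 * numerator / exact.abs().doubleValue();
    }
}
```

A well-conditioned pair such as x = {1, 2}, y = {2, 1} gives c = 2; the cancelling pair x = {1e16, 1, -1e16}, y = {1, 1, 1} gives c = 4e16.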

Accuracy is assessed using the relative error:
{noformat}
relative error = |observed - expected| / |expected|
{noformat}
This is clipped to a maximum of 2. Above that the value is meaningless, i.e. 
the method is simply wrong.
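The clipped measure, as a small sketch (the expected value here is the BigDecimal reference; the naming is mine):

```java
import java.math.BigDecimal;
import java.math.MathContext;

public final class RelativeError {

    /** |observed - expected| / |expected|, clipped to 2 (beyond which the method is simply wrong). */
    static double of(double observed, BigDecimal expected) {
        BigDecimal error = new BigDecimal(observed).subtract(expected).abs();
        double relative = error.divide(expected.abs(), MathContext.DECIMAL64).doubleValue();
        return Math.min(relative, 2.0);
    }
}
```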
h2. Accuracy

To test the implementations I recreated Figures 6.2 to 6.4 from Ogita et al 
(2005). These plot the relative error against the condition number. I used 2000 
samples of length-100 vectors with condition numbers from 1e10 to 1e120. At a 
certain condition number the various methods can be seen to fail, as shown 
below:

!error_vs_condition_no.jpg!
 * BigDecimal is not plotted as it is the reference
 * Shewchuk's method cannot be seen as it is exact (DotExact). This holds for 
condition numbers up to 1e300.
 * Dekker's method can be seen marginally outperforming the other dot2 methods. 
It is nowhere close to dot3.
 * The variant of dot2s using a different split (Dot2sMask) cannot be 
distinguished in this test.

h2. Performance

Tested using 1000 random vectors of different lengths. The median of 5 JMH runs 
is shown:

!array_performance.jpg!
 * BigDecimal is slow
 * Shewchuk's method is about the same speed as dot6 or dot7. It is about 3.5x 
slower than dot2s (the current implementation in LinearCombination).
 * The single precision scalar product (baseline) is about 8x faster than dot2s

h2. Inline Performance

I wrote a few different versions that inline the dot product for small 
vectors. These were tested to ensure they return the same values as the 
generic array versions:

!inline_perfomance.jpg!
 * BigDecimal is not plotted as it is so slow
 * Shewchuk's method is close in speed to dot3 for length 2. The gap grows for 
lengths 3 and 4.
 * The single precision scalar product is about 4-8x faster than dot2sMask
 * The simple masking approach (Mask suffix) which requires no branch 
conditions is definitely faster than using Dekker's split

The same data in a table. Note the slow speed of BigDecimal.
||Length||Baseline||BigDecimal||Dekker||Dot2s||Dot3||DotExact||Dot2sMask||Dot3Mask||
|2|2836.9|4786548.5|10386.0|10161.6|14985.9|17526.6|8045.9|12310.1|
|3|2896.1|5705362.5|20570.7|14778.2|23543.6|41584.1|12100.9|19618.6|
|4|3282.5|6609295.1|35116.4|24626.8|32823.7|64874.2|18174.7|25627.6|
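The two splits being compared can be sketched as follows (my naming; the overflow guard needed by Dekker's split for extreme values is omitted):

```java
public final class Splits {

    /**
     * Dekker's split: multiply by 2^27 + 1 and subtract. Produces a high
     * part with 26 bits in the significand; the extra bit of precision is
     * carried in the sign of the low part. Can overflow for very large |a|.
     */
    static double highDekker(double a) {
        double c = (0x1.0p27 + 1) * a;
        return c - (c - a);
    }

    /**
     * Mask split: zero the low 27 bits of the 52-bit stored significand.
     * Branch-free and cannot overflow, but the high part of a sub-normal
     * may be zero, leaving all the non-zero bits in the low part.
     */
    static double highMask(double a) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(a) & ((-1L) << 27));
    }
}
```

For values near 1 the two splits agree; they differ for sub-normals, where the mask leaves a zero high part (e.g. Double.MIN_VALUE).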
h2. Discussion
 * BigDecimal is slow
 * The accuracy of BigDecimal can be achieved using Shewchuk's extended 
precision method
 * All double length precision methods are approximately equal.
 * Improving precision of dotK requires iterations of the summation (the 
parameter K)
 * Any implementation of dotK will have a failure threshold unless K is very 
large. At this point Shewchuk's method is faster and more accurate

The implementation of Shewchuk's method requires that an input of length n may 
be expanded to an arbitrary precision float represented as up to 2n double 
values in increasing order of magnitude. These values are non-overlapping. Each 
addition of a new product to the expanded number of size m requires 2m two-sum 
operations and increases the length by at most 2. Thus the algorithm has 
O(n^2) runtime complexity. However, some additions occur with no round-off and 
thus the round-off does not have to be stored. This occurs in part because the 
numbers are non-overlapping: if all the sums are approximately the same 
magnitude then the extended representation does not have to cover a large 
range of exponents. This restricts growth of the expanded arbitrary precision 
number and allows the runtime to be reasonable for long dot products.
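The core growth step can be sketched like this (a zero-eliminating version of Shewchuk's grow-expansion; this is my own condensed Java, not the proposed implementation):

```java
import java.util.Arrays;

public final class Expansion {

    /** Knuth's two-sum: returns {fl(a + b), exact round-off}. */
    static double[] twoSum(double a, double b) {
        double s = a + b;
        double z = s - a;
        double e = (a - (s - z)) + (b - z);
        return new double[] {s, e};
    }

    /**
     * Add b to the non-overlapping expansion e (components in increasing
     * order of magnitude), returning a new expansion. Zero round-off terms
     * are not stored, which is what limits the growth of the expansion when
     * the summed terms have similar magnitudes.
     */
    static double[] grow(double[] e, double b) {
        double[] h = new double[e.length + 1];
        int n = 0;
        double q = b;
        for (double ei : e) {
            double[] s = twoSum(q, ei);
            q = s[0];
            if (s[1] != 0) {
                h[n++] = s[1]; // keep only non-zero round-off
            }
        }
        if (q != 0 || n == 0) {
            h[n++] = q;
        }
        return Arrays.copyOf(h, n);
    }
}
```

Adding 1e16 then 1.0 produces the two-component expansion {1.0, 1e16}; adding -1e16 collapses it back to {1.0}, with the cancelling sums producing no stored round-off.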

Here is some performance on a large array:
||length||Method||Score||Error||Median||Relative Median||
|100000|Dot3|1359713254.6|30637656.4|1363053133.0|1|
|100000|Dot3b|1227819794.8|43882171.3|1226846834.0|0.90|
|100000|Dot6|2533720036.4|75611431.2|2529107680.0|1.86|
|100000|DotExact|2679235887.8|84136741.0|2673073237.0|1.96|

Thus it appears that the method will not dramatically slow down on larger 
arrays. In this case it is only 2x slower than dot3.

Note however that this may be affected by the data and thus I will do some 
testing with different types of data.
h2. Conclusions

If the purpose of LinearCombination is to provide an accurate dot product 
without using BigDecimal then the method of Shewchuk appears to be the one to 
use. The other methods all have a point at which they fail to produce accurate 
results.

If the purpose is a compromise between speed and accuracy then the decision on 
which implementation to use comes down to the choice of K for dotK.

I do not have a requirement for exact dot products of great length. For short 
lengths the exact method of Shewchuk can be implemented fairly efficiently. I 
have used this method in the Numbers Complex class for a small dot sum.

It may be useful to obtain use cases from Commons Geometry where 
LinearCombination is also used.

> Improve LinearCombination accuracy during summation of the round-off errors
> ---------------------------------------------------------------------------
>
>                 Key: NUMBERS-142
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-142
>             Project: Commons Numbers
>          Issue Type: Improvement
>          Components: arrays
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Assignee: Alex Herbert
>            Priority: Minor
>         Attachments: array_performance.jpg, error_vs_condition_no.jpg, 
> inline_perfomance.jpg
>
>
> The algorithm in LinearCombination is an implementation of dot2 from [Ogita 
> et al|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547] 
> (algorithm 5.3). There is a subtle difference in that the original dot2 
> algorithm sums the round-off from the products and the round-off from the 
> product summation together. The method in LinearCombination sums them 
> separately (using an extra variable) and adds them at the end. This actually 
> improves the accuracy in conditions where the sum of the round-off is greater 
> than the sum of the products, as described below.
> The dot2 algorithm suffers when the total sum is close to 0 but the 
> intermediate sum is far enough from zero that there is a large difference 
> between the exponents of summed terms and the final result. In this case the 
> sum of the round-off is more important than the sum of the products which due 
> to massive cancellation is zero. The case is important for Complex numbers 
> which require a computation of log1p(x^2+y^2-1) when x^2+y^2 is close to 1 
> such that log(1) would be ~zero yet the true logarithm is representable to 
> very high precision.
> This can be protected against by using the dotK algorithm of Ogita et al with 
> K>2. This saves all the round-off parts from each product and the running 
> sum. These are subject to an error free transform that repeatedly adds 
> adjacent pairs to generate a new split pair with a closer upper and lower 
> part. Over time this will order the parts from low to high and these can be 
> summed low first for an error free dot product.
> Using this algorithm with a single pass (K=3 for dot3) removes the 
> cancellation error observed in the mentioned use case. Adding a single pass 
> over the parts changes the algorithm from 25n floating point operations 
> (flops) to 31n flops for the sum of n products.
> A second change for the algorithm is to switch to using 
> [Dekker's|https://doi.org/10.1007/BF01397083] algorithm (Dekker, 1971) to 
> split the number. This extracts two 26-bit mantissas from the 53-bit 
> mantissa (in IEEE 754 the leading bit in front of the 52-bit mantissa is 
> assumed to be 1). This is done by multiplication by 2^s+1 with s = 
> ceil(53/2) = 27:
> big = (2^s+1) * a
> a_hi = (big - (big - a))
> The extra bit of precision is carried into the sign of the low part of the 
> split number.
> This is in contrast to the method in LinearCombination that uses a simple 
> mask on the long representation to obtain the a_hi part in 26-bits and the 
> lower part will be 27 bits.
> The advantage of Dekker's method is it produces 2 parts with 26 bits in the 
> mantissa that can be multiplied exactly. The disadvantage is the potential 
> for overflow requiring a branch condition to check for extreme values.
> It also appropriately handles very small sub-normal numbers: the current 
> masking method would create a 0 high part with all the non-zero bits left in 
> the low part. This has knock-on effects on the split multiplication, which 
> requires the high part to be the larger.
> A simple change to the current implementation to use Dekker's split improves 
> the precision on a wide range of test data (not shown).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
