[jira] [Updated] (NUMBERS-142) Improve LinearCombination accuracy during summation of the round-off errors

Alex Herbert (Jira) Wed, 19 Feb 2020 06:32:11 -0800


     [ 
https://issues.apache.org/jira/browse/NUMBERS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alex Herbert updated NUMBERS-142:
---------------------------------
    Attachment: dot.jpg

> Improve LinearCombination accuracy during summation of the round-off errors
> ---------------------------------------------------------------------------
>
>                 Key: NUMBERS-142
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-142
>             Project: Commons Numbers
>          Issue Type: Improvement
>          Components: arrays
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Assignee: Alex Herbert
>            Priority: Minor
>         Attachments: array_performance.jpg, cond_no.jpg, dot.jpg, 
> error_vs_condition_no.jpg, inline_perfomance.jpg
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The algorithm in LinearCombination is an implementation of dot2 from [Ogita 
> el al|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547] 
> (algorithm 5.3). There is a subtle difference in that the original dot2 
> algorithm sums the round-off from the products and the round-off from the 
> product summation together. The method in LinearCombination sums them 
> separately (using an extra variable) and adds them at the end. This actually 
> improves the accuracy under conditions where the round-off is of greater sum 
> than the products, as described below.
> The dot2 algorithm suffers when the total sum is close to 0 but the 
> intermediate sum is far enough from zero that there is a large difference 
> between the exponents of summed terms and the final result. In this case the 
> sum of the round-off is more important than the sum of the products which due 
> to massive cancellation is zero. The case is important for Complex numbers 
> which require a computation of log1p(x^2+y^2-1) when x^2+y^2 is close to 1 
> such that log(1) would be ~zero yet the true logarithm is representable to 
> very high precision.
> This can be protected against by using the dotK algorithm of Ogita et al with 
> K>2. This saves all the round-off parts from each product and the running 
> sum. These are subject to an error free transform that repeatedly adds 
> adjacent pairs to generate a new split pair with a closer upper and lower 
> part. Over time this will order the parts from low to high and these can be 
> summed low first for an error free dot product.
> Using this algorithm with a single pass (K=3 for dot3) removes the 
> cancellation error observed in the mentioned use case. Adding a single pass 
> over the parts changes the algorithm from 25n floating point operations 
> (flops) to 31n flops for the sum of n products.
> A second change for the algorithm is to switch to using 
> [Dekker's|https://doi.org/10.1007/BF01397083] algorithm (Dekker, 1971) to 
> split the number. This extracts two 26-bit mantissas from a 53-bit mantis (in 
> IEEE 754 the leading bit in front of the of the 52-bit mantissa is assumed to 
> be 1). This is done by multiplication by 2^s+1 with s = ceil(53/2) = 27:
> big = (2^s+1) * a
> a_hi = (big - (big - a))
> The extra bit of precision is carried into the sign of the low part of the 
> split number.
> This is in contrast to the method in LinearCombination that uses a simple 
> mask on the long representation to obtain the a_hi part in 26-bits and the 
> lower part will be 27 bits.
> The advantage of Dekker's method is it produces 2 parts with 26 bits in the 
> mantissa that can be multiplied exactly. The disadvantage is the potential 
> for overflow requiring a branch condition to check for extreme values.
> It also appropriately handles very small sub-normal numbers that would be 
> masked to create a 0 high part with all the non-zero bits left in the low 
> part using the current method. This will have knock on effects on split 
> multiplication which requires the high part to be larger.
> A simple change to the current implementation to use Dekker's split improves 
> the precision on a wide range of test data (not shown).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (NUMBERS-142) Improve LinearCombination accuracy during summation of the round-off errors

Reply via email to