[ https://issues.apache.org/jira/browse/NUMBERS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Herbert updated NUMBERS-142: --------------------------------- Attachment: dot.jpg > Improve LinearCombination accuracy during summation of the round-off errors > --------------------------------------------------------------------------- > > Key: NUMBERS-142 > URL: https://issues.apache.org/jira/browse/NUMBERS-142 > Project: Commons Numbers > Issue Type: Improvement > Components: arrays > Affects Versions: 1.0 > Reporter: Alex Herbert > Assignee: Alex Herbert > Priority: Minor > Attachments: array_performance.jpg, cond_no.jpg, dot.jpg, > error_vs_condition_no.jpg, inline_perfomance.jpg > > Time Spent: 20m > Remaining Estimate: 0h > > The algorithm in LinearCombination is an implementation of dot2 from [Ogita > el al|http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.1547] > (algorithm 5.3). There is a subtle difference in that the original dot2 > algorithm sums the round-off from the products and the round-off from the > product summation together. The method in LinearCombination sums them > separately (using an extra variable) and adds them at the end. This actually > improves the accuracy under conditions where the round-off is of greater sum > than the products, as described below. > The dot2 algorithm suffers when the total sum is close to 0 but the > intermediate sum is far enough from zero that there is a large difference > between the exponents of summed terms and the final result. In this case the > sum of the round-off is more important than the sum of the products which due > to massive cancellation is zero. The case is important for Complex numbers > which require a computation of log1p(x^2+y^2-1) when x^2+y^2 is close to 1 > such that log(1) would be ~zero yet the true logarithm is representable to > very high precision. > This can be protected against by using the dotK algorithm of Ogita et al with > K>2. This saves all the round-off parts from each product and the running > sum. These are subject to an error free transform that repeatedly adds > adjacent pairs to generate a new split pair with a closer upper and lower > part. Over time this will order the parts from low to high and these can be > summed low first for an error free dot product. > Using this algorithm with a single pass (K=3 for dot3) removes the > cancellation error observed in the mentioned use case. Adding a single pass > over the parts changes the algorithm from 25n floating point operations > (flops) to 31n flops for the sum of n products. > A second change for the algorithm is to switch to using > [Dekker's|https://doi.org/10.1007/BF01397083] algorithm (Dekker, 1971) to > split the number. This extracts two 26-bit mantissas from a 53-bit mantis (in > IEEE 754 the leading bit in front of the of the 52-bit mantissa is assumed to > be 1). This is done by multiplication by 2^s+1 with s = ceil(53/2) = 27: > big = (2^s+1) * a > a_hi = (big - (big - a)) > The extra bit of precision is carried into the sign of the low part of the > split number. > This is in contrast to the method in LinearCombination that uses a simple > mask on the long representation to obtain the a_hi part in 26-bits and the > lower part will be 27 bits. > The advantage of Dekker's method is it produces 2 parts with 26 bits in the > mantissa that can be multiplied exactly. The disadvantage is the potential > for overflow requiring a branch condition to check for extreme values. > It also appropriately handles very small sub-normal numbers that would be > masked to create a 0 high part with all the non-zero bits left in the low > part using the current method. This will have knock on effects on split > multiplication which requires the high part to be larger. > A simple change to the current implementation to use Dekker's split improves > the precision on a wide range of test data (not shown). -- This message was sent by Atlassian Jira (v8.3.4#803005)