Hello,

We've been doing fairly extensive micro-benchmark testing of the effect of 
adjusting the various algorithm crossover thresholds in BigInteger (cf. 
https://bugs.openjdk.java.net/browse/JDK-8022181). There'll be another post on 
that particular topic in the near future. This testing did however expose 
another topic for which an issue is not yet (but likely will be) on file, i.e., 
the heuristic for determining when to use Burnikel-Ziegler divide and conquer 
division instead of "standard" division.

At present the test used is

        if divisor_int_length < BZ_THRESHOLD or dividend_int_length < 
BZ_THRESHOLD then
                use standard division
        else
                use B-Z division

(where BZ_THRESHOLD is currently 50).

Let {N,D} represent a dividend-divisor pair with the int-length of the former 
being N and that of the latter being D. The following cases were among those 
initially tested: {T,2T}, {T,T+2}, {T,T}, {T+2,T}, and {2T,T} where T is the BZ 
threshold. In all but the last case, where the dividend is twice the length of 
the divisor, a (sometimes serious) performance regression was observed.

The implementation was reviewed along with the original paper [1]. The 
implementation is accurate with respect to the paper as had already been 
verified some months ago [2]. It was observed however that something 
interesting happens for the case where the int-length of the dividend A is 
equal to that of the divisor B. For this situation, the algorithm parameter t 
(step 5 in algorithm 3 in the paper) becomes 2. This causes A to be virtually 
padded out to a 2T-length value [0,A]. The net result is that the BZ algorithm 
ends up being applied in effect to a pair {2T,T} which is significantly slower 
than using the standard algorithm on the pair {T,T}. This appears to be 
primarily due to the algorithm overhead rather than a length dependent problem. 
For the {T,T} case, the following recursive call sequence of significant 
operations ensues:

divide
        divideBZ
                divide2n1n
                        divide3n2n
                                divide2n1n
                                        divideKnuth
                                multiply
                                subtract
                        divide3n2n
                                divide2n1n
                                        divideKnuth
                                multiply
                                subtract
                add

Despite the fact that some of the method calls reduce to simple operations like 
dividing zero by non-zero, dividing something by itself, or multiplying 
something by zero or one, the algorithm overhead causes the performance to be 
worse than had this sequence been used:

divide
        divideKnuth

Through micro-benchmark testing it was discovered that, given the original 
BZ_THRESHOLD criterion is met, for some value C the BZ algorithm will start to 
outperform the standard algorithm for the case {T+C,T}, i.e., the dividend is C 
ints longer than the divisor. Testing supports the value of the offset C being 
independent of the value of the BZ threshold. This suggests that the following 
heuristic would be more appropriate for this algorithm:

        if divisor_int_length < BZ_THRESHOLD or dividend_int_length - 
divisor_int_length < BZ_OFFSET_THRESHOLD then
                use standard division
        else
                use B-Z division

Any comments on this alternative heuristic would be appreciated as I would not 
be in the least bit surprised to have missed some fine point altogether.

Thanks,

Brian

[1] Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division", 
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022, 
http://www.mpi-sb.mpg.de/~ziegler/TechRep.ps.gz.
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-July/018922.html

Reply via email to