Hello,
We've been doing fairly extensive micro-benchmark testing of the effect of
adjusting the various algorithm crossover thresholds in BigInteger (cf.
https://bugs.openjdk.java.net/browse/JDK-8022181). There'll be another post on
that particular topic in the near future. This testing did, however, expose
another issue which is not yet (but likely will be) on file, namely the
heuristic for deciding when to use Burnikel-Ziegler divide-and-conquer
division instead of "standard" (Knuth) division.
At present the test used is

    if divisor_int_length < BZ_THRESHOLD or dividend_int_length < BZ_THRESHOLD then
        use standard division
    else
        use B-Z division

(where BZ_THRESHOLD is currently 50).
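
Expressed as Java, the current selection amounts to something like the
following sketch (the names here are mine and only approximate the actual
BigInteger/MutableBigInteger internals):

    import java.math.BigInteger;

    // Sketch of the current heuristic; illustrative names, not the JDK source.
    class CurrentBzHeuristic {
        static final int BZ_THRESHOLD = 50;

        // Length of the magnitude in 32-bit ints.
        static int intLength(BigInteger x) {
            return (x.bitLength() + 31) >>> 5;
        }

        // B-Z is used only if BOTH operands are at least BZ_THRESHOLD ints
        // long; otherwise "standard" (Knuth) division is used.
        static boolean useBurnikelZiegler(BigInteger dividend, BigInteger divisor) {
            return intLength(divisor) >= BZ_THRESHOLD
                && intLength(dividend) >= BZ_THRESHOLD;
        }
    }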
Let {N,D} represent a dividend-divisor pair with the int-length of the former
being N and that of the latter being D. The following cases were among those
initially tested: {T,2T}, {T,T+2}, {T,T}, {T+2,T}, and {2T,T}, where T is the BZ
threshold. In all but the last case, where the dividend is twice the length of
the divisor, a (sometimes serious) performance regression was observed relative
to the standard algorithm.
The implementation was reviewed along with the original paper [1]. The
implementation is accurate with respect to the paper, as was already verified
some months ago [2]. It was observed, however, that something interesting
happens when the int-length of the dividend A is equal to that of the divisor
B. In this situation the algorithm parameter t (step 5 of Algorithm 3 in the
paper) becomes 2, which causes A to be virtually padded out to a 2T-length
value [0,A]. The net result is that the B-Z algorithm ends up being applied, in
effect, to a pair {2T,T}, which is significantly slower than using the standard
algorithm on the pair {T,T}. This appears to be due primarily to algorithmic
overhead rather than to a length-dependent problem.
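
As I read step 5 of Algorithm 3, t is chosen as the minimal l >= 2 such that
A < beta^(l*n)/2, where beta = 2^32 and n is the padded divisor length in ints.
A rough Java rendering of that choice (my own names, not the JDK internals):

    import java.math.BigInteger;

    // Step 5 of Algorithm 3 in [1], as I read it; names are mine, not the JDK's.
    // n is the padded divisor length in 32-bit ints, so one block is 32*n bits.
    class BlockCount {
        static int blockCount(BigInteger a, int n) {
            long blockBits = 32L * n;
            // Smallest l with a < 2^(32*l*n - 1), i.e. a fits in l blocks
            // with the top bit clear.
            int t = (int) ((a.bitLength() + blockBits) / blockBits);
            return Math.max(t, 2);   // the paper requires t >= 2
        }
    }

When the dividend and divisor have the same int-length T, n >= T >=
intLength(A), so t comes out as 2 and A is effectively treated as the two-block
value [0,A] described above.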
For the {T,T} case, the following recursive call sequence of significant
operations ensues:
    divide
      divideBZ
        divide2n1n
          divide3n2n
            divide2n1n
              divideKnuth
            multiply
            subtract
          divide3n2n
            divide2n1n
              divideKnuth
            multiply
            subtract
          add
Even though some of these method calls reduce to trivial operations such as
dividing zero by a non-zero value, dividing a value by itself, or multiplying
by zero or one, the algorithmic overhead makes the performance worse than if
this sequence had been used instead:

    divide
      divideKnuth
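
A harness along the following lines is sufficient to reproduce the effect
(this is a sketch of mine using JMH, not the actual benchmark code; the excess
lengths in @Param are arbitrarily chosen):

    import java.math.BigInteger;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    // Sketch of a JMH benchmark for {T+c,T} division pairs.
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @State(Scope.Thread)
    public class BzDivideBench {
        static final int T = 50;              // current BZ_THRESHOLD

        @Param({"0", "1", "2", "16", "50"})   // excess dividend length c, in ints
        int c;

        BigInteger dividend, divisor;

        @Setup
        public void setup() {
            Random rnd = new Random(42);
            // Setting the top bit forces exact int-lengths of T + c and T.
            dividend = new BigInteger(32 * (T + c), rnd).setBit(32 * (T + c) - 1);
            divisor  = new BigInteger(32 * T, rnd).setBit(32 * T - 1);
        }

        @Benchmark
        public BigInteger[] divide() {
            return dividend.divideAndRemainder(divisor);
        }
    }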
Through micro-benchmark testing it was discovered that, provided the original
BZ_THRESHOLD criterion is met, there is some offset C at which the B-Z
algorithm starts to outperform the standard algorithm for the case {T+C,T},
i.e., when the dividend is C ints longer than the divisor. Testing further
suggests that the offset C is independent of the value of the BZ threshold.
This in turn suggests that the following heuristic would be more appropriate
for this algorithm:
    if divisor_int_length < BZ_THRESHOLD or
       dividend_int_length - divisor_int_length < BZ_OFFSET_THRESHOLD then
        use standard division
    else
        use B-Z division
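
In Java terms the proposed predicate would look roughly like the following
sketch (names are again illustrative, and the bzOffset parameter merely stands
in for whatever value of C further benchmarking settles on):

    import java.math.BigInteger;

    // Sketch of the proposed selection logic; bzOffset is a placeholder for
    // the offset C, whose value is still to be determined by benchmarking.
    class ProposedBzHeuristic {
        static final int BZ_THRESHOLD = 50;

        // Length of the magnitude in 32-bit ints.
        static int intLength(BigInteger x) {
            return (x.bitLength() + 31) >>> 5;
        }

        static boolean useBurnikelZiegler(BigInteger dividend, BigInteger divisor,
                                          int bzOffset) {
            int d = intLength(divisor);
            int excess = intLength(dividend) - d;
            // Fall back to Knuth division unless the divisor is long enough
            // AND the dividend exceeds it by at least the offset.
            return d >= BZ_THRESHOLD && excess >= bzOffset;
        }
    }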
Any comments on this alternative heuristic would be appreciated as I would not
be in the least bit surprised to have missed some fine point altogether.
Thanks,
Brian
[1] Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
http://www.mpi-sb.mpg.de/~ziegler/TechRep.ps.gz.
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-July/018922.html