On Fri, 17 Feb 2023 04:36:08 GMT, Xiaowei Lu <[email protected]> wrote:
> The pr looks promising in terms of performance. What makes sense to do:
>
> *) Don't rely on external benchmarks. It's fine if such exists, but anyway
> set of microbenchmarks (using JMH) will be much better. More clear, readable
> results, etc. E.g., it may show that other operations (for example, sqrt)
> were speeded up too.
>
> *) Find boundaries. "divideAndRemainder(bigTenToThe(scaleStep))" may produce
> non-zero reminder. Find conditions when it happens. How big is performance
> regression in such cases?
>
> Some other optimizations: *) Current code checks only the lowest bit (odd or
> even) to cut off indivisible cases. Making
> "divideAndRemainder(bigTenToThe(scaleStep))" - you make check scaleStep
> lowest bits to do cut off. See "BigInteger.getLowestSetBit()"
>
> *) BigInteger division by int value is faster. It's a special case. What is
> faster, doing the single division by bigTenToThe(scaleStep) or doing several
> divisions (split scaleStep to fit into int)? Exploration is required.
About the boundaries you have mentioned.
In short, such perf regression cases are rare, and our optimization shows 5%
regression in that case.
As you said, "divideAndRemainder(bigTenToThe(scaleStep))" may produce non-zero
remainder. To make it happen, we need a big scaleStep with a small number of
trailing zeros of intVal. Since scaleStep=scale-preferredScale, and
preferredScale = dividend.scale() - divisor.scale(), we intend for a small
scale for dividend and a big scale for divisor. Meanwhile, the division should
produce zero remainder.
We managed to find one case: 1/2^n. The remainder is always zero and scaleStep
is Big enough.
Take the following for example
MathContext mc = new MathContext(36, RoundingMode.HALF_UP);
BigDecimal divisor = BigDecimal.valueOf((long)1<<50);
BigDecimal.ONE.divide(divisor, mc);
In zero stripping, intVal is ``888178419700125232338905334472656250``, scale is
``51`` and preferredScale is ``0``.
Without our optimization, only one division is needed since intVal has just one
trailing zero.
With out optimization, scaleStep will firstly try 25, then fail, then try 12
and fail, then try 6 and fail……Just like the worst case of binary search.
We extend this case to 1/2^n(n in [50, 60)), and we see about 5% regression.
@Benchmark
public void worst_case() {
// precision of MathContext is carefully designed to make quotient have few
trailing zeros.(Up to 4)
MathContext mc1 = new MathContext(38, RoundingMode.HALF_UP);
for(long i = 50; i < 55; i++) {
BigDecimal divisor1 = BigDecimal.valueOf((long)1<<i);
BigDecimal.ONE.divide(divisor1, mc1);
}
MathContext mc2 = new MathContext(42, RoundingMode.HALF_UP);
for(long j = 55; j < 60; j++) {
BigDecimal divisor2 = BigDecimal.valueOf((long)1<<j);
BigDecimal.ONE.divide(divisor2, mc2);
}
}
Before Optimization
Benchmark Mode Cnt Score Error Units
DecimalBenchmark.worst_case thrpt 5 212396.544 ± 4124.663 ops/s
After Optimization
Benchmark Mode Cnt Score Error Units
DecimalBenchmark.worst_case thrpt 5 204978.953 ± 232.510 ops/s
-------------
PR: https://git.openjdk.org/jdk/pull/12509