On Fri, 17 Feb 2023 04:36:08 GMT, Xiaowei Lu <d...@openjdk.org> wrote:

> The pr looks promising in terms of performance. What makes sense to do:
> 
> *) Don't rely on external benchmarks. It's fine if such exists, but anyway 
> set of microbenchmarks (using JMH) will be much better. More clear, readable 
> results, etc. E.g., it may show that other operations (for example, sqrt) 
> were speeded up too.
> 
> *) Find boundaries. "divideAndRemainder(bigTenToThe(scaleStep))" may produce 
> non-zero reminder. Find conditions when it happens. How big is performance 
> regression in such cases?
> 
> Some other optimizations: *) Current code checks only the lowest bit (odd or 
> even) to cut off indivisible cases. Making 
> "divideAndRemainder(bigTenToThe(scaleStep))" - you make check scaleStep 
> lowest bits to do cut off. See "BigInteger.getLowestSetBit()"
> 
> *) BigInteger division by int value is faster. It's a special case. What is 
> faster, doing the single division by bigTenToThe(scaleStep) or doing several 
> divisions (split scaleStep to fit into int)? Exploration is required.

About the boundaries you have mentioned.
In short, such perf regression cases are rare, and our optimization shows 5% 
regression in that case.

As you said, "divideAndRemainder(bigTenToThe(scaleStep))" may produce non-zero 
remainder. To make it happen, we need a big scaleStep with a small number of 
trailing zeros of intVal. Since scaleStep=scale-preferredScale, and 
preferredScale = dividend.scale() - divisor.scale(), we intend for a small 
scale for dividend and a big scale for divisor. Meanwhile, the division should 
produce zero remainder.
We managed to find one case: 1/2^n. The remainder is always zero and scaleStep 
is Big enough.

Take the following for example

MathContext mc = new MathContext(36, RoundingMode.HALF_UP);
BigDecimal divisor = BigDecimal.valueOf((long)1<<50);
BigDecimal.ONE.divide(divisor, mc);

In zero stripping, intVal is ``888178419700125232338905334472656250``, scale is 
``51`` and preferredScale is ``0``.
Without our optimization, only one division is needed since intVal has just one 
trailing zero.
With out optimization, scaleStep will firstly try 25, then fail, then try 12 
and fail, then try 6 and fail……Just like the worst case of binary search.

We extend this case to 1/2^n(n in [50, 60)), and we see about 5% regression.

@Benchmark
public void worst_case() {
    // precision of MathContext is carefully designed to make quotient have few 
trailing zeros.(Up to 4)
    MathContext mc1 = new MathContext(38, RoundingMode.HALF_UP);
    for(long i = 50; i < 55; i++) {
        BigDecimal divisor1 = BigDecimal.valueOf((long)1<<i);
        BigDecimal.ONE.divide(divisor1, mc1);
    }
    MathContext mc2 = new MathContext(42, RoundingMode.HALF_UP);
    for(long j = 55; j < 60; j++) {
        BigDecimal divisor2 = BigDecimal.valueOf((long)1<<j);
        BigDecimal.ONE.divide(divisor2, mc2);
    }
}

Before Optimization 
Benchmark                      Mode  Cnt       Score      Error  Units
DecimalBenchmark.worst_case  thrpt    5  212396.544 ± 4124.663  ops/s

After Optimization
Benchmark                      Mode  Cnt       Score     Error  Units
DecimalBenchmark.worst_case  thrpt    5  204978.953 ± 232.510  ops/s

-------------

PR: https://git.openjdk.org/jdk/pull/12509

Reply via email to