On Tue, 14 Feb 2023 03:20:14 GMT, Sergey Kuksenko <skukse...@openjdk.org> wrote:

>> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the 
>> poor performance of BigDecimal.divide under certain circumstance.
>> 
>> We confront similar situations when benchmarking Spark3 on TPC-DS test kit. 
>> According to the flame-graph below, it is StripZeros that spends most of the 
>> time of BigDecimal.divide. Hence we propose this patch to optimize stripping 
>> zeros.
>> ![图片 
>> 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
>> 
>> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, 
>> the target value is parsed from back to front, each time stripping out 
>> single ‘0’. To optimize, we can adopt the method of binary search. That is, 
>> each time we try to strip out ${scale/2} ‘0’s. 
>> 
>> The performance looks good. Therotically, time complexity of our method is 
>> O(log n), while the current one is O(n). In practice, benchmarks on Spark3 
>> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs 
>> Jtreg and JCK to check correctness, and it seems fine.
>> 
>> More about environment: 
>> we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much 
>> impact on BigDecimal. Spark cluster consists of a main node and 2 core 
>> nodes, each has 4cores, 16g memory and 4x500GB storage.
>
> The pr looks promising in terms of performance.
> What makes sense to do:
> 
> *)  Don't rely on external benchmarks. It's fine if such exists, but anyway 
> set of microbenchmarks (using JMH) will be much better. More clear, readable 
> results, etc. E.g., it may show that other operations (for example, sqrt) 
> were speeded up too.
> 
> *) Find boundaries. 
> "divideAndRemainder(bigTenToThe(scaleStep))" may produce non-zero reminder. 
> Find conditions when it happens. How big is performance regression in such 
> cases?
> 
> Some other optimizations:
> *)
> Current code checks only the lowest bit (odd or even) to cut off indivisible 
> cases.
> Making "divideAndRemainder(bigTenToThe(scaleStep))"  - you make check 
> scaleStep lowest bits to do cut off. See "BigInteger.getLowestSetBit()"
> 
> *)
> BigInteger division by int value is faster. It's a special case. What is 
> faster, doing the single division by bigTenToThe(scaleStep) or doing several 
> divisions (split scaleStep to fit into int)? Exploration is required.

@kuksenko Hi, I have updated the performance of this pr, I wonder if that's ok

-------------

PR: https://git.openjdk.org/jdk/pull/12509

Reply via email to