Re: RFR: 8294960: Convert java.base/java.lang.invoke package to use the Classfile API to generate lambdas and method handles [v6]

2024-04-17 Thread Sergey Kuksenko
On Wed, 17 Apr 2024 15:21:32 GMT, Adam Sotona  wrote:

>> java.base java.lang.invoke package heavily uses ASM to generate lambdas and 
>> method handles.
>> 
>> This patch converts ASM calls to Classfile API.
>> 
>> This PR is continuation of https://github.com/openjdk/jdk/pull/12945
>> 
>> Any comments and suggestions are welcome.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 12 additional 
> commits since the last revision:
> 
>  - applied suggested changes
>  - Merge branch 'master' into JDK-8294960-invoke
>  - Reversion of ClassBuilder changes
>  - Merge branch 'master' into JDK-8294960-invoke
>  - Apply suggestions from code review
>
>Co-authored-by: liach <7806504+li...@users.noreply.github.com>
>  - Apply suggestions from code review
>
>Co-authored-by: liach <7806504+li...@users.noreply.github.com>
>  - added missing comment
>  - fixed InnerClassLambdaMetafactory for hidden caller classes
>  - performance improvements
>  - consolidation of descriptors handling
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/983bd2da...304054b9

Sorry, but I am afraid we don't have benchmarks targeting lambdas startup.


From: Mandy Chung ***@***.***>
Sent: Wednesday, April 17, 2024 9:32 AM
To: openjdk/jdk
Cc: Sergey Kuksenko; Mention
Subject: [External] : Re: [openjdk/jdk] 8294960: Convert 
java.base/java.lang.invoke package to use the Classfile API to generate lambdas 
and method handles (PR #17108)

@cl4es<https://urldefense.com/v3/__https://github.com/cl4es__;!!ACWV5N9M2RV99hQ!JX5yZ-c9NW3SFSXgKWYc1Kxk2V17ToAhlQI5UDK7A2MWjHAXtLYugARjgiqw_ohDi3CY7GPh1B2YJuwCbeQ-huf92_-fGQ$>
 and 
@kuksenko<https://urldefense.com/v3/__https://github.com/kuksenko__;!!ACWV5N9M2RV99hQ!JX5yZ-c9NW3SFSXgKWYc1Kxk2V17ToAhlQI5UDK7A2MWjHAXtLYugARjgiqw_ohDi3CY7GPh1B2YJuwCbeQ-hudN3iK0Ug$>
 can you advice what performance benchmarks to run to approve this change?

—
Reply to this email directly, view it on 
GitHub<https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/17108*issuecomment-2061720876__;Iw!!ACWV5N9M2RV99hQ!JX5yZ-c9NW3SFSXgKWYc1Kxk2V17ToAhlQI5UDK7A2MWjHAXtLYugARjgiqw_ohDi3CY7GPh1B2YJuwCbeQ-hudSbyeARA$>,
 or 
unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA72M642PZCTBSXMJQ57543Y52P2NAVCNFSM6ABAUY4GQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRRG4ZDAOBXGY__;!!ACWV5N9M2RV99hQ!JX5yZ-c9NW3SFSXgKWYc1Kxk2V17ToAhlQI5UDK7A2MWjHAXtLYugARjgiqw_ohDi3CY7GPh1B2YJuwCbeQ-hueMj9Y5wA$>.
You are receiving this because you were mentioned.Message ID: ***@***.***>

-

PR Comment: https://git.openjdk.org/jdk/pull/17108#issuecomment-2062338565


Re: RFR: 8311178: JMH tests don't scale well when sharing output buffers

2023-07-10 Thread Sergey Kuksenko
On Mon, 10 Jul 2023 08:17:59 GMT, Hamlin Li  wrote:

>> @Hamlin-Li 
>> The PR is fully correct.
>> Don't forget, every Java instance method has a specific argument which 
>> called "this".  That is why @State annotation is working.
>
> @kuksenko @swati-sha Thanks for explanation. I can understand what you said.
> But I'm still not quite sure, as I remember jmh does some code manipulation 
> or instrumentation at source code (or bytecode level?), so the jmh test code 
> you write or see might not be the exact code to be executed at runtime.
> It's better to be reviewed further by some one more familiar with jmh, or 
> could you add some data comparing the performance difference, so we can tell 
> it easily?

@Hamlin-Li 
I am one of JMH's authors. I know how it works. There is no need for tests.

-

PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1629033698


Re: RFR: 8311178: JMH tests don't scale well when sharing output buffers

2023-07-07 Thread Sergey Kuksenko
On Fri, 7 Jul 2023 08:29:06 GMT, Hamlin Li  wrote:

>> The below benchmark files have scaling issues due to cache contention and 
>> leads to poor scaling when run on multiple threads. The patch sets the scope 
>> from benchmark level to thread level to fix the issue:
>> - org/openjdk/bench/java/io/DataOutputStreamTest.java
>> - org/openjdk/bench/java/lang/ArrayCopyObject.java
>> - org/openjdk/bench/java/lang/ArrayFiddle.java
>> - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java
>> - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java
>> - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java
>> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java
>> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java
>> - org/openjdk/bench/vm/compiler/ArrayFill.java
>> - org/openjdk/bench/vm/compiler/IndexVector.java
>> 
>> Also removing the static scope for variables in 
>> org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for 
>> better scaling.
>> 
>> Please review and share your feedback.
>> 
>> Thanks,
>> Swati
>
> Hi,
> I'm not sure if I understand this improvement correctly.
> I'm not quite familiar with JMH and it's annotations, but seems to me, the 
> change from `@State(Scope.Benchmark)` to `@State(Scope.Thread)` should not 
> improve the performance by reducing cache contention, as in the jmh doc it 
> says "State objects are usually injected into Benchmark methods as 
> ***arguments***, and JMH takes care of their instantiation and sharing.", 
> this seems mean that @State only matters when the annotated class is used as 
> a parameter of a @Benchmark method, but in the tests you modifed, seems there 
> is no such use case.
> Please also check the sample usages at 
> https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java.

@Hamlin-Li 
The PR is fully correct.
Don't forget, every Java instance method has a specific argument which called 
"this".  That is why @State annotation is working.

-

PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1625627033


Re: RFR: 8302204: Optimize BigDecimal.divide

2023-02-13 Thread Sergey Kuksenko
On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu  wrote:

> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the 
> poor performance of BigDecimal.divide under certain circumstance.
> 
> We confront similar situations when benchmarking Spark3 on TPC-DS test kit. 
> According to the flame-graph below, it is StripZeros that spends most of the 
> time of BigDecimal.divide. Hence we propose this patch to optimize stripping 
> zeros.
> ![图片 
> 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> 
> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, 
> the target value is parsed from back to front, each time stripping out single 
> ‘0’. To optimize, we can adopt the method of binary search. That is, each 
> time we try to strip out ${scale/2} ‘0’s. 
> 
> The performance looks good. Therotically, time complexity of our method is 
> O(log n), while the current one is O(n). In practice, benchmarks on Spark3 
> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs 
> Jtreg and JCK to check correctness, and it seems fine.
> 
> More about environment: 
> we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much 
> impact on BigDecimal. Spark cluster consists of a main node and 2 core nodes, 
> each has 4cores, 16g memory and 4x500GB storage.

The pr looks promising in terms of performance.
What makes sense to do:

*)  Don't rely on external benchmarks. It's fine if such exists, but anyway set 
of microbenchmarks (using JMH) will be much better. More clear, readable 
results, etc. E.g., it may show that other operations (for example, sqrt) were 
speeded up too.

*) Find boundaries. 
"divideAndRemainder(bigTenToThe(scaleStep))" may produce non-zero reminder. 
Find conditions when it happens. How big is performance regression in such 
cases?

Some other optimizations:
*)
Current code checks only the lowest bit (odd or even) to cut off indivisible 
cases.
Making "divideAndRemainder(bigTenToThe(scaleStep))"  - you make check scaleStep 
lowest bits to do cut off. See "BigInteger.getLowestSetBit()"

*)
BigInteger division by int value is faster. It's a special case. What is 
faster, doing the single division by bigTenToThe(scaleStep) or doing several 
divisions (split scaleStep to fit into int)? Exploration is required.

-

PR: https://git.openjdk.org/jdk/pull/12509


Re: RFR: 8302204: Optimize BigDecimal.divide

2023-02-12 Thread Sergey Kuksenko
On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu  wrote:

> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the 
> poor performance of BigDecimal.divide under certain circumstance.
> 
> We confront similar situations when benchmarking Spark3 on TPC-DS test kit. 
> According to the flame-graph below, it is StripZeros that spends most of the 
> time of BigDecimal.divide. Hence we propose this patch to optimize stripping 
> zeros.
> ![图片 
> 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> 
> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, 
> the target value is parsed from back to front, each time stripping out single 
> ‘0’. To optimize, we can adopt the method of binary search. That is, each 
> time we try to strip out ${scale/2} ‘0’s. 
> 
> The performance looks good. Therotically, time complexity of our method is 
> O(log n), while the current one is O(n). In practice, benchmarks on Spark3 
> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs 
> Jtreg and JCK to check correctness, and it seems fine.
> 
> More about environment: 
> we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much 
> impact on BigDecimal. Spark cluster consists of a main node and 2 core nodes, 
> each has 4cores, 16g memory and 4x500GB storage.

As for TPC-DS
[AUTO-RESULT] QueryTotal=1968s  vs  [AUTO-RESULT] QueryTotal=1934s
that gives ~1.7% of performance difference. 
Are you sure that this small diff is a real diff, but not run-to-run variance?

-

PR: https://git.openjdk.org/jdk/pull/12509


Re: RFR: 8302204: Optimize BigDecimal.divide

2023-02-11 Thread Sergey Kuksenko
On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu  wrote:

> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the 
> poor performance of BigDecimal.divide under certain circumstance.
> 
> We confront similar situations when benchmarking Spark3 on TPC-DS test kit. 
> According to the flame-graph below, it is StripZeros that spends most of the 
> time of BigDecimal.divide. Hence we propose this patch to optimize stripping 
> zeros.
> ![图片 
> 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> 
> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, 
> the target value is parsed from back to front, each time stripping out single 
> ‘0’. To optimize, we can adopt the method of binary search. That is, each 
> time we try to strip out ${scale/2} ‘0’s. 
> 
> The performance looks good. Therotically, time complexity of our method is 
> O(log n), while the current one is O(n). In practice, benchmarks on Spark3 
> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs 
> Jtreg and JCK to check correctness, and it seems fine.
> 
> More about environment: 
> we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much 
> impact on BigDecimal. Spark cluster consists of a main node and 2 core nodes, 
> each has 4cores, 16g memory and 4x500GB storage.

"The performance looks good." - Could you support this statement with some 
benchmark results?
Thank you.

-

PR: https://git.openjdk.org/jdk/pull/12509


Re: RFR: 8300487: Store cardinality as a field in BitSet

2023-01-18 Thread Sergey Kuksenko
On Tue, 3 Jan 2023 23:25:39 GMT, fabioromano1  wrote:

> The enanchment is useful for applications that make heavy use of BitSet 
> objects as sets of integers, and therefore they need to make a lot of calls 
> to cardinality() method, which actually require linear time in the number of 
> words in use by the bit set.
> This optimization reduces the cost of calling cardinality() to constant time, 
> as it simply returns the value of the field, and it also try to make as 
> little effort as possible to update the field, when needed.
> 
> Moreover, it has been implemented a new method for testing wheter a bit set 
> includes another bit set (i.e., the set of true bits of the parameter is a 
> subset of the true bits of the instance).

It would be better to see benchmark results other than cardinality operation.
To understand how big performance regression is.

-

PR: https://git.openjdk.org/jdk/pull/11837