Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-09-20 Thread Andrew Haley
On Mon, 20 Sep 2021 09:52:45 GMT, Andrew Dinn  wrote:

>> Andrew, can you help us to approve this?
>
> I agree with Andrew Haley that this patch is not going to make an improvement 
> for anything but a very small number of applications. Processing of strings 
> over a few 10s of bytes is rare. On the other hand, the patch doesn't seem to
> cause any performance drop for the much more common case of processing short
> strings, so it does no harm. Also, the new and old code are much the same in
> terms of complexity so that is no reason to prefer one over the other. The 
> only real concern I have is that any change involves the risk of error and 
> the ratio of cases that might benefit to cases that might suffer from an 
> error is very low. I don't think that's a reason to avoid pushing this patch 
> upstream but it does suggest that we should not backport it.

OK, thanks. That seems like a sensible compromise.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7]

2021-09-20 Thread Andrew Haley
On Mon, 30 Aug 2021 06:26:01 GMT, Wang Huang  wrote:

>> Dear all, 
>> Could you please review this patch? This patch uses `ldp` to
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases our patch is faster than the
>> old code.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix windows build failed

Marked as reviewed by aph (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-09-20 Thread Andrew Dinn
On Fri, 17 Sep 2021 07:13:24 GMT, Wu Yan  wrote:

>> It's fine. I don't think it'll affect any real programs, so it's rather 
>> pointless. I don't know if that's any reason not to approve it.
>
> Andrew, can you help us to approve this?

I agree with Andrew Haley that this patch is not going to make an improvement 
for anything but a very small number of applications. Processing of strings 
over a few 10s of bytes is rare. On the other hand, the patch doesn't seem to cause
any performance drop for the much more common case of processing short strings,
so it does no harm. Also, the new and old code are much the same in terms of
complexity so that is no reason to prefer one over the other. The only real 
concern I have is that any change involves the risk of error and the ratio of 
cases that might benefit to cases that might suffer from an error is very low. 
I don't think that's a reason to avoid pushing this patch upstream but it does 
suggest that we should not backport it.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-09-17 Thread Wu Yan
On Sun, 5 Sep 2021 13:23:21 GMT, Andrew Haley  wrote:

>> Thanks, I'll fix it.
>
> It's fine. I don't think it'll affect any real programs, so it's rather 
> pointless. I don't know if that's any reason not to approve it.

Andrew, can you help us to approve this?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7]

2021-09-08 Thread Nick Gasson
On Mon, 30 Aug 2021 06:26:01 GMT, Wang Huang  wrote:

>> Dear all, 
>> Could you please review this patch? This patch uses `ldp` to
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases our patch is faster than the
>> old code.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix windows build failed

Marked as reviewed by ngasson (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7]

2021-09-08 Thread Wu Yan
On Tue, 7 Sep 2021 01:38:02 GMT, Nick Gasson  wrote:

> Please check the Windows tier1 failure: 
> https://github.com/Wanghuang-Huawei/jdk/runs/3459332995
> 
> Seems unlikely that it's anything to do with this patch so you may just want 
> to re-run it or merge from master.

OK, the rerun of the presubmit tests shows that all tests passed. The result is
here: https://github.com/Wanghuang-Huawei/jdk/actions/runs/1181122290

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7]

2021-09-06 Thread Nick Gasson
On Mon, 30 Aug 2021 06:26:01 GMT, Wang Huang  wrote:

>> Dear all, 
>> Could you please review this patch? This patch uses `ldp` to
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases our patch is faster than the
>> old code.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix windows build failed

Please check the Windows tier1 failure: 
https://github.com/Wanghuang-Huawei/jdk/runs/3459332995

Seems unlikely that it's anything to do with this patch so you may just want to 
re-run it or merge from master.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-09-05 Thread Andrew Haley
On Thu, 26 Aug 2021 09:26:24 GMT, Wu Yan  wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871:
>> 
>>> 4869: // exit from large loop when less than 64 bytes left to read or we're about
>>> 4870: // to prefetch memory behind array border
>>> 4871: int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);
>> 
>> This breaks the Windows AArch64 build:
>> 
>> 
>> Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s)
>> d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): error C3861: 'MAX': identifier not found
>> make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm
>> 
>> 
>> https://github.com/Wanghuang-Huawei/jdk/runs/3260986937
>> 
>> Should probably be left as `MAX2`.
>
> Thanks, I'll fix it.

It's fine. I don't think it'll affect any real programs, so it's rather 
pointless. I don't know if that's any reason not to approve it.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-09-03 Thread Wu Yan
On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley  wrote:

>> I don't think we want to keep two copies of the compareTo intrinsic. If 
>> there are no cases where the LDP version is worse than the original version 
>> then we should just delete the old one and replace it with this.
>
> I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, 
> Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on 
> streaming workloads, which will affect this. (That's just an example: I'm 
> making the point that implementations differ.)
> 
> The trouble with this patch is that it (probably) makes things better for 
> long strings, which are very rare. What we actually need to care about is 
> performance for a large number of typical-sized strings, which are names, 
> identifiers, passwords, and so on: about 10-30 characters.

@theRealAph do you have any other questions about this patch?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7]

2021-08-29 Thread Wang Huang
> Dear all, 
> Could you please review this patch? This patch uses `ldp` to
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases our patch is faster than the
> old code.
> 
> Thank you for your review. Any suggestions are welcome.
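
For readers unfamiliar with the idea, here is a rough C++ sketch of what comparing with paired loads means. It is an illustration only, not the intrinsic itself, which is generated AArch64 assembly in stubGenerator_aarch64.cpp. The function name `compare_bytes_paired` is hypothetical, and the sketch assumes two Latin-1 byte arrays of equal length. Each loop step reads two 64-bit words per string, which is the amount of data a single `ldp` of two X registers would fetch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration: compare two equal-length Latin-1 byte strings,
// consuming 16 bytes (two 64-bit words, as one ldp would load) per step.
// Returns the difference of the first mismatching bytes, or 0 if equal.
inline int compare_bytes_paired(const unsigned char* a,
                                const unsigned char* b,
                                std::size_t len) {
  std::size_t i = 0;
  for (; i + 16 <= len; i += 16) {
    std::uint64_t a0, a1, b0, b1;
    std::memcpy(&a0, a + i, 8);      // first word of the pair
    std::memcpy(&a1, a + i + 8, 8);  // second word of the pair
    std::memcpy(&b0, b + i, 8);
    std::memcpy(&b1, b + i + 8, 8);
    // Any set bit means the two 16-byte chunks differ somewhere.
    if (((a0 ^ b0) | (a1 ^ b1)) != 0) break;
  }
  // Byte-wise tail; also pinpoints the mismatch after a break above.
  for (; i < len; ++i) {
    if (a[i] != b[i]) return int(a[i]) - int(b[i]);
  }
  return 0;
}
```

Strings shorter than 16 bytes never enter the paired loop in this sketch, which is consistent with the benchmark sizes above starting at 64.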

Wang Huang has updated the pull request incrementally with one additional 
commit since the last revision:

  fix windows build failed

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/2f756261..8cf3b2c7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=05-06

  Stats: 97 lines in 2 files changed: 0 ins; 96 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-08-26 Thread Wu Yan
On Wed, 25 Aug 2021 07:40:56 GMT, Nick Gasson  wrote:

> I've run the benchmark on several different machines and didn't see any 
> performance regressions, and the speed-up for longer strings looks quite 
> good. I also ran jtreg tier1-3 with no new failures so I think this is ok.
> 
> If you fix the Windows build I'll approve it. But please wait for another 
> review, preferably from @theRealAph.

OK, thank you very much!


> Note that JDK-8269559 (#5129) is also adding a JMH benchmark for this 
> intrinsic: it would be good if we could merge them, either now or later.

The JMH benchmark added by JDK-8269559 (#5129) already covers our test cases
(compareToLL and compareToUU) and shows the improvement from our patch, so we
have decided to delete our own JMH benchmark in the next commit.
The test results using the JMH benchmark from JDK-8269559 are as follows:

Raspberry Pi 4B
base:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    3   2.310 ± 0.050  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    3   2.818 ± 0.185  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    3   3.151 ± 0.215  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    3   4.171 ± 1.320  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    3   6.169 ± 0.653  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    3  10.911 ± 0.175  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    3   3.312 ± 0.102  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    3   4.162 ± 0.032  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    3   5.705 ± 0.152  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    3   9.301 ± 0.749  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    3  16.507 ± 1.353  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    3  30.160 ± 0.377  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    3   3.366 ± 0.280  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    3   4.308 ± 0.037  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    3   5.674 ± 0.210  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    3   9.358 ± 0.158  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    3  16.165 ± 0.158  ms/op
StringCompareToDifferentLength.compareToUL        2     512  avgt    3  29.857 ± 0.277  ms/op
StringCompareToDifferentLength.compareToUU        2      24  avgt    3   3.149 ± 0.209  ms/op
StringCompareToDifferentLength.compareToUU        2      36  avgt    3   3.157 ± 0.102  ms/op
StringCompareToDifferentLength.compareToUU        2      72  avgt    3   4.415 ± 0.073  ms/op
StringCompareToDifferentLength.compareToUU        2     128  avgt    3   6.244 ± 0.224  ms/op
StringCompareToDifferentLength.compareToUU        2     256  avgt    3  11.032 ± 0.080  ms/op
StringCompareToDifferentLength.compareToUU        2     512  avgt    3  20.942 ± 3.973  ms/op

opt:
Benchmark                                   (delta)  (size)  Mode  Cnt   Score   Error  Units
StringCompareToDifferentLength.compareToLL        2      24  avgt    3   2.319 ± 0.121  ms/op
StringCompareToDifferentLength.compareToLL        2      36  avgt    3   2.820 ± 0.096  ms/op
StringCompareToDifferentLength.compareToLL        2      72  avgt    3   2.511 ± 0.024  ms/op
StringCompareToDifferentLength.compareToLL        2     128  avgt    3   3.496 ± 0.382  ms/op
StringCompareToDifferentLength.compareToLL        2     256  avgt    3   5.215 ± 0.210  ms/op
StringCompareToDifferentLength.compareToLL        2     512  avgt    3   7.772 ± 0.448  ms/op
StringCompareToDifferentLength.compareToLU        2      24  avgt    3   3.432 ± 0.249  ms/op
StringCompareToDifferentLength.compareToLU        2      36  avgt    3   4.156 ± 0.052  ms/op
StringCompareToDifferentLength.compareToLU        2      72  avgt    3   5.735 ± 0.043  ms/op
StringCompareToDifferentLength.compareToLU        2     128  avgt    3   9.215 ± 0.394  ms/op
StringCompareToDifferentLength.compareToLU        2     256  avgt    3  16.373 ± 0.515  ms/op
StringCompareToDifferentLength.compareToLU        2     512  avgt    3  29.906 ± 0.245  ms/op
StringCompareToDifferentLength.compareToUL        2      24  avgt    3   3.361 ± 0.116  ms/op
StringCompareToDifferentLength.compareToUL        2      36  avgt    3   4.253 ± 0.061  ms/op
StringCompareToDifferentLength.compareToUL        2      72  avgt    3   5.744 ± 0.082  ms/op
StringCompareToDifferentLength.compareToUL        2     128  avgt    3   9.167 ± 0.343  ms/op
StringCompareToDifferentLength.compareToUL        2     256  avgt    3  16.591 ± 0.999  ms/op
StringCompareToDifferen

Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-08-25 Thread Nick Gasson
On Fri, 6 Aug 2021 09:50:54 GMT, Wang Huang  wrote:

>> Dear all, 
>> Could you please review this patch? This patch uses `ldp` to
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases our patch is faster than the
>> old code.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix codestyle

I've run the benchmark on several different machines and didn't see any 
performance regressions, and the speed-up for longer strings looks quite good. 
I also ran jtreg tier1-3 with no new failures so I think this is ok.

If you fix the Windows build I'll approve it. But please wait for another 
review, preferably from @theRealAph.

Note that JDK-8269559 (#5129) is also adding a JMH benchmark for this 
intrinsic: it would be good if we could merge them, either now or later.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871:

> 4869: // exit from large loop when less than 64 bytes left to read or we're about
> 4870: // to prefetch memory behind array border
> 4871: int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);

This breaks the Windows AArch64 build:


Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s)
d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): error C3861: 'MAX': identifier not found
make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm


https://github.com/Wanghuang-Huawei/jdk/runs/3260986937

Should probably be left as `MAX2`.
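
For context, `MAX` is not a standard C or C++ identifier. Some platform headers happen to provide it as a macro (glibc's `<sys/param.h>` does, for example), which is presumably why the other builds passed while MSVC reported C3861. The sketch below shows the portable form under stated assumptions: `max2` is a local stand-in for HotSpot's `MAX2` helper, `large_loop_exit_condition` is a hypothetical wrapper around the expression on line 4871, and the prefetch distances used with it are arbitrary values, not real defaults.

```cpp
// Local stand-in for HotSpot's MAX2 helper: an ordinary function template,
// so it does not rely on any platform header defining a MAX macro.
template <class T>
inline T max2(T a, T b) { return (a > b) ? a : b; }

// Hypothetical wrapper mirroring the expression on the quoted line 4871.
// prefetch_hint_distance plays the role of SoftwarePrefetchHintDistance;
// the divisor is 1 for Latin-1 (LL) and 2 otherwise.
inline int large_loop_exit_condition(int prefetch_hint_distance, bool isLL) {
  return max2(64, prefetch_hint_distance) / (isLL ? 1 : 2);
}
```

With an assumed distance of 192 this gives 192 for LL and 96 otherwise, and any distance below 64 is clamped to 64 before the division.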

-

Changes requested by ngasson (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-08-23 Thread Wu Yan
On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley  wrote:

>> I don't think we want to keep two copies of the compareTo intrinsic. If 
>> there are no cases where the LDP version is worse than the original version 
>> then we should just delete the old one and replace it with this.
>
> I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, 
> Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on 
> streaming workloads, which will affect this. (That's just an example: I'm 
> making the point that implementations differ.)
> 
> The trouble with this patch is that it (probably) makes things better for 
> long strings, which are very rare. What we actually need to care about is 
> performance for a large number of typical-sized strings, which are names, 
> identifiers, passwords, and so on: about 10-30 characters.

Hi @theRealAph @nick-arm, the test data looks OK on Raspberry Pi 4B and
Hisilicon. Do you have any other questions about this patch?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]

2021-08-06 Thread Wang Huang
> Dear all, 
> Could you please review this patch? This patch uses `ldp` to
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases our patch is faster than the
> old code.
> 
> Thank you for your review. Any suggestions are welcome.
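
To make "better in most cases" concrete, the table can be reduced to per-size speedup ratios. The sketch below does this for the compareLL rows only, with scores copied from the table above. The `Row` struct and the function names are hypothetical helpers. The Error column is ignored, so ratios close to 1 should not be read as real differences.

```cpp
#include <cstdio>

// One benchmark size with the baseline and LDP scores, both in us/op.
struct Row { int size; double base_us; double ldp_us; };

// Scores copied from the StringCompare.compareLL and
// StringCompare.compareLLWithLdp rows of the table above.
static const Row kCompareLL[] = {
  {64, 7.992, 8.001},   {72, 15.029, 11.573}, {80, 14.655, 6.861},
  {91, 16.363, 12.774}, {101, 16.966, 8.691}, {121, 19.276, 11.091},
  {181, 19.002, 14.64}, {256, 24.707, 25.879},
};

// Ratio above 1 means the LDP variant took less time per operation.
inline double speedup(const Row& r) { return r.base_us / r.ldp_us; }

inline void print_speedups() {
  for (const Row& r : kCompareLL) {
    std::printf("size %3d: %.2fx\n", r.size, speedup(r));
  }
}
```

By this measure the LDP variant is ahead for sizes 72 through 181, and roughly level with the baseline at 64 and 256.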

Wang Huang has updated the pull request incrementally with one additional 
commit since the last revision:

  fix codestyle

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/60dd0516..2f756261

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=04-05

  Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v5]

2021-08-04 Thread Wu Yan
On Tue, 3 Aug 2021 13:33:07 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request with a new target base due to a merge 
> or a rebase. The incremental webrev excludes the unrelated changes brought in 
> by the merge/rebase. The pull request contains six additional commits since 
> the last revision:
> 
>  - fix bugs
>  - Merge branch 'master' of https://gitee.com/ustc-wh/jdk into JDK-8268231
>  - fix style and add unalign test case
>  - refact codes
>  - draft of refactor
>  - 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

We also tested this version on HiSilicon. Because the results fluctuate more 
there, we increased the iteration count of each test and the length of the 
strings. The improvement is more obvious when diff_pos is greater than 255, 
and no case shows a significant regression.
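
For readers unfamiliar with the parameters: going by the benchmark names, 
`size` appears to be the string length and `diff_pos` the index of the first 
differing character. Below is a minimal sketch of how such inputs could be 
built. The class and method names are hypothetical and this is not the JMH 
benchmark from the pull request; it also assumes compact strings are enabled, 
so that the all-Latin-1 pair is stored one byte per character.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not the benchmark code from the PR.
final class DiffStrings {
    // Builds `size` copies of `fill`, with `changed` placed at index diffPos.
    static String withCharAt(int size, int diffPos, char fill, char changed) {
        char[] chars = new char[size];
        Arrays.fill(chars, fill);
        chars[diffPos] = changed;
        return new String(chars);
    }

    public static void main(String[] args) {
        int size = 512;
        for (int diffPos : new int[] {7, 31, 63, 127, 255, 511}) {
            // "LL" pair: every character fits in Latin-1.
            String a = withCharAt(size, diffPos, 'a', 'b');
            String b = withCharAt(size, diffPos, 'a', 'c');
            // "UU" pair: a non-Latin-1 fill character forces UTF-16 storage.
            String c = withCharAt(size, diffPos, '\u4e2d', 'b');
            String d = withCharAt(size, diffPos, '\u4e2d', 'c');
            System.out.println(diffPos + ": LL=" + a.compareTo(b)
                    + " UU=" + c.compareTo(d));
        }
    }
}
```

The further into the string the difference sits, the more of the comparison 
loop runs before the result is known, which is why the scores grow with 
diff_pos.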


base:
Benchmark                           (diff_pos)  (size)  Mode  Cnt   Score   Error  Units
StringCompare.compareLLDiffStrings           7     512  avgt   50   5.481 ± 1.230  us/op
StringCompare.compareLLDiffStrings          31     512  avgt   50   6.944 ± 0.962  us/op
StringCompare.compareLLDiffStrings          63     512  avgt   50  10.129 ± 0.973  us/op
StringCompare.compareLLDiffStrings         127     512  avgt   50  15.944 ± 0.786  us/op
StringCompare.compareLLDiffStrings         255     512  avgt   50  28.233 ± 0.737  us/op
StringCompare.compareLLDiffStrings         511     512  avgt   50  51.612 ± 1.357  us/op
StringCompare.compareUUDiffStrings           7     512  avgt   50   5.552 ± 0.809  us/op
StringCompare.compareUUDiffStrings          31     512  avgt   50  12.024 ± 1.499  us/op
StringCompare.compareUUDiffStrings          63     512  avgt   50  15.368 ± 0.009  us/op
StringCompare.compareUUDiffStrings         127     512  avgt   50  28.354 ± 0.655  us/op
StringCompare.compareUUDiffStrings         255     512  avgt   50  52.9

Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v5]

2021-08-04 Thread Wu Yan
On Wed, 4 Aug 2021 03:27:49 GMT, Nick Gasson  wrote:

>> Wang Huang has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains six additional 
>> commits since the last revision:
>> 
>>  - fix bugs
>>  - Merge branch 'master' of https://gitee.com/ustc-wh/jdk into JDK-8268231
>>  - fix style and add unalign test case
>>  - refact codes
>>  - draft of refactor
>>  - 8268231: Aarch64: Use ldp in intrinsics for String.compareTo
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4892:
> 
>> 4890:   __ cmp(tmp1, tmp2);
>> 4891:   __ ccmp(tmp1h, tmp2h, 0, Assembler::EQ);
>> 4892:   __ br(__ NE, DIFF);
> 
> The line above uses `Assembler::EQ` for the condition code but this line uses 
> `__ NE`. Better to be consistent and use `Assembler::` everywhere.

Thanks, I'll fix it.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v5]

2021-08-04 Thread Wu Yan
On Wed, 4 Aug 2021 03:29:40 GMT, Nick Gasson  wrote:

> Please provide the updated benchmark results for this version. Are you able 
> to test it on several different machines?

We tested this version on Raspberry Pi 4B.

base:
Benchmark                                     (diff_pos)  (size)  Mode  Cnt    Score   Error  Units
StringCompare.compareLLDiffStrings                     7     256  avgt    5   14.882 ± 0.157  us/op
StringCompare.compareLLDiffStrings                    15     256  avgt    5   15.514 ± 0.094  us/op
StringCompare.compareLLDiffStrings                    31     256  avgt    5   16.756 ± 0.050  us/op
StringCompare.compareLLDiffStrings                    47     256  avgt    5   18.196 ± 0.727  us/op
StringCompare.compareLLDiffStrings                    63     256  avgt    5   20.110 ± 0.075  us/op
StringCompare.compareLLDiffStrings                   127     256  avgt    5   31.458 ± 0.032  us/op
StringCompare.compareLLDiffStrings                   255     256  avgt    5   53.099 ± 1.212  us/op
StringCompare.compareUUDiffStrings                     7     256  avgt    5   15.419 ± 0.012  us/op
StringCompare.compareUUDiffStrings                    15     256  avgt    5   16.761 ± 0.078  us/op
StringCompare.compareUUDiffStrings                    31     256  avgt    5   20.132 ± 0.112  us/op
StringCompare.compareUUDiffStrings                    47     256  avgt    5   27.492 ± 0.104  us/op
StringCompare.compareUUDiffStrings                    63     256  avgt    5   32.147 ± 0.028  us/op
StringCompare.compareUUDiffStrings                   127     256  avgt    5   56.208 ± 0.016  us/op
StringCompare.compareUUDiffStrings                   255     256  avgt    5  100.439 ± 0.782  us/op
StringCompare.compareUUDiffStringsTurnOffCCP           7     256  avgt    5   15.441 ± 0.071  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          15     256  avgt    5   16.781 ± 0.192  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          31     256  avgt    5   20.109 ± 0.010  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          47     256  avgt    5   27.463 ± 0.068  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          63     256  avgt    5   32.168 ± 0.064  us/op
StringCompare.compareUUDiffStringsTurnOffCCP         127     256  avgt    5   56.283 ± 0.551  us/op
StringCompare.compareUUDiffStringsTurnOffCCP         255     256  avgt    5  100.419 ± 0.914  us/op

opt:
Benchmark                                     (diff_pos)  (size)  Mode  Cnt    Score   Error  Units
StringCompare.compareLLDiffStrings                     7     256  avgt    5   14.064 ± 0.048  us/op
StringCompare.compareLLDiffStrings                    15     256  avgt    5   16.079 ± 0.041  us/op
StringCompare.compareLLDiffStrings                    31     256  avgt    5   17.413 ± 0.033  us/op
StringCompare.compareLLDiffStrings                    47     256  avgt    5   18.750 ± 0.012  us/op
StringCompare.compareLLDiffStrings                    63     256  avgt    5   20.093 ± 0.052  us/op
StringCompare.compareLLDiffStrings                   127     256  avgt    5   27.432 ± 0.009  us/op
StringCompare.compareLLDiffStrings                   255     256  avgt    5   44.832 ± 0.173  us/op
StringCompare.compareUUDiffStrings                     7     256  avgt    5   16.071 ± 0.028  us/op
StringCompare.compareUUDiffStrings                    15     256  avgt    5   18.082 ± 0.015  us/op
StringCompare.compareUUDiffStrings                    31     256  avgt    5   20.753 ± 0.006  us/op
StringCompare.compareUUDiffStrings                    47     256  avgt    5   25.427 ± 0.051  us/op
StringCompare.compareUUDiffStrings                    63     256  avgt    5   28.170 ± 0.091  us/op
StringCompare.compareUUDiffStrings                   127     256  avgt    5   42.809 ± 0.143  us/op
StringCompare.compareUUDiffStrings                   255     256  avgt    5   75.056 ± 0.741  us/op
StringCompare.compareUUDiffStringsTurnOffCCP           7     256  avgt    5   16.132 ± 0.195  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          15     256  avgt    5   17.423 ± 0.023  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          31     256  avgt    5   20.102 ± 0.112  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          47     256  avgt    5   25.529 ± 0.367  us/op
StringCompare.compareUUDiffStringsTurnOffCCP          63     256  avgt    5   26.804 ± 0.051  us/op
StringCompare.compareUUDiffStringsTurnOffCCP         127     256  avgt    5   40.988 ± 0.425  us/op
StringCompare.compareUUDiffStringsTurnOffCCP         255     256  avgt    5   77.157 ± 0.187  us/op


On the Raspberry Pi, the improvement is more obvious when the diff_pos is above 
127.
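
To put numbers on that observation, the relative improvement can be computed 
directly from the two Raspberry Pi 4B tables above. The sketch below is only 
illustrative arithmetic on the reported scores; the class and method names 
are made up for this example.

```java
// Illustrative arithmetic on the reported JMH scores; names are hypothetical.
final class Speedup {
    // Percentage by which opt is faster than base (negative means slower).
    static double improvementPercent(double baseUsPerOp, double optUsPerOp) {
        return (baseUsPerOp - optUsPerOp) / baseUsPerOp * 100.0;
    }

    public static void main(String[] args) {
        // Scores in us/op, copied from the base and opt tables.
        System.out.printf("LL diff_pos=15:  %.1f%%%n", improvementPercent(15.514, 16.079));
        System.out.printf("LL diff_pos=127: %.1f%%%n", improvementPercent(31.458, 27.432));
        System.out.printf("LL diff_pos=255: %.1f%%%n", improvementPercent(53.099, 44.832));
        System.out.printf("UU diff_pos=127: %.1f%%%n", improvementPercent(56.208, 42.809));
        System.out.printf("UU diff_pos=255: %.1f%%%n", improvementPercent(100.439, 75.056));
    }
}
```

For these rows the gain is roughly 13% and 16% for LL at diff_pos 127 and 255, 
and roughly 24% and 25% for UU, while the LL case at diff_pos 15 is a few 
percent slower.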



> I meant the earlier String.compareTo that this is partially replacing. This 
> one might be fine but I just wanted to check it had been thoroughly tested. 
> For reference they were:
> 
> https://bugs.openjdk

Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v5]

2021-08-03 Thread Nick Gasson
On Tue, 3 Aug 2021 13:33:07 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request with a new target base due to a merge 
> or a rebase. The incremental webrev excludes the unrelated changes brought in 
> by the merge/rebase. The pull request contains six additional commits since 
> the last revision:
> 
>  - fix bugs
>  - Merge branch 'master' of https://gitee.com/ustc-wh/jdk into JDK-8268231
>  - fix style and add unalign test case
>  - refact codes
>  - draft of refactor
>  - 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

Please provide the updated benchmark results for this version. Are you able to 
test it on several different machines?

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4892:

> 4890:   __ cmp(tmp1, tmp2);
> 4891:   __ ccmp(tmp1h, tmp2h, 0, Assembler::EQ);
> 4892:   __ br(__ NE, DIFF);

The line above uses `Assembler::EQ` for the condition code but this line uses 
`__ NE`. Better to be consistent and use `Assembler::` everywhere.
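
For context on what the quoted three-instruction sequence does: `cmp` compares 
the low halves, `ccmp` compares the high halves only if the low halves were 
equal (otherwise it sets the flags to the immediate 0, which reads as "not 
equal"), and the branch to `DIFF` is taken if either half differs. The Java 
sketch below mirrors that control flow for byte arrays, 16 bytes per 
iteration. It is an illustration only: the class name is hypothetical, two 
`getLong` calls stand in for one `ldp`, and it is not the stub's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration of a 16-bytes-per-iteration compare; not the HotSpot stub.
final class WideCompare {
    // Compares two arrays as unsigned bytes, then by length.
    static int compare(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + 16 <= min; i += 16) {
            // Two 64-bit loads per input stand in for a single ldp.
            long aLo = ba.getLong(i);
            long aHi = ba.getLong(i + 8);
            long bLo = bb.getLong(i);
            long bHi = bb.getLong(i + 8);
            // Same decision as cmp + ccmp + br(NE): stop if either half differs.
            if (aLo != bLo || aHi != bHi) {
                break;
            }
        }
        // Scalar loop: handles the tail and pinpoints a difference found above.
        for (; i < min; i++) {
            if (a[i] != b[i]) {
                return (a[i] & 0xff) - (b[i] & 0xff);
            }
        }
        return a.length - b.length;
    }
}
```

The scalar loop keeps the sketch short; a real implementation would locate the 
differing byte from the loaded words instead of rescanning the block.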

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-08-03 Thread Wang Huang
On Thu, 15 Jul 2021 03:30:46 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix style and add unalign test case

Thank you for your suggestion. I have pushed a new commit.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v5]

2021-08-03 Thread Wang Huang
> Dear all, 
> Can you do me a favor to review this patch. This patch use `ldp` to 
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases, our patch is better than old 
> one.
> 
> Thank you for your review. Any suggestions are welcome.

Wang Huang has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains six additional commits since the 
last revision:

 - fix bugs
 - Merge branch 'master' of https://gitee.com/ustc-wh/jdk into JDK-8268231
 - fix style and add unalign test case
 - refact codes
 - draft of refactor
 - 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/c85cd126..60dd0516

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=03-04

  Stats: 9839 lines in 397 files changed: 5783 ins; 2526 del; 1530 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

2021-07-30 Thread Nick Gasson
On Fri, 30 Jul 2021 10:36:01 GMT, Andrew Haley  wrote:

> 
> I was (still am) tempted to approve it, but Nick says there are still bugs in 
> corner cases.
> 

I meant the earlier String.compareTo that this is partially replacing. This one 
might be fine but I just wanted to check it had been thoroughly tested. For 
reference they were:

https://bugs.openjdk.java.net/browse/JDK-8215100
https://bugs.openjdk.java.net/browse/JDK-8237524
https://bugs.openjdk.java.net/browse/JDK-8218966

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-30 Thread Andrew Haley
On 7/30/21 7:49 AM, Wu Yan wrote:

> I agree. That proposal was a compromise, since the optimization
> has no effect (or even causes a slowdown) on some platforms.
> In addition, I found that in
> [JDK-8202326](https://bugs.openjdk.java.net/browse/JDK-8202326),
> prefetches are added only for long strings (the rare case), so
> maybe we can further optimize long strings with LDP. Should
> I continue this optimization or close it?

IMO, we don't want to be using the vector unit unless it does
some good, and if you can do this sort of thing in the CPU core
you should, so I like that. I was (still am) tempted to approve
it, but Nick says there are still bugs in corner cases.

I think you should probably close it. Comparison of really long
Strings is so rare that I can't find any examples of where it
actually happens. Array comparisons, sure, but Strings, not so
much.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-29 Thread Wu Yan
On Wed, 28 Jul 2021 09:55:18 GMT, Nick Gasson  wrote:

> Adding prefetches was one of the reasons to introduce the separate stub for 
> long strings, see the mail below:
> 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-April/02.html


Thank you for pointing this out. We had not realized that adding prefetches 
was one of the reasons for that optimization.

> Did you find there's no benefit to that?

At first our tests suggested that prefetching made some cases worse, so we 
removed it from the LDP version. After more testing, we found that prefetching 
was not the cause of the performance degradation. Sorry about that; please 
ignore the prefetch problem. I will add the prefetches back next.


> We don't really want to have different implementations for each 
> microarchitecture, that would be a testing nightmare.

I agree. That proposal was a compromise, since the optimization has no effect 
(or even causes a slowdown) on some platforms. 
In addition, I found that in 
[JDK-8202326](https://bugs.openjdk.java.net/browse/JDK-8202326), prefetches 
are added only for long strings (the rare case), so maybe we can further 
optimize long strings with LDP. Should I continue this optimization or close 
it?

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Nick Gasson
On Wed, 28 Jul 2021 09:29:25 GMT, Wu Yan  wrote:

> 
> We are testing on HiSilicon TSV110, maybe we can enable this optimization by 
> default on the verified platforms.

We don't really want to have different implementations for each 
microarchitecture, that would be a testing nightmare. 

The existing stub uses prefetch instructions if 
`SoftwarePrefetchHintDistance >= 0` but the new LDP version doesn't. Did you 
find there's no benefit to that? Adding prefetches was one of the reasons to 
introduce the separate stub for long strings, see the mail below:

https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-April/02.html

It seems the existing code was tuned for Thunder X/X2 so perhaps that's why 
Andrew sees little improvement there with the new version.

What testing have you done besides benchmarking? The patch linked above had at 
least two subtle bugs in corner cases.
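
One way to look for such corner cases, sketched here as an assumption rather 
than as the testing actually done for this patch, is to sweep string lengths 
and difference positions and check `String.compareTo` against a plain 
char-by-char reference. The class and method names are hypothetical. The 
sketch assumes compact strings are enabled, so that appending U+0100 switches 
a string from Latin-1 to UTF-16 storage, and it only exercises the intrinsic 
once the JIT has compiled the calling loop.

```java
import java.util.Arrays;

// Hypothetical differential check; not the test from the pull request.
final class CompareToCheck {
    // Reference semantics: first differing char decides, else length difference.
    static int reference(String a, String b) {
        int min = Math.min(a.length(), b.length());
        for (int i = 0; i < min; i++) {
            int d = a.charAt(i) - b.charAt(i);
            if (d != 0) {
                return d;
            }
        }
        return a.length() - b.length();
    }

    // len copies of 'a', with the char at diffPos bumped when it is in range;
    // a trailing U+0100 (outside Latin-1) is appended when wide is set.
    static String make(int len, int diffPos, boolean wide) {
        char[] chars = new char[len + (wide ? 1 : 0)];
        Arrays.fill(chars, 'a');
        if (diffPos >= 0 && diffPos < len) {
            chars[diffPos]++;
        }
        if (wide) {
            chars[len] = '\u0100';
        }
        return new String(chars);
    }

    // Counts disagreements over all lengths, difference positions and
    // Latin-1/UTF-16 pairings (LL, LU, UL, UU) up to maxLen.
    static int countMismatches(int maxLen) {
        boolean[] widths = {false, true};
        int mismatches = 0;
        for (int len = 0; len <= maxLen; len++) {
            for (int diffPos = -1; diffPos < len; diffPos++) {
                for (boolean wideA : widths) {
                    for (boolean wideB : widths) {
                        String a = make(len, diffPos, wideA);
                        String b = make(len, -1, wideB);
                        if (a.compareTo(b) != reference(a, b)) mismatches++;
                        if (b.compareTo(a) != reference(b, a)) mismatches++;
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(256));
    }
}
```

A difference position of -1 produces strings that differ only in length or 
storage width, which covers the equal-prefix paths as well.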

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Wu Yan
On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley  wrote:

> The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, Neoverse 
> N2 errata, 2001293, and I see that LDP has to be slowed down on streaming 
> workloads, which will affect this. (That's just an example: I'm making the 
> point that implementations differ.)

We are testing on HiSilicon TSV110, maybe we can enable this optimization by 
default on the verified platforms.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Andrew Haley
On Wed, 28 Jul 2021 08:25:08 GMT, Nick Gasson  wrote:

> I don't think we want to keep two copies of the compareTo intrinsic. If there 
> are no cases where the LDP version is worse than the original version then we 
> should just delete the old one and replace it with this.

I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, 
Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on 
streaming workloads, which will affect this.

The trouble with this patch is that it (probably) makes things better for long 
strings, which are very rare. What we actually need to care about is 
performance for a large number of typical-sized strings, which are names, 
identifiers, passwords, and so on: about 10-30 characters.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-28 Thread Nick Gasson
On Thu, 15 Jul 2021 03:30:46 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix style and add unalign test case

I don't think we want to keep two copies of the compareTo intrinsic. If there 
are no cases where the LDP version is worse than the original version then we 
should just delete the old one and replace it with this.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v2]

2021-07-28 Thread Wu Yan
On Mon, 12 Jul 2021 15:36:29 GMT, Andrew Haley  wrote:

>> Wang Huang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   draft of refactor
>
> And with longer strings, M1 and ThunderX2:
> 
> 
> Benchmark                                       (diff_pos)  (size)  Mode  Cnt   Score   Error  Units
> StringCompare.compareLLDiffStrings                    1023    1024  avgt    3  50.849 ± 0.087  us/op
> StringCompare.compareLLDiffStringsWithLdp             1023    1024  avgt    3  23.676 ± 0.015  us/op
> StringCompare.compareLLDiffStringsWithRefactor        1023    1024  avgt    3  28.967 ± 0.168  us/op
> 
> 
> StringCompare.compareLLDiffStrings                    1023    1024  avgt    3  98.681 ± 0.026  us/op
> StringCompare.compareLLDiffStringsWithLdp             1023    1024  avgt    3  82.576 ± 0.656  us/op
> StringCompare.compareLLDiffStringsWithRefactor        1023    1024  avgt    3  98.801 ± 0.321  us/op
> 
> LDP wins on M1 here, but on ThunderX2 it makes almost no difference at all. 
> And how often are we comparing such long strings?
> I don't know what to think, really. It seems that we're near to a place where 
> we're optimizing for micro-architecture, and I don't want to see that here. 
> On the other hand, using LDP is not worse anywhere, so we should allow it.

Could you please review the patch? @theRealAph @nick-arm Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4]

2021-07-14 Thread Wang Huang
> Dear all, 
> Can you do me a favor to review this patch. This patch use `ldp` to 
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases, our patch is better than old 
> one.
> 
> Thank you for your review. Any suggestions are welcome.

Wang Huang has updated the pull request incrementally with one additional 
commit since the last revision:

  fix style and add unalign test case

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/3fa9afcb..c85cd126

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=02-03

  Stats: 32 lines in 2 files changed: 22 ins; 1 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v3]

2021-07-14 Thread Wang Huang
On Wed, 14 Jul 2021 08:27:36 GMT, Nick Gasson  wrote:

>> Wang Huang has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   refact codes
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4990:
> 
>> 4988:   __ lsrv(tmp2, tmp2, rscratch2);
>> 4989:   if (isLL) {
>> 4990:   __ uxtbw(tmp1, tmp1);
> 
> Convention is to indent with two spaces but have four here.

Thank you for your suggestion. I have fixed the style and added an unaligned 
test case in my new commit.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v3]

2021-07-14 Thread Nick Gasson
On Wed, 14 Jul 2021 08:47:56 GMT, Nick Gasson  wrote:

> I tried that on N1 and it's very slightly slower than with the 16B alignment

Sorry, ignore that, the result is actually the other way round. Not sure what's 
going on there, but there's no significant difference.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v3]

2021-07-14 Thread Nick Gasson
On Tue, 13 Jul 2021 07:37:31 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   refact codes

Have you tested when the data in the `byte[]` array is not 16B aligned? With 
the default JVM options the array data naturally starts 16B into the object, 
but you can force a different alignment with e.g. 
`-XX:-UseCompressedClassPointers`. I tried that on N1 and it's very slightly 
slower than with the 16B alignment, but still faster than the non-LDP version 
for length 1024 strings. On A72 the difference is a bit bigger but again faster 
than non-LDP.
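
To check which layout a given JVM uses, the offset of the first `byte[]` element from the start of the array object can be printed. This is only a minimal sketch: it assumes a HotSpot JDK that still exposes `sun.misc.Unsafe`, and the class and method names are made up for illustration.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class ArrayBaseOffsetSketch {
    // Returns the offset, in bytes, of element 0 of a byte[] from the start of
    // the array object, as reported by the running JVM.
    static int byteArrayBaseOffset() {
        try {
            // Assumption: the JDK exposes the usual "theUnsafe" singleton field.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);
            return unsafe.arrayBaseOffset(byte[].class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        int offset = byteArrayBaseOffset();
        System.out.println("byte[] base offset: " + offset
                + " (multiple of 16: " + (offset % 16 == 0) + ")");
    }
}
```

Running it once with the default options and once with `-XX:-UseCompressedClassPointers` should show whether the flag moves the array data on that particular JDK build.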

N1, -UseCompressedClassPointers


Benchmark                                  (diff_pos)  (size)  Mode  Cnt    Score   Error  Units
StringCompare.compareLLDiffStrings               1023    1024  avgt    5   67.789 ± 0.095  us/op
StringCompare.compareLLDiffStringsWithLdp        1023    1024  avgt    5   45.912 ± 0.059  us/op
StringCompare.compareUUDiffStrings               1023    1024  avgt    5  133.365 ± 0.086  us/op
StringCompare.compareUUDiffStringsWithLdp        1023    1024  avgt    5   89.009 ± 0.312  us/op


N1, +UseCompressedClassPointers


Benchmark                                  (diff_pos)  (size)  Mode  Cnt    Score   Error  Units
StringCompare.compareLLDiffStrings               1023    1024  avgt    5   67.878 ± 0.156  us/op
StringCompare.compareLLDiffStringsWithLdp        1023    1024  avgt    5   46.487 ± 0.115  us/op
StringCompare.compareUUDiffStrings               1023    1024  avgt    5  133.576 ± 0.111  us/op
StringCompare.compareUUDiffStringsWithLdp        1023    1024  avgt    5   90.462 ± 0.176  us/op


A72, -UseCompressedClassPointers


Benchmark                                  (diff_pos)  (size)  Mode  Cnt    Score   Error  Units
StringCompare.compareLLDiffStrings               1023    1024  avgt    5  122.697 ± 0.235  us/op
StringCompare.compareLLDiffStringsWithLdp        1023    1024  avgt    5   73.883 ± 0.136

Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v3]

2021-07-13 Thread Wang Huang
> Dear all, 
> Can you do me a favor to review this patch. This patch use `ldp` to 
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases, our patch is better than old 
> one.
> 
> Thank you for your review. Any suggestions are welcome.

Wang Huang has updated the pull request incrementally with one additional 
commit since the last revision:

  refact codes

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/2ae667b9..3fa9afcb

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=01-02

  Stats: 167 lines in 3 files changed: 0 ins; 153 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v2]

2021-07-12 Thread Wang Huang
On Mon, 12 Jul 2021 15:36:29 GMT, Andrew Haley  wrote:

> And with longer strings, M1 and ThunderX2:
> 
> ```
> Benchmark                                       (diff_pos)  (size)  Mode  Cnt   Score   Error  Units
> StringCompare.compareLLDiffStrings                    1023    1024  avgt    3  50.849 ± 0.087  us/op
> StringCompare.compareLLDiffStringsWithLdp             1023    1024  avgt    3  23.676 ± 0.015  us/op
> StringCompare.compareLLDiffStringsWithRefactor        1023    1024  avgt    3  28.967 ± 0.168  us/op
> ```
> 
> ```
> StringCompare.compareLLDiffStrings                    1023    1024  avgt    3  98.681 ± 0.026  us/op
> StringCompare.compareLLDiffStringsWithLdp             1023    1024  avgt    3  82.576 ± 0.656  us/op
> StringCompare.compareLLDiffStringsWithRefactor        1023    1024  avgt    3  98.801 ± 0.321  us/op
> ```
> 
> LDP wins on M1 here, but on ThunderX2 it makes almost no difference at all. 
> And how often are we comparing such long strings?
> I don't know what to think, really. It seems that we're near to a place where 
> we're optimizing for micro-architecture, and I don't want to see that here. 
> On the other hand, using LDP is not worse anywhere, so we should allow it.

Thank you for your suggestion. I inspected the results and found that my first 
commit (c5e29b9fedae7e1d24056a6fae8aff04afeb3889) performs better. Because of 
that, I will choose the version without the refactored 
`compare_string_16_bytes_same` as the final version.

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v2]

2021-07-12 Thread Andrew Haley
On Mon, 12 Jul 2021 09:14:25 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   draft of refactor

And with longer strings, M1 and ThunderX2:


Benchmark                                       (diff_pos)  (size)  Mode  Cnt   Score   Error  Units
StringCompare.compareLLDiffStrings                    1023    1024  avgt    3  50.849 ± 0.087  us/op
StringCompare.compareLLDiffStringsWithLdp             1023    1024  avgt    3  23.676 ± 0.015  us/op
StringCompare.compareLLDiffStringsWithRefactor        1023    1024  avgt    3  28.967 ± 0.168  us/op


StringCompare.compareLLDiffStrings                    1023    1024  avgt    3  98.681 ± 0.026  us/op
StringCompare.compareLLDiffStringsWithLdp             1023    1024  avgt    3  82.576 ± 0.656  us/op
StringCompare.compareLLDiffStringsWithRefactor        1023    1024  avgt    3  98.801 ± 0.321  us/op

LDP wins on M1 here, but on ThunderX2 it makes almost no difference at all. And 
how often are we comparing such long strings?
I don't know what to think, really. It seems that we're near to a place where 
we're optimizing for micro-architecture, and I don't want to see that here. On 
the other hand, using LDP is not worse anywhere, so we should allow it.
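
For reference, the two result blocks above can be reduced to speedup ratios (baseline score divided by variant score). The snippet below is only a sketch of that arithmetic; the class name `SpeedupSketch` is hypothetical and the numbers are copied from the blocks in the order they appear.

```java
final class SpeedupSketch {
    // Ratio of baseline time to variant time; a value above 1.0 means the
    // variant took less time per operation than the baseline.
    static double speedup(double baselineUsPerOp, double variantUsPerOp) {
        return baselineUsPerOp / variantUsPerOp;
    }

    public static void main(String[] args) {
        // Each row: { compareLLDiffStrings, ...WithLdp, ...WithRefactor } in us/op.
        double[][] blocks = {
            { 50.849, 23.676, 28.967 },
            { 98.681, 82.576, 98.801 },
        };
        for (double[] b : blocks) {
            System.out.printf("WithLdp: %.2fx, WithRefactor: %.2fx%n",
                    speedup(b[0], b[1]), speedup(b[0], b[2]));
        }
    }
}
```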

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v2]

2021-07-12 Thread Andrew Haley
On Mon, 12 Jul 2021 09:14:25 GMT, Wang Huang  wrote:

>> Dear all, 
>> Can you do me a favor to review this patch. This patch use `ldp` to 
>> implement String.compareTo.
>>
>> * We add a JMH test case 
>> * Here is the result of this test case
>>  
>> Benchmark   |(size)| Mode| Cnt|Score | Error  |Units 
>> -|--|-||--||-
>> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±   0.005|us/op
>> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±   0.006|us/op
>> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±   0.011|us/op
>> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±   0.12 |us/op
>> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±   0.007|us/op
>> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±   0.006|us/op
>> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±   0.417|us/op
>> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±   0.041|us/op
>> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±   0.121|us/op
>> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±   0.003|us/op
>> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±   0.201|us/op
>> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±   0.004|us/op
>> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±   1.342|us/op
>> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±   0.581|us/op
>> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±   1.775|us/op
>> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±   0.01 |us/op
>> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±   0.006|us/op
>> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±   0.011|us/op
>> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±   0.008|us/op
>> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±   0.017|us/op
>> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±   0.011|us/op
>> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±   0.016|us/op
>> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±   0.392|us/op
>> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±   0.008|us/op
>> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±   0.158|us/op
>> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±   0.024|us/op
>> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±   0.006|us/op
>> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±   0.434|us/op
>> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±   0.016|us/op
>> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±   0.017|us/op
>> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±   3.5  |us/op
>> 
>> From this table, we can see that in most cases, our patch is better than old 
>> one.
>> 
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   draft of refactor

Two machines to compare here, Apple M1 and ThunderX2:


Benchmark                                       (diff_pos)  (size)  Mode  Cnt  Score   Error  Units
StringCompare.compareLLDiffStrings                       7     128  avgt    3  2.194 ± 0.001  us/op
StringCompare.compareLLDiffStrings                      15     128  avgt    3  2.195 ± 0.018  us/op
StringCompare.compareLLDiffStrings                      31     128  avgt    3  2.508 ± 0.003  us/op
StringCompare.compareLLDiffStrings                      47     128  avgt    3  2.821 ± 0.001  us/op
StringCompare.compareLLDiffStrings                      63     128  avgt    3  3.446 ± 0.003  us/op
StringCompare.compareLLDiffStringsWithLdp                7     128  avgt    3  2.194 ± 0.001  us/op
StringCompare.compareLLDiffStringsWithLdp               15     128  avgt    3  2.195 ± 0.001  us/op
StringCompare.compareLLDiffStringsWithLdp               31     128  avgt    3  2.508 ± 0.001  us/op
StringCompare.compareLLDiffStringsWithLdp               47     128  avgt    3  2.510 ± 0.006  us/op
StringCompare.compareLLDiffStringsWithLdp               63     128  avgt    3  2.824 ± 0.003  us/op
StringCompare.compareLLDiffStringsWithRefactor           7     128  avgt    3  1.882 ± 0.018  us/op
StringCompare.compareLLDiffStringsWithRefactor          15     128  avgt    3  2.019 ± 0.002  us/op
StringCompare.compareLLDiffStringsWithRefactor          31     128  avgt    3  2.355 ± 0.003  us/op
StringCompare.compareLLDiffStringsWithRefactor          47     128  avgt    3  2.821 ± 0.010  us/op
StringCompare.compareLLDiffStringsWithRefactor          63     128  avgt    3  3.135 ± 0.002  us/op


Benchmark                                       (diff_pos)  (size)  Mode  Cnt   Score   Error  Units
StringCompare.compareLLDiffStrings

Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v2]

2021-07-12 Thread Wang Huang
> Dear all, 
> Can you do me a favor to review this patch. This patch use `ldp` to 
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases, our patch is better than old 
> one.
> 
> Thank you for your review. Any suggestions are welcome.

Wang Huang has updated the pull request incrementally with one additional 
commit since the last revision:

  draft of refactor

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4722/files
  - new: https://git.openjdk.java.net/jdk/pull/4722/files/c5e29b9f..2ae667b9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=00-01

  Stats: 155 lines in 3 files changed: 153 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

2021-07-12 Thread Wang Huang
On Fri, 9 Jul 2021 09:15:18 GMT, Andrew Haley  wrote:

> I'm quite tempted to approve this. It looks generally better, simpler, and 
> easier to understand than what we have today. However, the improvement isn't 
> great, and I suspect is mostly because of the reduction in traffic between 
> Base and Vector registers.
> What happens if you rewrite `compare_string_16_bytes_same()` to use `ldp` ?

I refactored `compare_string_16_bytes_same()` as a draft; the performance 
comparison is listed here:

Benchmark | (diff_pos) | (size) | Mode | Cnt | Score | Error | Units
---|---|---|---|---|---|---|---
StringCompare.compareLLDiffStrings | 7 | 128 | avgt | 5 | 4.252 | ± 0.001 | us/op
StringCompare.compareLLDiffStrings | 15 | 128 | avgt | 5 | 4.714 | ± 0.001 | us/op
StringCompare.compareLLDiffStrings | 31 | 128 | avgt | 5 | 6.139 | ± 0.445 | us/op
StringCompare.compareLLDiffStrings | 47 | 128 | avgt | 5 | 13.861 | ± 0.001 | us/op
StringCompare.compareLLDiffStrings | 63 | 128 | avgt | 5 | 8.823 | ± 0.007 | us/op
StringCompare.compareLLDiffStringsWithLdp | 7 | 128 | avgt | 5 | 3.867 | ± 0.001 | us/op
StringCompare.compareLLDiffStringsWithLdp | 15 | 128 | avgt | 5 | 5.571 | ± 0.756 | us/op
StringCompare.compareLLDiffStringsWithLdp | 31 | 128 | avgt | 5 | 5.408 | ± 0.001 | us/op
StringCompare.compareLLDiffStringsWithLdp | 47 | 128 | avgt | 5 | 6.896 | ± 0.825 | us/op
StringCompare.compareLLDiffStringsWithLdp | 63 | 128 | avgt | 5 | 6.787 | ± 0.001 | us/op
StringCompare.compareLLDiffStringsWithRefactor | 7 | 128 | avgt | 5 | 3.481 | ± 0.001 | us/op
StringCompare.compareLLDiffStringsWithRefactor | 15 | 128 | avgt | 5 | 10.023 | ± 0.012 | us/op
StringCompare.compareLLDiffStringsWithRefactor | 31 | 128 | avgt | 5 | 5.627 | ± 0.017 | us/op
StringCompare.compareLLDiffStringsWithRefactor | 47 | 128 | avgt | 5 | 13.369 | ± 0.544 | us/op
StringCompare.compareLLDiffStringsWithRefactor | 63 | 128 | avgt | 5 | 8.382 | ± 0.988 | us/op
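
For readers without the benchmark source at hand, here is a sketch of the kind of input the `(diff_pos)` and `(size)` parameters suggest: two strings of the same length that differ at exactly one index. This is an assumption about the benchmark setup, not a copy of it, and `DiffStringsSketch` and `makePair` are hypothetical names.

```java
import java.util.Arrays;

final class DiffStringsSketch {
    // Builds two strings of length `size` that are identical except at index
    // `diffPos`. With utf16 == false both strings contain only Latin-1
    // characters (the "LL" case); with utf16 == true a non-Latin-1 filler is
    // used so both strings need the UTF-16 representation (the "UU" case).
    static String[] makePair(int size, int diffPos, boolean utf16) {
        char filler = utf16 ? '\u4e2d' : 'a';
        char[] left = new char[size];
        Arrays.fill(left, filler);
        char[] right = left.clone();
        right[diffPos] = (char) (filler + 1);
        return new String[] { new String(left), new String(right) };
    }

    public static void main(String[] args) {
        String[] pair = makePair(128, 63, false);
        // compareTo has to scan the first 63 equal characters before it
        // reaches the mismatch.
        System.out.println(pair[0].compareTo(pair[1]));
    }
}
```

The further `diffPos` is from the start, the more of the comparison is spent in the main loop of the intrinsic rather than in its setup and tail handling.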

-

PR: https://git.openjdk.java.net/jdk/pull/4722


Re: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

2021-07-09 Thread Andrew Haley
On Thu, 8 Jul 2021 11:50:36 GMT, Wang Huang  wrote:

> Dear all, 
> Can you do me a favor to review this patch. This patch use `ldp` to 
> implement String.compareTo.
>
> * We add a JMH test case 
> * Here is the result of this test case
>  
> Benchmark|(size)| Mode| Cnt|Score | Error  |Units 
> -|--|-||--||-
> StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±0.005|us/op
> StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±0.006|us/op
> StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±0.011|us/op
> StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±0.12 |us/op
> StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±0.007|us/op
> StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±0.006|us/op
> StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±0.417|us/op
> StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±0.041|us/op
> StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001 | ±0.121|us/op
> StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±0.003|us/op
> StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±0.201|us/op
> StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±0.004|us/op
> StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±1.342|us/op
> StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±0.581|us/op
> StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±1.775|us/op
> StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±0.01 |us/op
> StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±0.006|us/op
> StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±0.011|us/op
> StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±0.008|us/op
> StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±0.017|us/op
> StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±0.011|us/op
> StringCompare.compareUU   |  181 | avgt| 5  |39.31 | ±0.016|us/op
> StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±0.392|us/op
> StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±0.008|us/op
> StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±0.158|us/op
> StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±0.024|us/op
> StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±0.006|us/op
> StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±0.434|us/op
> StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±0.016|us/op
> StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±0.017|us/op
> StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±3.5  |us/op
> 
> From this table, we can see that in most cases, our patch is better than old 
> one.
> 
> Thank you for your review. Any suggestions are welcome.

I'm quite tempted to approve this. It looks generally better, simpler, and 
easier to understand than what we have today. However, the improvement isn't 
great, and I suspect is mostly because of the reduction in traffic between Base 
and Vector registers.
What happens if you rewrite `compare_string_16_bytes_same()` to use `ldp` ?
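
As a rough Java-level analogy of what a load-pair based comparison does (this is not the stub code, which is written with the AArch64 macro assembler), the sketch below compares two Latin-1 byte arrays 16 bytes per iteration. It reads a pair of 8-byte words from each side and locates the first differing byte with XOR and a trailing-zero count. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PairwiseCompareSketch {
    // Returns the difference of the first mismatching bytes (treated as
    // unsigned), or the length difference if one array is a prefix of the other.
    static int compareLatin1(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        // Little-endian views: the lowest-addressed byte is the least
        // significant byte of each 8-byte word.
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        // Main loop: two 8-byte words from each array per iteration.
        for (; i + 16 <= min; i += 16) {
            long a0 = ba.getLong(i), a1 = ba.getLong(i + 8);
            long b0 = bb.getLong(i), b1 = bb.getLong(i + 8);
            if (a0 != b0) return firstByteDiff(a0, b0);
            if (a1 != b1) return firstByteDiff(a1, b1);
        }
        // Tail: fewer than 16 bytes remain, so compare byte by byte.
        for (; i < min; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // x and y differ; find the lowest-addressed differing byte and subtract.
    static int firstByteDiff(long x, long y) {
        int shift = Long.numberOfTrailingZeros(x ^ y) & ~7; // round down to a byte boundary
        return (int) ((x >>> shift) & 0xff) - (int) ((y >>> shift) & 0xff);
    }
}
```

For Latin-1 inputs the result should agree with `String.compareTo` on the corresponding strings, which makes the sketch easy to cross-check.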

-

PR: https://git.openjdk.java.net/jdk/pull/4722


RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

2021-07-08 Thread Wang Huang
Dear all, 
Could you please review this patch? It uses `ldp` to implement 
String.compareTo.
   
* We add a JMH test case 
* Here is the result of this test case
 
Benchmark                     |(size)| Mode| Cnt|Score | Error   |Units
------------------------------|------|-----|----|------|---------|-----
StringCompare.compareLL   |  64  | avgt| 5  |7.992 | ±  0.005|us/op
StringCompare.compareLL   |  72  | avgt| 5  |15.029| ±  0.006|us/op
StringCompare.compareLL   |  80  | avgt| 5  |14.655| ±  0.011|us/op
StringCompare.compareLL   |  91  | avgt| 5  |16.363| ±  0.12 |us/op
StringCompare.compareLL   |  101 | avgt| 5  |16.966| ±  0.007|us/op
StringCompare.compareLL   |  121 | avgt| 5  |19.276| ±  0.006|us/op
StringCompare.compareLL   |  181 | avgt| 5  |19.002| ±  0.417|us/op
StringCompare.compareLL   |  256 | avgt| 5  |24.707| ±  0.041|us/op
StringCompare.compareLLWithLdp|  64  | avgt| 5  |8.001  | ± 0.121|us/op
StringCompare.compareLLWithLdp|  72  | avgt| 5  |11.573| ±  0.003|us/op
StringCompare.compareLLWithLdp|  80  | avgt| 5  |6.861 | ±  0.004|us/op
StringCompare.compareLLWithLdp|  91  | avgt| 5  |12.774| ±  0.201|us/op
StringCompare.compareLLWithLdp|  101 | avgt| 5  |8.691 | ±  0.004|us/op
StringCompare.compareLLWithLdp|  121 | avgt| 5  |11.091| ±  1.342|us/op
StringCompare.compareLLWithLdp|  181 | avgt| 5  |14.64 | ±  0.581|us/op
StringCompare.compareLLWithLdp|  256 | avgt| 5  |25.879| ±  1.775|us/op
StringCompare.compareUU   |  64  | avgt| 5  |13.476| ±  0.01 |us/op
StringCompare.compareUU   |  72  | avgt| 5  |15.078| ±  0.006|us/op
StringCompare.compareUU   |  80  | avgt| 5  |23.512| ±  0.011|us/op
StringCompare.compareUU   |  91  | avgt| 5  |24.284| ±  0.008|us/op
StringCompare.compareUU   |  101 | avgt| 5  |20.707| ±  0.017|us/op
StringCompare.compareUU   |  121 | avgt| 5  |29.302| ±  0.011|us/op
StringCompare.compareUU   |  181 | avgt| 5  |39.31  | ± 0.016|us/op
StringCompare.compareUU   |  256 | avgt| 5  |54.592| ±  0.392|us/op
StringCompare.compareUUWithLdp|  64  | avgt| 5  |16.389| ±  0.008|us/op
StringCompare.compareUUWithLdp|  72  | avgt| 5  |10.71 | ±  0.158|us/op
StringCompare.compareUUWithLdp|  80  | avgt| 5  |11.488| ±  0.024|us/op
StringCompare.compareUUWithLdp|  91  | avgt| 5  |13.412| ±  0.006|us/op
StringCompare.compareUUWithLdp|  101 | avgt| 5  |16.245| ±  0.434|us/op
StringCompare.compareUUWithLdp|  121 | avgt| 5  |16.597| ±  0.016|us/op
StringCompare.compareUUWithLdp|  181 | avgt| 5  |27.373| ±  0.017|us/op
StringCompare.compareUUWithLdp|  256 | avgt| 5  |41.74 | ±  3.5  |us/op

From this table, we can see that in most cases our patch is better than the 
old one.
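
As a quick way to check the "in most cases" claim, the scores in the table can be turned into per-size speedup ratios. The following is a minimal sketch: the class `LdpSpeedup` and its helpers are hypothetical names introduced for illustration, the numbers are copied from the table, and the reported error margins are deliberately ignored.

```java
public class LdpSpeedup {
    // Scores in us/op copied from the table; index i corresponds to SIZES[i].
    static final int[] SIZES = {64, 72, 80, 91, 101, 121, 181, 256};
    static final double[] LL =
        {7.992, 15.029, 14.655, 16.363, 16.966, 19.276, 19.002, 24.707};
    static final double[] LL_LDP =
        {8.001, 11.573, 6.861, 12.774, 8.691, 11.091, 14.64, 25.879};
    static final double[] UU =
        {13.476, 15.078, 23.512, 24.284, 20.707, 29.302, 39.31, 54.592};
    static final double[] UU_LDP =
        {16.389, 10.71, 11.488, 13.412, 16.245, 16.597, 27.373, 41.74};

    // Ratio above 1.0 means the patched version took less time per operation.
    static double speedup(double base, double patched) {
        return base / patched;
    }

    // Number of sizes at which the patched score is strictly lower.
    static int countFaster(double[] base, double[] patched) {
        int n = 0;
        for (int i = 0; i < base.length; i++) {
            if (patched[i] < base[i]) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("size %3d  LL %.2fx  UU %.2fx%n",
                SIZES[i], speedup(LL[i], LL_LDP[i]), speedup(UU[i], UU_LDP[i]));
        }
        System.out.printf("faster: LL %d/8, UU %d/8%n",
            countFaster(LL, LL_LDP), countFaster(UU, UU_LDP));
    }
}
```

Ignoring the error margins, 13 of the 16 rows score lower with the patch; the exceptions are compareLL at sizes 64 and 256 and compareUU at size 64.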

Thank you for your review. Any suggestions are welcome.

-

Commit messages:
 - 8268231: Aarch64: Use ldp in intrinsics for String.compareTo

Changes: https://git.openjdk.java.net/jdk/pull/4722/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4722&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8268231
  Stats: 259 lines in 3 files changed: 255 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4722.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4722/head:pull/4722

PR: https://git.openjdk.java.net/jdk/pull/4722