Re: RFR: JDK-8302323 Add repeat methods to StringBuilder/StringBuffer [v10]

2023-03-28 Thread Claes Redestad
On Mon, 27 Mar 2023 18:37:12 GMT, Jim Laskey  wrote:

>> Add the ability to repeatedly append char and CharSequence data to 
>> StringBuilder/StringBuffer.
>
> Jim Laskey has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Use Arrays.fill

Looks ok. I don't have a strong opinion on insert-repeat. 

I think tests that check for illegal code points should use a legal repeat 
value.

-

Marked as reviewed by redestad (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/12728#pullrequestreview-1360764390


Re: RFR: JDK-8302323 Add repeat methods to StringBuilder/StringBuffer [v10]

2023-03-28 Thread Claes Redestad
On Mon, 27 Mar 2023 18:37:12 GMT, Jim Laskey  wrote:

>> Add the ability to repeatedly append char and CharSequence data to 
>> StringBuilder/StringBuffer.
>
> Jim Laskey has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Use Arrays.fill

test/jdk/java/lang/StringBuilder/StringBuilderRepeat.java line 180:

> 178: 
> 179: try {
> 180: sb.repeat(0x10 + 1, -1);

Should these tests that test invalid codepoints use a valid repeat count as to 
verify the IAE is correctly thrown for the out-of-bound codepoint? Tests where 
both params are illegal are fine, too, of course.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/12728#discussion_r1150396671


Re: RFR: 8302818: Optimize wrapper sets and immutable sets of Enums [v5]

2023-03-21 Thread Claes Redestad
On Tue, 21 Feb 2023 03:39:46 GMT, Tingjun Yuan  wrote:

>> Currently, the two subclasses of `java.util.EnumSet` optimize bulk 
>> operations when the argument is also a `EnumSet`, but there is no such 
>> optimization for wrapper sets (returned by `Collections.unmodifiableSet`, 
>> `Collections.synchronizedSet`, etc.) and immutable sets (returned by 
>> `Set.of` methods) of `Enum`s.
>> 
>> This PR introduces optimization classes for these situations.  No public 
>> APIs are changed.
>
> Tingjun Yuan has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Set.copyOf: need defensive copy

If this level of complexity is indeed needed to get whatever improvement you're 
after then I don't see how this can be worth its weight. Microbenchmarking 
might help support your case here, but assessing the potential performance 
costs from gradually increasing the number of classes floating around at 
various call sites in arbitrary applications is hard. Thus it is something we 
need to be very careful not to do without solid evidence.

-

PR Comment: https://git.openjdk.org/jdk/pull/12498#issuecomment-1478731354


Re: RFR: 8302818: Optimize wrapper sets and immutable sets of Enums [v5]

2023-03-21 Thread Claes Redestad
On Tue, 21 Feb 2023 03:39:46 GMT, Tingjun Yuan  wrote:

>> Currently, the two subclasses of `java.util.EnumSet` optimize bulk 
>> operations when the argument is also a `EnumSet`, but there is no such 
>> optimization for wrapper sets (returned by `Collections.unmodifiableSet`, 
>> `Collections.synchronizedSet`, etc.) and immutable sets (returned by 
>> `Set.of` methods) of `Enum`s.
>> 
>> This PR introduces optimization classes for these situations.  No public 
>> APIs are changed.
>
> Tingjun Yuan has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Set.copyOf: need defensive copy

I'm wary of the impact of adding new wrapper classes. It may make the factory 
methods slightly slower due additional checks, but also risks increasing the 
number of classes at various call-sites which might upset call site inlining. 

An alternative design which would avoid adding more classes could be to add 
package-private accessors to the existing `Unmodifiable/Synchronized` wrapper 
classes so that `EnumSet/-Map` can retrieve the underlying set of an 
unmodifiable or synchronized `Set` or `Map` and then use it directly for these 
bulk operations. Then you'd contain the additional overhead to `EnumSet/-Map`.

-

PR Comment: https://git.openjdk.org/jdk/pull/12498#issuecomment-1478035549


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Claes Redestad
On Wed, 15 Mar 2023 12:07:12 GMT, Sergey Tsypanov  wrote:

>> src/java.base/share/classes/java/lang/CharacterData.java line 72:
>> 
>>> 70: 
>>> 71: static final CharacterData of(int ch) {
>>> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path
>> 
>> Maybereducing to a single branch with a mask op helps further? Or maybe the 
>> JIT already effectively does that:
>> 
>> `if (ch && 0xFF00 == 0) {`
>
> Btw, I think we can do the same for `StringLatin1.canEncode()`

It seems reasonable to keep these two in sync, yes. (`CharacterData.of` could 
even call into `StringLatin1.canEncode`, unless that's cause for some 
performance anomaly)

-

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: 8304245: Speed up CharacterData.of by avoiding bit shifting in the latin1 fast-path test

2023-03-15 Thread Claes Redestad
On Wed, 15 Mar 2023 11:26:21 GMT, Eirik Bjorsnos  wrote:

> By avoiding a bit shift operation for the latin1 fast-path test, we can speed 
> up the `java.lang.CharacterData.of` method by ~25% for latin1 code points.
> 
> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
> this with `ch >= 0 && ch <= 0xFF` for a noticable performance gain 
> (especially for Latin1 code points):
> 
> This method is called frequently by various property-determining methods in 
> `java.lang.Character` like `isLowerCase`, `isDigit` etc, so one should expect 
> improvements for all these methods.
> 
> Performance is tested using the `Characters.isDigit` benchmark using the 
> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
> in CharacterData00):
> 
> Baseline:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.870 ± 0.011  ns/op
> Characters.isDigit 1632  avgt   15  2.168 ± 0.017  ns/op
> 
> PR:
> 
> 
> Benchmark   (codePoint)  Mode  Cnt  Score   Error  Units
> Characters.isDigit   48  avgt   15  0.654 ± 0.007  ns/op
> Characters.isDigit 1632  avgt   15  2.032 ± 0.019  ns/op

Nice one!

src/java.base/share/classes/java/lang/CharacterData.java line 72:

> 70: 
> 71: static final CharacterData of(int ch) {
> 72: if (ch >= 0 && ch <= 0xFF) { // fast-path

Maybereducing to a single branch with a mask op helps further? Or maybe the JIT 
already effectively does that:

`if (ch && 0xFF00 == 0) {`

-

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.org/jdk/pull/13040


Re: RFR: JDK-8302323 Add repeat methods to StringBuilder/StringBuffer [v8]

2023-03-14 Thread Claes Redestad
On Fri, 3 Mar 2023 19:04:22 GMT, Jim Laskey  wrote:

>> Add the ability to repeatedly append char and CharSequence data to 
>> StringBuilder/StringBuffer.
>
> Jim Laskey has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Expand test for StringBuffer and illegal code points

It seems reasonable to consider analogous `insert` methods. Perhaps a more 
consistent naming scheme would be `appendRepeated` and `insertRepeated`? (The 
precedent for `repeat` in `String::repeat` is there, but is kind of weak since 
it's clear in that case that we're not mutating internal state)

src/java.base/share/classes/java/lang/AbstractStringBuilder.java line 1832:

> 1830: if (isLatin1 && StringLatin1.canEncode(c)) {
> 1831: byte b = (byte)c;
> 1832: for (int index = this.count; index < limit; index++) {

This loop could even be replaced with `Arrays.fill(value, this.count, limit, 
b)` - a plausible candidate for intrinsification?  There's an added range check 
in that method, however, but that shouldn't be too hard for the JIT to 
eliminate.

-

PR: https://git.openjdk.org/jdk/pull/12728


Re: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments

2023-03-03 Thread Claes Redestad
On Fri, 3 Mar 2023 09:38:13 GMT, Pavel Rappo  wrote:

>> src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java 
>> line 257:
>> 
>>> 255: 
>>> 256: /**
>>> 257:  * @return true iff the BSM method type exactly matches
>> 
>> I assume “iff” should “if”?
>
> Here and elsewhere in this file "iff" might mean [if and only 
> if](https://en.wikipedia.org/wiki/If_and_only_if), which would make sense. 
> (FWIW, there are a few hundred occurrences of the word "iff" in src.)
> 
> @cl4es (Claes Redestad), as the author of those lines would you like to chime 
> in?
> 
> Since Claes might read this, I note that when I changed unsupported `{@see}` 
> to `{@link}` thoughtout this file, my IDE could not resolve one of the links: 
> `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,Class,MethodType,MethodHandle,MethodType)`
> 
> While there's a similarly-name method with slightly different parameters, I 
> refrained from using it:
> `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,MethodType,MethodType,MethodHandle,MethodType)`.

Yes, iff means if-and-only-if and is used for extra precision in formal logic, 
mathematics. As @pavelrappo points out it's a relatively common occurrence in 
the OpenJDK sources, though perhaps not in the public javadocs. Perhaps a bit 
pretentious, but mostly a terse way to say "return true if the BSM method type 
exactly matches X, otherwise false".

The broken link stems from the fact that the method I was targeting (a way to 
use condy for lambda proxy singletons rather than a `MethodHandle.constant`) 
was never integrated. We'll look at either getting that done (@briangoetz 
suggested the time might be ready for it) or remove this currently pointless 
static bootstrap specialization test.

-

PR: https://git.openjdk.org/jdk/pull/12826


Re: RFR: 8301119: Support for GB18030-2022 [v2]

2023-02-23 Thread Claes Redestad
On Thu, 23 Feb 2023 08:32:29 GMT, Claes Redestad  wrote:

>> `Charset` class is initialized *before* system properties are set up, in 
>> order to check the JNU encoding (used for file path name) is a supported 
>> charset or not. In some OS environments, GB18030 is the native encoding so 
>> we need to avoid checking the system property in such a case.
>
> `@Stable` semantics are still fuzzy to me but the rule I've adhered to is 
> that back to back stores to the field - if unavoidable - needs to be 
> idempotent since the JIT (or AOT) may record any non-null value as a compile 
> time constant at any time.
> 
> I'd write this to not update the static field if initLevel() < 1. Such calls 
> should be rare and only happen once on a system that has GB18030 as their 
> native encoding.

Scratch that: as it seems to be important that we don't switch after startup 
then what this code is really reaching for is `static final` field semantics. 
Since `StandardCharsets` might be loaded very early a holder class pattern 
might be necessary:


isGB18030_2000() { return GB18030Properties.GB18030_2000; }

private static class GB18030Properties {
private static final GB18030_2000 = init();
private static boolean init() {
if (VM.initLevel() < 1) {
// Cannot get the system property yet. Assumes non-2000
return false;
}
return 
"2000".equals(GetPropertyAction.privilegedGetProperty("jdk.charset.GB18030"));
}
}

-

PR: https://git.openjdk.org/jdk/pull/12518


Re: RFR: 8301119: Support for GB18030-2022 [v2]

2023-02-23 Thread Claes Redestad
On Wed, 22 Feb 2023 17:52:01 GMT, Naoto Sato  wrote:

>>> curious - what scenario triggers this call at initLevel < 1 ?
>> 
>> It's not supported, but it is possible that someone might run with 
>> -Dfile.encoding=GB18030, in which case the default charset is used before 
>> the system properties are initialized in initPhase1. Checking the init level 
>> breaks the circularity, the only downside is that can't switch to 
>> GB18030-2000 at the same time.
>
> `Charset` class is initialized *before* system properties are set up, in 
> order to check the JNU encoding (used for file path name) is a supported 
> charset or not. In some OS environments, GB18030 is the native encoding so we 
> need to avoid checking the system property in such a case.

`@Stable` semantics are still fuzzy to me but the rule I've adhered to is that 
back to back stores to the field - if unavoidable - needs to be idempotent 
since the JIT (or AOT) may record any non-null value as a compile time constant 
at any time.

I'd write this to not update the static field if initLevel() < 1. Such calls 
should be rare and only happen once on a system that has GB18030 as their 
native encoding.

-

PR: https://git.openjdk.org/jdk/pull/12518


Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v10]

2023-02-22 Thread Claes Redestad
On Wed, 22 Feb 2023 07:11:16 GMT, Eirik Bjorsnos  wrote:

>> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying 
>> 'the oldest ASCII trick in the book'.
>> 
>> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two 
>> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is 
>> updated to use `equalsIgnoreCase`
>> 
>> To verify the correctness of `equalsIgnoreCase`, a new test is added  to 
>> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 
>> code point pairs have an `equalsIgnoreCase` consistent with 
>> Character.toUpperCase, Character.toLowerCase.
>> 
>> Performance is tested for matching and mismatching cases of code point pairs 
>> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results 
>> in the first comment below.
>
> Eirik Bjorsnos has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Replace 'oldest ASCII trick in the book' use in toUpperCase, toLowerCase 
> with "by removing (setting) a single bit"
>  - Align local variable naming in toLowerCase, toUpperCase with 
> equalsIgnoreCase by using 'lower' and 'upper'

Marked as reviewed by redestad (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/12632


Integrated: 8302863: Speed up String::encodeASCII using countPositives

2023-02-21 Thread Claes Redestad
On Sat, 18 Feb 2023 23:26:08 GMT, Claes Redestad  wrote:

> When encoding Strings to US-ASCII we can speed up the happy path 
> significantly by using `StringCoding.countPositives` as a speculative check 
> for whether there are any chars that needs to be replaced by `'?'`. Once a 
> non-ASCII char is encountered we fall back to the slow loop and replace as 
> needed.
> 
> An alternative could be unrolling or using a byte array VarHandle, as 
> show-cased by Brett Okken here: 
> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html 
> Having to replace chars with `?` is essentially an encoding error so it might 
> be safe to assume this case is exceptional in practice.

This pull request has now been integrated.

Changeset: 92dfa117
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/92dfa1175e4898fc491115e004380780b6862473
Stats: 24 lines in 1 file changed: 11 ins; 2 del; 11 mod

8302863: Speed up String::encodeASCII using countPositives

Reviewed-by: alanb

-

PR: https://git.openjdk.org/jdk/pull/12640


Re: RFR: 8302863: Speed up String::encodeASCII using countPositives

2023-02-21 Thread Claes Redestad
On Mon, 20 Feb 2023 21:40:41 GMT, Brett Okken  wrote:

>> When encoding Strings to US-ASCII we can speed up the happy path 
>> significantly by using `StringCoding.countPositives` as a speculative check 
>> for whether there are any chars that needs to be replaced by `'?'`. Once a 
>> non-ASCII char is encountered we fall back to the slow loop and replace as 
>> needed.
>> 
>> An alternative could be unrolling or using a byte array VarHandle, as 
>> show-cased by Brett Okken here: 
>> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html 
>> Having to replace chars with `?` is essentially an encoding error so it 
>> might be safe to assume this case is exceptional in practice.
>
> For the happy path of no encoding failures, this "trivial" change is a great 
> improvement.

Thanks for reviewing - and thanks @bokken for inspiring this change.

-

PR: https://git.openjdk.org/jdk/pull/12640


Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v7]

2023-02-21 Thread Claes Redestad
On Tue, 21 Feb 2023 11:14:13 GMT, Eirik Bjorsnos  wrote:

>> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying 
>> 'the oldest ASCII trick in the book'.
>> 
>> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two 
>> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is 
>> updated to use `equalsIgnoreCase`
>> 
>> To verify the correctness of `equalsIgnoreCase`, a new test is added  to 
>> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 
>> code point pairs have an `equalsIgnoreCase` consistent with 
>> Character.toUpperCase, Character.toLowerCase.
>> 
>> Performance is tested for matching and mismatching cases of code point pairs 
>> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results 
>> in the first comment below.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove whitespace following '('

Marked as reviewed by redestad (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/12632


Re: RFR: 8302877: Speed up latin1 case conversions [v2]

2023-02-21 Thread Claes Redestad
On Tue, 21 Feb 2023 06:59:47 GMT, Eirik Bjorsnos  wrote:

>> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase 
>> for latin1 code points by applying the 'oldest ASCII trick in the book'.
>> 
>> This takes advantage of the fact that latin1 uppercase code points are 
>> always 0x20 lower than their lowercase (with the exception of two code 
>> points which uppercase out of latin1).
>> 
>> To verify the correctness of the new implementation, the test 
>> `Latin1CaseConversion` is added with an exhaustive verification of 
>> toUpperCase/toLowerCase for all latin1 code points.
>> 
>> The implementation needs to balance the performance of the various ranges in 
>> latin1. An effort has been made to favour operations on ASCII code points, 
>> without causing excessive regression for higher code points.
>> 
>> Performance is benchmarked for 7 chosen sample code points, each 
>> representing a range or a special-case.  Results in the first comment.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Spell fix for 'exhaustive' in comments in sun/text/resources

Looks good. Some nits inline

src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 
142:

> 140: }
> 141: int l = ch | 0x20; // Lowercase using 'oldest ASCII trick in the 
> book'
> 142: if ( l <= 'z' // In range a-z

Suggestion:

if (l <= 'z' // In range a-z

test/micro/org/openjdk/bench/java/lang/Characters.java line 92:

> 90: @Measurement(iterations = 5, time = 1)
> 91: @Fork(3)
> 92: public static class Latin1CaseConversions {

Not sure if qualifying this as "Latin1" is necessary, even though that's what 
you've focused on for this PR. We could easily add some codePoints outside of 
the latin1 range (now or later) without changing the test. 

While having a switch with some readable names is a nice touch I think we 
should additionally allow integer codePoint as-is to keep it in line with the 
outer class, e.g. `default -> Integer.parseInt(codePoint);`

-

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12623


Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v4]

2023-02-20 Thread Claes Redestad
On Mon, 20 Feb 2023 16:16:45 GMT, Eirik Bjorsnos  wrote:

>> src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 
>> 170:
>> 
>>> 168:  * @return true if the two bytes are considered equals ignoring 
>>> case in latin1
>>> 169:  */
>>> 170:  static boolean equalsIgnoreCase(byte b1, byte b2) {
>> 
>> Perhaps put this in `CharacterDataLatin1`, keeping it close to 
>> toLowerCase/toUpperCase that you're changing to use similar logic with 
>> #12623 
>> 
>> If you apply #12623 first - how much difference does this make on the micro 
>> you're adding with this PR?
>
> Is it not already in CharacterDataLatin1?
> 
> Here is a comparison of relying on improvements in 
> `CharacterDataLatin1.toUpperCase/toLowerCase` only vs. using 
> `CharacterDataLatin1.equalsIgnoreCase`:
> 
> Character.toUpperCase/toLowerCase only:
> 
> 
> Benchmark  (codePoints)  (size)  Mode  Cnt
>  ScoreError  Units
> RegionMatchesIC.Latin1.regionMatchesIC  ascii-match1024  avgt   15  
> 1310.582 ± 84.777  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC   ascii-mismatch1024  avgt   15
>  4.547 ±  0.545  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC number-match1024  avgt   15   
> 686.947 ± 11.850  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC  number-mismatch1024  avgt   15
>  3.836 ±  0.634  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC   lat1-match1024  avgt   15  
> 2107.219 ± 17.662  ns/op
> RegionMatchesIC.Latin1.regionMatchesIClat1-mismatch1024  avgt   15
>  4.924 ±  0.829  ns/op
> 
> 
> CharacterDataLatin1.equalsIgnoreCase:
> 
> 
> Benchmark  (codePoints)  (size)  Mode  Cnt
>  ScoreError  Units
> RegionMatchesIC.Latin1.regionMatchesIC  ascii-match1024  avgt   15   
> 742.467 ± 34.490  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC   ascii-mismatch1024  avgt   15
>  3.960 ±  0.046  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC number-match1024  avgt   15   
> 361.158 ± 37.096  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC  number-mismatch1024  avgt   15
>  4.039 ±  0.521  ns/op
> RegionMatchesIC.Latin1.regionMatchesIC   lat1-match1024  avgt   15  
> 1158.091 ± 41.617  ns/op
> RegionMatchesIC.Latin1.regionMatchesIClat1-mismatch1024  avgt   15
>  4.358 ±  0.123  ns/op

Oops, I lost context and thought this was in `StringLatin1`.

Thanks for running the numbers with #12623. Looks like you're getting big 
enough of an improvement on top.

-

PR: https://git.openjdk.org/jdk/pull/12632


Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI [v4]

2023-02-20 Thread Claes Redestad
On Mon, 20 Feb 2023 14:45:09 GMT, Eirik Bjorsnos  wrote:

>> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying 
>> 'the oldest ASCII trick in the book'.
>> 
>> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two 
>> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is 
>> updated to use `equalsIgnoreCase`
>> 
>> To verify the correctness of `equalsIgnoreCase`, a new test is added  to 
>> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 
>> code point pairs have an `equalsIgnoreCase` consistent with 
>> Character.toUpperCase, Character.toLowerCase.
>> 
>> Performance is tested for matching and mismatching cases of code point pairs 
>> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results 
>> in the first comment below.
>
> Eirik Bjorsnos has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Add @bug tag to EqualsIgnoreCase test for correct issue JDK-8302871
>  - Add @bug tag to EqualsIgnoreCase test for JDK-8302877

src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 
170:

> 168:  * @return true if the two bytes are considered equals ignoring case 
> in latin1
> 169:  */
> 170:  static boolean equalsIgnoreCase(byte b1, byte b2) {

Perhaps put this in `CharacterDataLatin1`, keeping it close to 
toLowerCase/toUpperCase that you're changing to use similar logic with #12623 

If you apply #12623 first - how much difference does this make on the micro 
you're adding with this PR?

-

PR: https://git.openjdk.org/jdk/pull/12632


Re: Speed up latin1 case folding

2023-02-20 Thread Claes Redestad
RFE filed: https://bugs.openjdk.org/browse/JDK-8302877

/Claes

17 feb. 2023 kl. 18:38 skrev Eirik Bjørsnøs 
mailto:eir...@gmail.com>>:

Hi,

The following PR suggests we can speed up Character.toUpperCase and 
Character.toLowerCase for latin1 code points by applying 'the oldest ASCII 
trick in the book':

https://github.com/openjdk/jdk/pull/12623

Thanks,
Eirik.



Re: Speed up StringLatin1.regionMatchesCI

2023-02-20 Thread Claes Redestad
RFE filed: https://bugs.openjdk.org/browse/JDK-8302871

/Claes

18 feb. 2023 kl. 10:28 skrev Eirik Bjørsnøs 
mailto:eir...@gmail.com>>:

Hi,

The following PR suggests we can speed up StringLatin1.regionMatchesCI by 
applying 'the oldest ASCII trick in the book':

https://github.com/openjdk/jdk/pull/12632

Thanks,
Eirik.



Re: Speed up StringLatin1.regionMatchesCI_UTF16

2023-02-20 Thread Claes Redestad
RFE filed: https://bugs.openjdk.org/browse/JDK-8302872

/Claes

18 feb. 2023 kl. 19:58 skrev Eirik Bjørsnøs 
mailto:eir...@gmail.com>>:

Hi,

This PR continues the effort to speed up case-insensitive string comparisons, 
this time tackling comparison of latin1-coded strings with utf16-coded strings:

https://github.com/openjdk/jdk/pull/12637

This builds on top of #12632, it makes sense to review that one first.

Thanks,
Eirik.



Re: RFR: 8302863: Speed up String::encodeASCII using countPositives

2023-02-20 Thread Claes Redestad
On Sat, 18 Feb 2023 23:26:08 GMT, Claes Redestad  wrote:

> When encoding Strings to US-ASCII we can speed up the happy path 
> significantly by using `StringCoding.countPositives` as a speculative check 
> for whether there are any chars that needs to be replaced by `'?'`. Once a 
> non-ASCII char is encountered we fall back to the slow loop and replace as 
> needed.
> 
> An alternative could be unrolling or using a byte array VarHandle, as 
> show-cased by Brett Okken here: 
> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html 
> Having to replace chars with `?` is essentially an encoding error so it might 
> be safe to assume this case is exceptional in practice.

Baseline:

Benchmark  (charsetName)  Mode  Cnt  Score Error  
Units
StringEncode.encodeAsciiLongUS-ASCII  avgt5  26626,025 ± 448,307  
ns/op
StringEncode.encodeAsciiShort   US-ASCII  avgt5 33,336 ±   2,032  
ns/op


Patch:

Benchmark  (charsetName)  Mode  Cnt ScoreError  
Units
StringEncode.encodeAsciiLongUS-ASCII  avgt5  5492,985 ± 40,066  
ns/op
StringEncode.encodeAsciiShort   US-ASCII  avgt528,545 ±  4,883  
ns/op

-

PR: https://git.openjdk.org/jdk/pull/12640


Re: RFR: 8302863: Speed up String::encodeASCII using countPositives

2023-02-20 Thread Claes Redestad
On Sun, 19 Feb 2023 07:24:30 GMT, David Schlosnagle  wrote:

>> When encoding Strings to US-ASCII we can speed up the happy path 
>> significantly by using `StringCoding.countPositives` as a speculative check 
>> for whether there are any chars that needs to be replaced by `'?'`. Once a 
>> non-ASCII char is encountered we fall back to the slow loop and replace as 
>> needed.
>> 
>> An alternative could be unrolling or using a byte array VarHandle, as 
>> show-cased by Brett Okken here: 
>> https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html 
>> Having to replace chars with `?` is essentially an encoding error so it 
>> might be safe to assume this case is exceptional in practice.
>
> src/java.base/share/classes/java/lang/String.java line 976:
> 
>> 974: private static byte[] encodeASCII(byte coder, byte[] val) {
>> 975: if (coder == LATIN1) {
>> 976: byte[] dst = Arrays.copyOf(val, val.length);
> 
> Given the tweaks in https://git.openjdk.org/jdk/pull/12613 should this use 
> `val.clone()` (would skip the length check)
> 
> Suggestion:
> 
> byte[] dst = val.clone();

Yeah, probably makes sense. On that note I found that `val.clone()` 
underperform `Arrays.copyOf(val, val.length)` in C1 compiled code: 
https://bugs.openjdk.org/browse/JDK-8302850 - while this shouldn't affect peak 
performance it might be cause for a warmup regression.

> src/java.base/share/classes/java/lang/String.java line 982:
> 
>> 980: if (dst[i] < 0) {
>> 981: dst[i] = '?';
>> 982: }
> 
> I'm curious if using countPositives (and vectorization) to scan forward would 
> be valuable for long (mostly ASCII) strings or if the method call 
> overhead/non-constant stride is not a win for shorter strings or heavily 
> non-ascii inputs. 
> 
> Suggestion:
> 
> for (int i = positives; i < dst.length; i = 
> StringCoding.countPositives(dst, i + 1, dst.length - i);) {
> if (dst[i] < 0) {
> dst[i] = '?';
> }

There's some overhead doing countPositives calls, and doing it in a loop might 
provoke poor worst case performance. Im sure you can find inputs for which this 
is a win, though.

-

PR: https://git.openjdk.org/jdk/pull/12640


RFR: 8302863: Speed up String::encodeASCII using countPositives

2023-02-20 Thread Claes Redestad
When encoding Strings to US-ASCII we can speed up the happy path significantly 
by using `StringCoding.countPositives` as a speculative check for whether there 
are any chars that needs to be replaced by `'?'`. Once a non-ASCII char is 
encountered we fall back to the slow loop and replace as needed.

An alternative could be unrolling or using a byte array VarHandle, as 
show-cased by Brett Okken here: 
https://mail.openjdk.org/pipermail/core-libs-dev/2023-February/100573.html 
Having to replace chars with `?` is essentially an encoding error so it might 
be safe to assume this case is exceptional in practice.

-

Commit messages:
 - Cleanup, clone
 - Merge branch 'master' into encodeASCII
 - Improve encodeASCII by leveraging countPositives

Changes: https://git.openjdk.org/jdk/pull/12640/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12640=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302863
  Stats: 24 lines in 1 file changed: 11 ins; 2 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/12640.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12640/head:pull/12640

PR: https://git.openjdk.org/jdk/pull/12640


Re: String encodeASCII

2023-02-20 Thread Claes Redestad
Unsafe might be similarly tricky dependency-wise, but perhaps less so. Either 
solution might work fine if we extract the code for encoding to a utility class 
that isn’t initialized eagerly with String.class itself.

I’ll file an RFE and get the ”trivial” fix to use StringCoding.countPositives 
in to establish a better baseline. I’m sure there are ways to beat this, for 
example with a fused loop. An unrolled and/or VH-based solution like you’ve 
proposed could at the very least be used to speedup the case of Strings for 
which we need to replace with ’?’.

/Claes

19 feb. 2023 kl. 02:54 skrev Brett Okken 
mailto:brett.okken...@gmail.com>>:

Thanks for taking a look at this.
That is a pretty good improvement with a much smaller change.
I recognize the intricacies of bootstrapping, but have no expertise. Would 
using Unsafe directly be easier? Particularly for shorter strings, doing the 
copying and checking together rather than as separate loops seems to speed 
things up considerably, even for happy-path of no failures.

Brett

On Sat, Feb 18, 2023 at 5:49 PM Claes Redestad 
mailto:claes.redes...@oracle.com>> wrote:
Hi,

general comment: String might be one of the trickier places to add a VarHandle 
dependency, since String is used very early in the bootstrap and depended upon 
by everything else, so I’d expect such a solution would have to navigate 
potential circularity issues carefully. It’d be good to experiment with changes 
to java.lang.String proper to see that the solution that works nice externally 
is or can be made feasible within String.

Specifically on the performance opportunity then while US-ASCII encoding is 
probably on the way out we shouldn’t neglect it.

One way to go about this without pulling VarHandles into String might be to use 
what other encode methods in String does and leverage 
StringCoding.countPositives:

https://github.com/openjdk/jdk/pull/12640<https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/12640__;!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Zp4JhLWU$>

Testing this on the existing StringEncode microbenchmark, shows a promising 
speed-up when the input is ASCII-encodable:

Baseline
Benchmark  (charsetName)  Mode  Cnt  Score Error  
Units
StringEncode.encodeAsciiLongUS-ASCII  avgt5  26626,025 ± 448,307  
ns/op
StringEncode.encodeAsciiShort   US-ASCII  avgt5 33,336 ±   2,032  
ns/op

Patch:
Benchmark  (charsetName)  Mode  Cnt ScoreError  
Units
StringEncode.encodeAsciiLongUS-ASCII  avgt5  5492,985 ± 40,066  
ns/op
StringEncode.encodeAsciiShort   US-ASCII  avgt528,545 ±  4,883  
ns/op

(You might see that this will go back into a scalar loop on encoding failures. 
That loop could still benefit from unrolling or byteArrayViewVarHandle, but I 
think you have a bigger problem in an app than raw performance if you have a 
lot of encoding failures...)

WDYT?

/Claes

18 feb. 2023 kl. 19:36 skrev Brett Okken 
mailto:brett.okken...@gmail.com>>:

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981<https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java*L976-L981__;Iw!!ACWV5N9M2RV99hQ!O-1OoPjv-Jw1iGFv2-_q5zTiayoRyLJBcSiQ9J5IEun3kcusUeGRjZzOb_dploCjPh_Kjp6FJXr2cIk7Se5Z_Ook_Ws$>

For String.encodeASCII, with the LATIN1 coder is there any interest in
exploring the performance impacts of utilizing a
byteArrayViewVarHandle to read/write as longs and utilize a bitmask to
identify if negative values are present?

A simple jmh benchmark covering either 0 or 1 non-ascii (negative)
values shows times cut in half (or more) for most scenarios with
strings ranging in length from 3 - ~2000.
VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10
Windows 10 Intel(R) Core(TM) i7-9850H

Hand unrolling the loops shows noted improvement, but does make for
less aesthetically pleasing code.


Benchmark  (nonascii)  (size)  Mode
CntScore Error  Units
AsciiEncodeBenchmark.jdk none   3  avgt
4   15.531 ±   1.122  ns/op
AsciiEncodeBenchmark.jdk none  10  avgt
4   17.350 ±   0.473  ns/op
AsciiEncodeBenchmark.jdk none  16  avgt
4   18.277 ±   0.421  ns/op
AsciiEncodeBenchmark.jdk none  23  avgt
4   20.139 ±   0.350  ns/op
AsciiEncodeBenchmark.jdk none  33  avgt
4   22.008 ±   0.656  ns/op
AsciiEncodeBenchmark.jdk none  42  avgt
4   24.393 ±   1.155  ns/op
AsciiEncodeBenchmark.jdk none 201  avgt
4   55.884 ±   0.645  ns/op
AsciiEncodeBenchmark.jdk none 511  avgt
4  120.817 ±   2.917  ns/op
AsciiEncodeBenchmark.jdk   

Re: RFR: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe

2023-02-20 Thread Claes Redestad
On Sun, 19 Feb 2023 18:41:18 GMT, liach  wrote:

> 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not 
> thread safe

I think of this pattern of reading a to-be-lazily-initialized value into a 
local as simple hygiene, `volatile` or not. The code might seem solid without 
it - but stranger things than eliding a field load has happened. Storing into 
the local variable removes some doubt about how this code will be executed.

-

PR: https://git.openjdk.org/jdk/pull/12643


Re: RFR: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe

2023-02-20 Thread Claes Redestad
On Sun, 19 Feb 2023 18:41:18 GMT, liach  wrote:

> 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not 
> thread safe

Marked as reviewed by redestad (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/12643


Re: String encodeASCII

2023-02-18 Thread Claes Redestad
Hi,

general comment: String might be one of the trickier places to add a VarHandle 
dependency, since String is used very early in the bootstrap and depended upon 
by everything else, so I’d expect such a solution would have to navigate 
potential circularity issues carefully. It’d be good to experiment with changes 
to java.lang.String proper to see that the solution that works nice externally 
is or can be made feasible within String.

Specifically on the performance opportunity then while US-ASCII encoding is 
probably on the way out we shouldn’t neglect it.

One way to go about this without pulling VarHandles into String might be to use 
what other encode methods in String does and leverage 
StringCoding.countPositives:

https://github.com/openjdk/jdk/pull/12640

Testing this on the existing StringEncode microbenchmark, shows a promising 
speed-up when the input is ASCII-encodable:

Baseline
Benchmark  (charsetName)  Mode  Cnt  Score Error  
Units
StringEncode.encodeAsciiLongUS-ASCII  avgt5  26626,025 ± 448,307  
ns/op
StringEncode.encodeAsciiShort   US-ASCII  avgt5 33,336 ±   2,032  
ns/op

Patch:
Benchmark  (charsetName)  Mode  Cnt ScoreError  
Units
StringEncode.encodeAsciiLongUS-ASCII  avgt5  5492,985 ± 40,066  
ns/op
StringEncode.encodeAsciiShort   US-ASCII  avgt528,545 ±  4,883  
ns/op

(You might see that this will go back into a scalar loop on encoding failures. 
That loop could still benefit from unrolling or byteArrayViewVarHandle, but I 
think you have a bigger problem in an app than raw performance if you have a 
lot of encoding failures...)

WDYT?

/Claes

18 feb. 2023 kl. 19:36 skrev Brett Okken 
mailto:brett.okken...@gmail.com>>:

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L976-L981

For String.encodeASCII, with the LATIN1 coder is there any interest in
exploring the performance impacts of utilizing a
byteArrayViewVarHandle to read/write as longs and utilize a bitmask to
identify if negative values are present?

A simple jmh benchmark covering either 0 or 1 non-ascii (negative)
values shows times cut in half (or more) for most scenarios with
strings ranging in length from 3 - ~2000.
VM version: JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10
Windows 10 Intel(R) Core(TM) i7-9850H

Hand unrolling the loops shows noted improvement, but does make for
less aesthetically pleasing code.


Benchmark  (nonascii)  (size)  Mode
CntScore Error  Units
AsciiEncodeBenchmark.jdk none   3  avgt
4   15.531 ±   1.122  ns/op
AsciiEncodeBenchmark.jdk none  10  avgt
4   17.350 ±   0.473  ns/op
AsciiEncodeBenchmark.jdk none  16  avgt
4   18.277 ±   0.421  ns/op
AsciiEncodeBenchmark.jdk none  23  avgt
4   20.139 ±   0.350  ns/op
AsciiEncodeBenchmark.jdk none  33  avgt
4   22.008 ±   0.656  ns/op
AsciiEncodeBenchmark.jdk none  42  avgt
4   24.393 ±   1.155  ns/op
AsciiEncodeBenchmark.jdk none 201  avgt
4   55.884 ±   0.645  ns/op
AsciiEncodeBenchmark.jdk none 511  avgt
4  120.817 ±   2.917  ns/op
AsciiEncodeBenchmark.jdk none2087  avgt
4  471.039 ±  13.329  ns/op
AsciiEncodeBenchmark.jdkfirst   3  avgt
4   15.794 ±   1.494  ns/op
AsciiEncodeBenchmark.jdkfirst  10  avgt
4   18.446 ±   0.780  ns/op
AsciiEncodeBenchmark.jdkfirst  16  avgt
4   20.458 ±   0.394  ns/op
AsciiEncodeBenchmark.jdkfirst  23  avgt
4   22.934 ±   0.422  ns/op
AsciiEncodeBenchmark.jdkfirst  33  avgt
4   25.367 ±   0.178  ns/op
AsciiEncodeBenchmark.jdkfirst  42  avgt
4   28.620 ±   0.678  ns/op
AsciiEncodeBenchmark.jdkfirst 201  avgt
4   80.250 ±   4.376  ns/op
AsciiEncodeBenchmark.jdkfirst 511  avgt
4  185.518 ±   6.370  ns/op
AsciiEncodeBenchmark.jdkfirst2087  avgt
4  713.213 ±  13.488  ns/op
AsciiEncodeBenchmark.jdk last   3  avgt
4   14.991 ±   0.190  ns/op
AsciiEncodeBenchmark.jdk last  10  avgt
4   18.284 ±   0.317  ns/op
AsciiEncodeBenchmark.jdk last  16  avgt
4   20.591 ±   1.002  ns/op
AsciiEncodeBenchmark.jdk last  23  avgt
4   22.560 ±   0.963  ns/op
AsciiEncodeBenchmark.jdk last  33  avgt
4   25.521 ±   0.554  ns/op
AsciiEncodeBenchmark.jdk last  42  avgt
4   28.484 ±   0.446  ns/op
AsciiEncodeBenchmark.jdk last 201  avgt
4   79.434 ±   2.256  ns/op

Re: RFR: 8302315: Examine cost of clone of primitive arrays compared to arraycopy

2023-02-18 Thread Claes Redestad
On Fri, 17 Feb 2023 09:58:54 GMT, Claes Redestad  wrote:

> During work on #12453 @schlosna reported that `array.clone()` might 
> underperform `System.arraycopy` in microbenchmarks and I opted to go with 
> `arraycopy` throughout while investigating.
> 
> Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the 
> setups. On aarch the only difference I can observe is that arraycopy seem 
> curiously slow for input size = 0, otherwise no statistically significant 
> difference. All tests ran on builds from JDK mainline and 21-b9.
> 
> Since the reported difference was small and mostly visible on very large 
> arrays I conclude that the maintainability win we get from using `clone()` is 
> preferable. I've added the microbenchmark provided by @schlosna here.

Thanks for reviewing

-

PR: https://git.openjdk.org/jdk/pull/12613


Integrated: 8302315: Examine cost of clone of primitive arrays compared to arraycopy

2023-02-18 Thread Claes Redestad
On Fri, 17 Feb 2023 09:58:54 GMT, Claes Redestad  wrote:

> During work on #12453 @schlosna reported that `array.clone()` might 
> underperform `System.arraycopy` in microbenchmarks and I opted to go with 
> `arraycopy` throughout while investigating.
> 
> Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the 
> setups. On aarch the only difference I can observe is that arraycopy seem 
> curiously slow for input size = 0, otherwise no statistically significant 
> difference. All tests ran on builds from JDK mainline and 21-b9.
> 
> Since the reported difference was small and mostly visible on very large 
> arrays I conclude that the maintainability win we get from using `clone()` is 
> preferable. I've added the microbenchmark provided by @schlosna here.

This pull request has now been integrated.

Changeset: d6716d2e
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/d6716d2e5471ee794df8833430dd3171b565f78e
Stats: 163 lines in 2 files changed: 92 ins; 55 del; 16 mod

8302315: Examine cost of clone of primitive arrays compared to arraycopy

Reviewed-by: alanb

-

PR: https://git.openjdk.org/jdk/pull/12613


Re: RFR: 8302315: Examine cost of clone of primitive arrays compared to arraycopy

2023-02-17 Thread Claes Redestad
On Fri, 17 Feb 2023 09:58:54 GMT, Claes Redestad  wrote:

> During work on #12453 @schlosna reported that `array.clone()` might 
> underperform `System.arraycopy` in microbenchmarks and I opted to go with 
> `arraycopy` throughout while investigating.
> 
> Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the 
> setups. On aarch the only difference I can observe is that arraycopy seem 
> curiously slow for input size = 0, otherwise no statistically significant 
> difference. All tests ran on builds from JDK mainline and 21-b9.
> 
> Since the reported difference was small and mostly visible on very large 
> arrays I conclude that the maintainability win we get from using `clone()` is 
> preferable. I've added the microbenchmark provided by @schlosna here.

osx-aarch64:

Benchmark (size)  Mode  CntScore   Error  Units
ArrayClone.byteArraycopy   0  avgt   159,517 ± 1,272  ns/op
ArrayClone.byteArraycopy  10  avgt   155,933 ± 0,314  ns/op
ArrayClone.byteArraycopy 100  avgt   154,802 ± 0,234  ns/op
ArrayClone.byteArraycopy1000  avgt   15   24,671 ± 0,437  ns/op
ArrayClone.byteClone   0  avgt   152,417 ± 0,016  ns/op
ArrayClone.byteClone  10  avgt   152,924 ± 0,027  ns/op
ArrayClone.byteClone 100  avgt   154,563 ± 0,050  ns/op
ArrayClone.byteClone1000  avgt   15   24,737 ± 0,262  ns/op
ArrayClone.intArraycopy0  avgt   158,156 ± 2,148  ns/op
ArrayClone.intArraycopy   10  avgt   153,646 ± 0,025  ns/op
ArrayClone.intArraycopy  100  avgt   15   11,430 ± 0,087  ns/op
ArrayClone.intArraycopy 1000  avgt   15  106,174 ± 0,721  ns/op
ArrayClone.intClone0  avgt   152,455 ± 0,159  ns/op
ArrayClone.intClone   10  avgt   153,621 ± 0,013  ns/op
ArrayClone.intClone  100  avgt   15   11,648 ± 0,454  ns/op
ArrayClone.intClone 1000  avgt   15  106,469 ± 1,295  ns/op


linux-x64, sandybridge, avx2:

Benchmark (size)  Mode  CntScoreError  Units
ArrayClone.byteArraycopy   0  avgt   153.321 ±  0.194  ns/op
ArrayClone.byteArraycopy  10  avgt   156.953 ±  0.329  ns/op
ArrayClone.byteArraycopy 100  avgt   15   13.490 ±  0.595  ns/op
ArrayClone.byteArraycopy1000  avgt   15  150.201 ±  3.451  ns/op
ArrayClone.byteClone   0  avgt   155.431 ±  0.252  ns/op
ArrayClone.byteClone  10  avgt   156.370 ±  0.329  ns/op
ArrayClone.byteClone 100  avgt   15   13.561 ±  0.633  ns/op
ArrayClone.byteClone1000  avgt   15  150.300 ±  5.318  ns/op
ArrayClone.intArraycopy0  avgt   153.297 ±  0.226  ns/op
ArrayClone.intArraycopy   10  avgt   157.171 ±  0.354  ns/op
ArrayClone.intArraycopy  100  avgt   15   60.863 ±  1.580  ns/op
ArrayClone.intArraycopy 1000  avgt   15  557.770 ± 15.107  ns/op
ArrayClone.intClone0  avgt   155.373 ±  0.225  ns/op
ArrayClone.intClone   10  avgt   156.965 ±  0.293  ns/op
ArrayClone.intClone  100  avgt   15   61.696 ±  1.983  ns/op
ArrayClone.intClone 1000  avgt   15  552.809 ± 14.358  ns/op

-

PR: https://git.openjdk.org/jdk/pull/12613


RFR: 8302315: Examine cost of clone of primitive arrays compared to arraycopy

2023-02-17 Thread Claes Redestad
During work on #12453 @schlosna reported that `array.clone()` might 
underperform `System.arraycopy` in microbenchmarks and I opted to go with 
`arraycopy` throughout while investigating.

Testing on x86 (SandyBridge, AVX2) I observe no difference at all between the 
setups. On aarch the only difference I can observe is that arraycopy seem 
curiously slow for input size = 0, otherwise no statistically significant 
difference. All tests ran on builds from JDK mainline and 21-b9.

Since the reported difference was small and mostly visible on very large arrays 
I conclude that the maintainability win we get from using `clone()` is 
preferable. I've added the microbenchmark provided by @schlosna here.

-

Commit messages:
 - Examine arraycopy vs clone

Changes: https://git.openjdk.org/jdk/pull/12613/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12613=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302315
  Stats: 163 lines in 2 files changed: 92 ins; 55 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/12613.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12613/head:pull/12613

PR: https://git.openjdk.org/jdk/pull/12613


Integrated: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch

2023-02-15 Thread Claes Redestad
On Mon, 13 Feb 2023 09:59:24 GMT, Claes Redestad  wrote:

> We can improve various String methods such as `startsWith`, `endsWith` and 
> `regionMatches` by leveraging the intrinsified mismatch methods in 
> `ArraysSupport`.

This pull request has now been integrated.

Changeset: 861e3020
Author:    Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/861e302011bb3aaf0c8431c121b58a57b78481e3
Stats: 171 lines in 5 files changed: 107 ins; 44 del; 20 mod

8302163: Speed up various String comparison methods with ArraysSupport.mismatch

Reviewed-by: stsypanov, rriggs, alanb

-

PR: https://git.openjdk.org/jdk/pull/12528


Re: RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch [v2]

2023-02-15 Thread Claes Redestad
On Mon, 13 Feb 2023 16:10:14 GMT, Claes Redestad  wrote:

>> We can improve various String methods such as `startsWith`, `endsWith` and 
>> `regionMatches` by leveraging the intrinsified mismatch methods in 
>> `ArraysSupport`.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Clarify coder shift in startsWith

Thanks for reviewing!

-

PR: https://git.openjdk.org/jdk/pull/12528


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]

2023-02-15 Thread Claes Redestad
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons  wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate 
>> ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt Score   Error   
>> Units
>> Base64Encode.testBase64Encode   1024  thrpt3  4309.439 ± 2.632  
>> ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt  Score 
>> Error   Units
>> Base64Encode.testBase64Encode   1024  thrpt3  24211.397 ± 
>> 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt  ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Last of review comments

No problem! Testing looks good.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]

2023-02-14 Thread Claes Redestad
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons  wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate 
>> ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt Score   Error   
>> Units
>> Base64Encode.testBase64Encode   1024  thrpt3  4309.439 ± 2.632  
>> ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt  Score 
>> Error   Units
>> Base64Encode.testBase64Encode   1024  thrpt3  24211.397 ± 
>> 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt  ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Last of review comments

I've started tier1-5 testing internally. Will let you know if we find any 
issues.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-14 Thread Claes Redestad
On Tue, 14 Feb 2023 15:03:50 GMT, Scott Gibbons  wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2699:
>> 
>>> 2697: __ addptr(dest, 0x18);
>>> 2698: __ subl(length, 0x20);
>>> 2699: __ jcc(Assembler::lessEqual, L_tailProc);
>> 
>> This could be Assembler::less instead of Assembler::lessEqual.
>
> Why?  There is no performance difference and the intent is clear.  Is this 
> just a "style" thing?

I think with `lessEqual` we'll jump to `L_tailProc` for the final 32-byte chunk 
in inputs that are divisible by 32 (starting from 64), no? Using `less` avoids 
that, while not affecting performance of any other inputs.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-14 Thread Claes Redestad
On Fri, 10 Feb 2023 23:18:47 GMT, Claes Redestad  wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add URL to microbenchmark
>
> Marked as reviewed by redestad (Reviewer).

> @cl4es Can you please initiate the in-depth testing required for this fix? 
> Thanks.

I've checked out the sources and fired off a sanity run of the first couple of 
tiers. Holding off on testing higher tiers until @sviswa7's concerns has been 
resolved.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch [v2]

2023-02-14 Thread Claes Redestad
On Mon, 13 Feb 2023 16:43:12 GMT, Eirik Bjorsnos  wrote:

> Case-insensitive regionMatches could be improved by using 
> ArraysSupport.mismatch to skip over long common substrings:

I considered this but saw regressions similar to what you're getting for size = 
6 and backed off. I think this might be a good future enhancement, with some 
care, but didn't want to encumber this RFE with a case that regresses small 
inputs (which tend to be more common in actual applications).

In similar constructs we have avoided doing the vectorized call in a loop since 
this could regress worst case inputs severely (an adversary example might be 
something like `regionMatches(true, "aa", 0, 
"aAaAaAaAaAaAaAa", 0, 15)` which will call mismatch 8 times on 15 byte of 
input).

-

PR: https://git.openjdk.org/jdk/pull/12528


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v10]

2023-02-13 Thread Claes Redestad
On Mon, 13 Feb 2023 09:59:52 GMT, Claes Redestad  wrote:

>> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to 
>> copy arrays more efficiently when exactly the whole input array is to be 
>> copied. This helps eliminate range checks and has been verified to help 
>> various String operations. Example:
>> 
>> Baseline
>> 
>> Benchmark(size)  Mode  Cnt   
>> Score   Error  Units
>> StringConstructor.newStringFromArray  7  avgt   15  
>> 16.817 ± 0.369  ns/op
>> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
>> 16.866 ± 0.449  ns/op
>> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
>> 22.198 ± 0.396  ns/op
>> 
>> Patch: 
>> 
>> Benchmark(size)  Mode  Cnt   
>> Score   Error  Units
>> StringConstructor.newStringFromArray  7  avgt   15  
>> 14.666 ± 0.336  ns/op
>> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
>> 14.582 ± 0.288  ns/op
>> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
>> 20.339 ± 0.328  ns/op
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Address review comments

Thanks for reviewng!

I've filed https://bugs.openjdk.org/browse/JDK-8302315 to investigate the 
clone/arraycopy performance discrepancy. Ideally we should be able to just do 
`array.clone()` here and get optimal performance with less code.

-

PR: https://git.openjdk.org/jdk/pull/12453


Integrated: 8301958: Reduce Arrays.copyOf/-Range overheads

2023-02-13 Thread Claes Redestad
On Tue, 7 Feb 2023 13:30:35 GMT, Claes Redestad  wrote:

> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to 
> copy arrays more efficiently when exactly the whole input array is to be 
> copied. This helps eliminate range checks and has been verified to help 
> various String operations. Example:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> Patch: 
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 14.666 ± 0.336  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 14.582 ± 0.288  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 20.339 ± 0.328  ns/op

This pull request has now been integrated.

Changeset: 1f9c110c
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/1f9c110c1f9ea6f5c3621a25692ce9d7abf245d4
Stats: 209 lines in 2 files changed: 179 ins; 21 del; 9 mod

8301958: Reduce Arrays.copyOf/-Range overheads

Reviewed-by: alanb, smarks

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch

2023-02-13 Thread Claes Redestad
On Mon, 13 Feb 2023 09:59:24 GMT, Claes Redestad  wrote:

> We can improve various String methods such as `startsWith`, `endsWith` and 
> `regionMatches` by leveraging the intrinsified mismatch methods in 
> `ArraysSupport`.

Microbenchmarking shows decent improvements on small data, scaling up to some 
impressive gains on large inputs when vectorization kicks in (~41x on 
regionMatches for size = 1024 on my x64 sandybridge setup, ~34x on m1 in the 
same config)

macosx-aarch64, m1, 21-b9:

Benchmark (size)  (utf16)  Mode  Cnt Score
Error  Units
StringComparisons.endsWith 6 true  avgt   15 5,949 ±  
0,051  ns/op
StringComparisons.endsWith 6false  avgt   15 4,063 ±  
0,038  ns/op
StringComparisons.endsWith15 true  avgt   1510,758 ±  
0,132  ns/op
StringComparisons.endsWith15false  avgt   15 6,487 ±  
0,052  ns/op
StringComparisons.endsWith  1024 true  avgt   15   653,750 ±  
2,835  ns/op
StringComparisons.endsWith  1024false  avgt   15   314,219 ±  
1,264  ns/op
StringComparisons.regionMatches6 true  avgt   1512,460 ±  
0,043  ns/op
StringComparisons.regionMatches6false  avgt   15 6,647 ±  
0,026  ns/op
StringComparisons.regionMatches   15 true  avgt   1530,502 ±  
0,193  ns/op
StringComparisons.regionMatches   15false  avgt   1515,073 ±  
0,030  ns/op
StringComparisons.regionMatches 1024 true  avgt   15  2147,480 ±  
4,622  ns/op
StringComparisons.regionMatches 1024false  avgt   15  1068,787 ± 
14,098  ns/op
StringComparisons.regionMatchesCI  6 true  avgt   1511,680 ±  
0,106  ns/op
StringComparisons.regionMatchesCI  6false  avgt   15 4,577 ±  
0,101  ns/op
StringComparisons.regionMatchesCI 15 true  avgt   1514,422 ±  
0,132  ns/op
StringComparisons.regionMatchesCI 15false  avgt   15 6,904 ±  
0,124  ns/op
StringComparisons.regionMatchesCI   1024 true  avgt   15   273,810 ±  
3,446  ns/op
StringComparisons.regionMatchesCI   1024false  avgt   15   226,040 ±  
2,886  ns/op
StringComparisons.regionMatchesRange   6 true  avgt   1511,896 ±  
0,110  ns/op
StringComparisons.regionMatchesRange   6false  avgt   15 6,044 ±  
0,034  ns/op
StringComparisons.regionMatchesRange  15 true  avgt   1529,508 ±  
0,093  ns/op
StringComparisons.regionMatchesRange  15false  avgt   1514,336 ±  
0,020  ns/op
StringComparisons.regionMatchesRange1024 true  avgt   15  2187,600 ± 
80,285  ns/op
StringComparisons.regionMatchesRange1024false  avgt   15  1105,813 ± 
28,260  ns/op
StringComparisons.startsWith   6 true  avgt   15 5,315 ±  
0,087  ns/op
StringComparisons.startsWith   6false  avgt   15 3,588 ±  
0,020  ns/op
StringComparisons.startsWith  15 true  avgt   15 9,823 ±  
0,144  ns/op
StringComparisons.startsWith  15false  avgt   15 5,963 ±  
0,253  ns/op
StringComparisons.startsWith1024 true  avgt   15   441,854 ±  
9,295  ns/op
StringComparisons.startsWith1024false  avgt   15   224,386 ±  
6,876  ns/op


macosx-aarch64. m1, patched:

Benchmark (size)  (utf16)  Mode  CntScore
Error  Units
StringComparisons.endsWith 6 true  avgt   153,185 ±  
0,063  ns/op
StringComparisons.endsWith 6false  avgt   154,507 ±  
0,447  ns/op
StringComparisons.endsWith15 true  avgt   156,097 ±  
0,142  ns/op
StringComparisons.endsWith15false  avgt   155,736 ±  
0,025  ns/op
StringComparisons.endsWith  1024 true  avgt   15   60,283 ±  
4,109  ns/op
StringComparisons.endsWith  1024false  avgt   15   31,011 ±  
0,080  ns/op
StringComparisons.regionMatches6 true  avgt   153,993 ±  
0,063  ns/op
StringComparisons.regionMatches6false  avgt   154,836 ±  
0,474  ns/op
StringComparisons.regionMatches   15 true  avgt   153,641 ±  
0,012  ns/op
StringComparisons.regionMatches   15false  avgt   152,849 ±  
0,065  ns/op
StringComparisons.regionMatches 1024 true  avgt   15   57,739 ±  
0,748  ns/op
StringComparisons.regionMatches 1024false  avgt   15   30,943 ±  
0,423  ns/op
StringComparisons.regionMatchesCI  6 true  avgt   15   11,729 ±  
0,142  ns/op
StringComparisons.regionMatchesCI  6false  avgt   154,562 ±  
0,125  ns/op
StringComparisons.regionMatchesCI 15 true  avgt   15   14,611 ±  
0,227  ns/op
StringComparisons.regionMatchesCI 15false  avgt   156,970 ±  
0,083  ns/op
StringComparisons.regionMatchesCI   1024 true  

RFR: 8302163: Speed up various String comparison methods with ArraysSupport.mismatch

2023-02-13 Thread Claes Redestad
We can improve various String methods such as `startsWith`, `endsWith` and 
`regionMatches` by leveraging the intrinsified mismatch methods in 
`ArraysSupport`.

-

Commit messages:
 - Remove overlapping micros, extend testing to endsWith, regionCI and some 
minor improvements to String::regionMatches
 - Expand micro coverage
 - Add micro from @eirbjo
 - Revert UTF16.compareValues
 - Add a few micros, apply optimization to StringUTF16.compareValues
 - Speed up various String comparison methods with ArraysSupport.mismatch

Changes: https://git.openjdk.org/jdk/pull/12528/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12528=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302163
  Stats: 170 lines in 5 files changed: 105 ins; 44 del; 21 mod
  Patch: https://git.openjdk.org/jdk/pull/12528.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12528/head:pull/12528

PR: https://git.openjdk.org/jdk/pull/12528


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v10]

2023-02-13 Thread Claes Redestad
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to 
> copy arrays more efficiently when exactly the whole input array is to be 
> copied. This helps eliminate range checks and has been verified to help 
> various String operations. Example:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> Patch: 
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 14.666 ± 0.336  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 14.582 ± 0.288  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 20.339 ± 0.328  ns/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Address review comments

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/350612d4..1031dd85

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=09
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=08-09

  Stats: 31 lines in 1 file changed: 7 ins; 8 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-10 Thread Claes Redestad
On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons  wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate 
>> ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt Score   Error   
>> Units
>> Base64Encode.testBase64Encode   1024  thrpt3  4309.439 ± 2.632  
>> ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt  Score 
>> Error   Units
>> Base64Encode.testBase64Encode   1024  thrpt3  24211.397 ± 
>> 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt  ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add URL to microbenchmark

Marked as reviewed by redestad (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]

2023-02-09 Thread Claes Redestad
On Thu, 9 Feb 2023 11:46:42 GMT, Eirik Bjorsnos  wrote:

> > In addition to - or instead of - an `equals` shortcut then if coders are 
> > the same we could use `ArraysSupport.mismatch` which should get similar 
> > speed and help more generally.
> 
> ..and if String had (an optimized) mismatch method, then I bet all or most of 
> the comparison methods (equals, compareTo, endsWith, startsWith, 
> regionMatches) could delegate to that :-)

A private mismatch method might make sense. A public method would require 
making a stronger case, I think, e.g. showing use cases a mismatcher would 
solve (elegantly, performantly) that can't be expressed with existing methods.

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]

2023-02-09 Thread Claes Redestad
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, 
> allowing the test to run over all charsets

Oh -- fun! Perhaps `startsWith` doesn't take advantage of the optimized 
intrinsic for `equals`. 

Could be interesting to see if special-casing `startsWith` to call `equals` 
when possible would help:

diff --git 

Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]

2023-02-09 Thread Claes Redestad
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, 
> allowing the test to run over all charsets

You could skip the `equals`:

if (decoded.startsWith(str)) {
if (decoded.length() == str.length()) {
return Comparison.EXACT_MATCH;
} else if (addSlash
&& 

Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v7]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 16:36:16 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update the sameHashAndLengthDirLookup to use ASCII example colliding paths, 
> allowing the test to run over all charsets

Marked as reviewed by redestad (Reviewer).

Great! Extending coverage to provoke the issue on most charsets is good, and it 
should guard UTF-8 from regressing too - no?

-

PR: 

Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v6]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 15:31:15 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Align entry name whitespace

Good! Would it be too much to ask for you to try and construct such a test for 
UTF-8 zip files, too?

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v4]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 11:57:16 GMT, Claes Redestad  wrote:

> > > Seems there's a possible real test failure lurking here, might be 
> > > intermittent since it only showed on one platform:
> > 
> > 
> > Did you get this from GHA somehow? Do you happen to know the platform, 
> > timezone and encoding used?
> 
> Yes, clicked through the failing macosx-x64 test suite. Here's a link to the 
> summary which has the failing test logs at the bottom: 
> https://github.com/eirbjo/jdk/actions/runs/4114300244

Seems the failing assert is checking timestamps:

assertEquals((x.getTime() / 2000), y.getTime() / 2000);

... so maybe this is a rare intermittent issue. It's just very suspect though 
that it happens in the test you're changing.

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v4]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 11:32:02 GMT, Eirik Bjorsnos  wrote:

> > Seems there's a possible real test failure lurking here, might be 
> > intermittent since it only showed on one platform:
> 
> Did you get this from GHA somehow? Do you happen to know the platform, 
> timezone and encoding used?

Yes, clicked through the failing macosx-x64 test suite. Here's a link to the 
summary which has the failing test logs at the bottom: 
https://github.com/eirbjo/jdk/actions/runs/4114300244

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 08:16:05 GMT, Francesco Nigro  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Minimize, force inline, generalize
>
> test/micro/org/openjdk/bench/java/lang/StringConstructor.java line 40:
> 
>> 38: 
>> 39:   @Param({"0", "7", "64"})
>> 40:   public int size;
> 
> I suggest to add the param `offset` for future experiment: together with 
> `perfasm` it helps to check how different stubs are used and emulate the 
> different branches of the optimized code

Done

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 03:38:24 GMT, David Schlosnagle  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Minimize, force inline, generalize
>
> src/java.base/share/classes/java/util/Arrays.java line 4142:
> 
>> 4140: private static double[] copyOfRangeGeneric(double[] original, int 
>> from, int to) {
>> 4141: checkLength(from, to);
>> 4142: int newLength = to - from;
> 
> Suggestion:
> 
> int newLength = checkLength(from, to);

> I'm also curious if returning the new length from `checkLength` would be 
> worthwhile

I had this arrangement earlier but saw no win from it. These helper methods 
will be aggressively inlined and the JIT sorts it out optimally, so I prefer to 
keep the `check` method zoomed in on its purpose

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v9]

2023-02-08 Thread Claes Redestad
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to 
> clone arrays when `newLength` or range inputs span the input array. This 
> helps eliminate range checks and has been verified to help various String 
> operations. Example:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> Patch: 
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 14.666 ± 0.336  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 14.582 ± 0.288  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 20.339 ± 0.328  ns/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Add offset param to micro, reduce number of configurations and variants

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/85b50169..350612d4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=08
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=07-08

  Stats: 13 lines in 1 file changed: 6 ins; 4 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]

2023-02-08 Thread Claes Redestad
On Wed, 8 Feb 2023 01:10:59 GMT, David Schlosnagle  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Minimize, force inline, generalize
>
> src/java.base/share/classes/java/util/Arrays.java line 3594:
> 
>> 3592: public static int[] copyOf(int[] original, int newLength) {
>> 3593: if (newLength == original.length) {
>> 3594: return original.clone();
> 
> I am curious about the use of `clone` for some primitive array types 
> (`short[]`, `int[]`, `long[]`, `char[]`, `float[]`) and `copyOf` using 
> `System.arraycopy` in other types (`byte[]`, `double[]`). Do these types 
> optimize differently or hit different intrinsics depending on primitive type? 
> Is there difference in array zeroing?
> 
> From a quick [JMH 
> benchmark](https://gist.github.com/schlosna/975e26965ec822ad42034b3ea2b08676) 
> `System.arraycopy` seems slightly better.

I went back and forth on this and also saw a small win using `arraycopy`, but 
the PR ended up in an inconsistent state with some using one and some using the 
other. While this discrepancy seem like something we should treat as a bug, 
I've arranged to use `copyOf` helper consistently for now.

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v8]

2023-02-08 Thread Claes Redestad
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to 
> clone arrays when `newLength` or range inputs span the input array. This 
> helps eliminate range checks and has been verified to help various String 
> operations. Example:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> Patch: 
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 14.666 ± 0.336  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 14.582 ± 0.288  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 20.339 ± 0.328  ns/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Err on the side of copyOf

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/c906c730..85b50169

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=07
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=06-07

  Stats: 90 lines in 1 file changed: 56 ins; 23 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Reduce Arrays.copyOf/-Range overheads [v7]

2023-02-07 Thread Claes Redestad
> This patch adds special-cases to `Arrays.copyOf` and `Arrays.copyOfRange` to 
> clone arrays when `newLength` or range inputs span the input array. This 
> helps eliminate range checks and has been verified to help various String 
> operations. Example:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> Patch: 
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 14.666 ± 0.336  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 14.582 ± 0.288  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 20.339 ± 0.328  ns/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Minimize, force inline, generalize

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/72138930..c906c730

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=06
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=05-06

  Stats: 154 lines in 1 file changed: 120 ins; 23 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v4]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 13:23:26 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 20 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into getentrypos-prefixmatch
>  - Fix incorrect offset for the CEN "length of extra field". Fixed spelling 
> of "invalid".
>  - Spelling fix in test data for non-ascii 

Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 19:10:08 GMT, Francesco Nigro  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   copyrights
>
> Thanks @cl4es to look into this!

@franz1981 idea seems to apply nicely here, and going back and applying it to 
`Arrays.copyOfRange` end up on top for the common case where `copyOfRange` 
copies the entire range:


Benchmark(size)  Mode  Cnt   Score  
 Error  Units
StringConstructor.newStringFromArray  7  avgt   15  14.666 
± 0.336  ns/op
StringConstructor.newStringFromArrayWithCharset   7  avgt   15  14.582 
± 0.288  ns/op
StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  20.339 
± 0.328  ns/op


We might still benefit for some cases to specialize a `copyBytes` method, but 
this solution might help more cases. If others agree I might take a step back 
and apply this optimization to all the `copyOfRange` methods and add some 
microbenchmarking to verify this more widely.

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v6]

2023-02-07 Thread Claes Redestad
> This adds a local, specialized `copyBytes` method to `String` that avoids 
> certain redundant range checks and clamping that JIT has issues removing 
> fully.
> 
> This has a small but statistically significant effect on `String` 
> microbenchmarks, eg.:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> 
> Patch:
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 15.477 ± 0.342  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 15.557 ± 0.352  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 21.272 ± 0.398  ns/op
> 
> 
> Care has to be taken to ensure preconditions have been checked when using 
> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a 
> possible pre-existing issue where the constructor might either throw an 
> exception or truncate the buffer if the builder byte array and length is not 
> in agreement (theoretically possible if you clear/remove and call 
> `trimToSize()` concurrently). Adding an explicit check here seem to be the 
> right thing to do regardless of this RFE.

Claes Redestad has updated the pull request incrementally with two additional 
commits since the last revision:

 - Apply @franz1981's idea but directly to Arrays.copyOfRange, factoring out 
range checks and helping JIT pick the best arraycopy adapter
 - Back out all changes except micro

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/cece6f80..72138930

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=05
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=04-05

  Stats: 62 lines in 4 files changed: 24 ins; 8 del; 30 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 19:08:59 GMT, Francesco Nigro  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   copyrights
>
> src/java.base/share/classes/java/lang/String.java line 698:
> 
>> 696: }
>> 697: 
>> 698: static byte[] copyBytes(byte[] bytes, int offset, int length) {
> 
> Given that the stub generated for array copy seems highly dependent by the 
> call site constrains, did you tried adding a check for offset == 0 and/or 
> length == bytes.length?
> 
> If (offset == 0 && bytes.length == length) {
> System.arrayCopy(bytes, 0, dst, 0, bytes.length);
> // etc etc the other combinations 
> 
> This should have different generated stubs with much smaller ASM depending by 
> the enforced constrains (and shouldn't affect terribly the code size of the 
> method, given that the stub won't be inlined AFAIK)
> 
> Beware, as noted by others, I'm not suggesting that's the way to fix this, 
> but it would be interesting to check how much perf we leave on the ground due 
> to the this supposed "inefficient" stub generation (if that's the issue).

I did some quick experiments but saw no clear win from doing anything like this 
here. Feel free to experiment and see if there's some particular configuration 
that comes out ahead.

FTR I did not intend for this RFE to solve 
https://bugs.openjdk.org/browse/JDK-8295496 completely, but provide a small, 
partial win that might possibly clear a path to solving that likely orthogonal 
issue.

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 15:25:05 GMT, Claes Redestad  wrote:

>> This adds a local, specialized `copyBytes` method to `String` that avoids 
>> certain redundant range checks and clamping that JIT has issues removing 
>> fully.
>> 
>> This has a small but statistically significant effect on `String` 
>> microbenchmarks, eg.:
>> 
>> Baseline
>> 
>> Benchmark(size)  Mode  Cnt   
>> Score   Error  Units
>> StringConstructor.newStringFromArray  7  avgt   15  
>> 16.817 ± 0.369  ns/op
>> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
>> 16.866 ± 0.449  ns/op
>> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
>> 22.198 ± 0.396  ns/op
>> 
>> 
>> Patch:
>> 
>> Benchmark(size)  Mode  Cnt   
>> Score   Error  Units
>> StringConstructor.newStringFromArray  7  avgt   15  
>> 15.477 ± 0.342  ns/op
>> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
>> 15.557 ± 0.352  ns/op
>> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
>> 21.272 ± 0.398  ns/op
>> 
>> 
>> Care has to be taken to ensure preconditions have been checked when using 
>> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a 
>> possible pre-existing issue where the constructor might either throw an 
>> exception or truncate the buffer if the builder byte array and length is not 
>> in agreement (theoretically possible if you clear/remove and call 
>> `trimToSize()` concurrently). Adding an explicit check here seem to be the 
>> right thing to do regardless of this RFE.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   copyrights

> 

Rather than splitting right down the middle isn't it more effective to factor 
out code that would typically not be executed, such as the exception creation + 
formatting? That additionally allows the JIT to outline such code altogether, 
allowing more aggressive inlining of the non-exceptional path(s).

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 19:24:11 GMT, Stuart Marks  wrote:

> > It might be that the redundant checks in Arrays.copyOfRange would be 
> > eliminated properly with more inlining, and that the issue here is that the 
> > affected String constructor is particularly gnarly. This gnarliness is due 
> > 1) the need to construct the value and coder in one go and 2) the lack of 
> > multiple return values. Give me a value-based record type so we can split 
> > apart the constructor into charset-specific methods with zero overhead and 
> > we might be able to split up the logic into better-sized chunks.
> 
> I'm wondering if another contributing factor to the complexity of this code 
> is the continued support of the non-compact-String codepaths. This means 
> there are actually three code paths through every string computation. Do we 
> need to continue to support the non-compact-string code paths? I'm concerned 
> about maintainability too.

I think most apps have sufficient ASCII/latin1-encodable data to make compact 
strings a net win. Especially with recent improvements to key intrinsics that 
has narrowed the gap. 

I still think turning off compact strings might be beneficial in locales where 
most strings are UTF-16, but as you say there might be wins to maintainability 
and code complexity by ripping out `-XX:-CompactStrings` (which could mean a 
net performance win for all).

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 18:35:32 GMT, John R Rose  wrote:

> Our source code is a reference implementation, and people will look at this 
> change as evidence that `Arrays::copyOfRange` should be hand-inlined by savvy 
> coders. Surely we could also fix this small performance pothole by improving 
> C2’s treatment of `Arrays.copyOfRange`. That would benefit all users as well, 
> not just `String`. That is our preferred way to handle things.
> 
> On the other hand, `String` is an important class and worthy of every tiny 
> tweak we give it. Do we need this fix now? If so, I suggest putting in a 
> comment in the code which says something like “normally one would use 
> Arrays.copyOfRange here, but we get slightly better code in this particular 
> case”. Also, regarding the bug against the JIT, I suggest that we back out 
> this change to `String` when that JIT bug is fixed. Perhaps the comment in 
> `String` should reference that bug.

It might be that the redundant checks in `Arrays.copyOfRange` would be 
eliminated properly with more inlining, and that the issue here is that the 
affected `String` constructor is particularly gnarly. This gnarliness is due 1) 
the need to construct the `value` and `coder` in one go and 2) the lack of 
multiple return values. Give me a value-based record type so we can split apart 
the constructor into charset-specific methods with zero overhead and we might 
be able to split up the logic into better-sized chunks.

But yes, I will add some commentary to the effect that this should ideally be 
handled by our JIT, along with comments that the method deliberately avoids 
safety checks.

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v5]

2023-02-07 Thread Claes Redestad
> This adds a local, specialized `copyBytes` method to `String` that avoids 
> certain redundant range checks and clamping that JIT has issues removing 
> fully.
> 
> This has a small but statistically significant effect on `String` 
> microbenchmarks, eg.:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> 
> Patch:
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 15.477 ± 0.342  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 15.557 ± 0.352  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 21.272 ± 0.398  ns/op
> 
> 
> Care has to be taken to ensure preconditions have been checked when using 
> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a 
> possible pre-existing issue where the constructor might either throw an 
> exception or truncate the buffer if the builder byte array and length is not 
> in agreement (theoretically possible if you clear/remove and call 
> `trimToSize()` concurrently). Adding an explicit check here seem to be the 
> right thing to do regardless of this RFE.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  copyrights

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/2e33c27f..cece6f80

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=04
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=03-04

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v3]

2023-02-07 Thread Claes Redestad
On Tue, 7 Feb 2023 14:57:52 GMT, Alan Bateman  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Update StringLatin1.trim for consistency
>
> src/java.base/share/classes/java/lang/String.java line 4546:
> 
>> 4544: // To avoid surprises due to data races (which would either 
>> truncate or throw an exception)
>> 4545: // we should check that length <= val.length up front
>> 4546: checkOffset(length, val.length);
> 
> I agree a check is needed here but I assume using checkOffset means that 
> SB::toString could fail with SIOOBE. I wonder if Math.min(length, val.length) 
> would be better here.

Ok. That keeps behavior consistent for most cases and removes a path where we 
can fail with SIOOBE in the existing code (down `StringUTF16::compress`).

-

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v4]

2023-02-07 Thread Claes Redestad
> This adds a local, specialized `copyBytes` method to `String` that avoids 
> certain redundant range checks and clamping that JIT has issues removing 
> fully.
> 
> This has a small but statistically significant effect on `String` 
> microbenchmarks, eg.:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> 
> Patch:
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 15.477 ± 0.342  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 15.557 ± 0.352  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 21.272 ± 0.398  ns/op
> 
> 
> Care has to be taken to ensure preconditions have been checked when using 
> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a 
> possible pre-existing issue where the constructor might either throw an 
> exception or truncate the buffer if the builder byte array and length is not 
> in agreement (theoretically possible if you clear/remove and call 
> `trimToSize()` concurrently). Adding an explicit check here seem to be the 
> right thing to do regardless of this RFE.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  truncate rather than throw in String(AbstractStringBuilder), keeping current 
behavior

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/a5e494bc..2e33c27f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=03
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=02-03

  Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v2]

2023-02-07 Thread Claes Redestad
> This adds a local, specialized `copyBytes` method to `String` that avoids 
> certain redundant range checks and clamping that JIT has issues removing 
> fully.
> 
> This has a small but statistically significant effect on `String` 
> microbenchmarks, eg.:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> 
> Patch:
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 15.477 ± 0.342  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 15.557 ± 0.352  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 21.272 ± 0.398  ns/op
> 
> 
> Care has to be taken to ensure preconditions have been checked when using 
> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a 
> possible pre-existing issue where the constructor might either throw an 
> exception or truncate the buffer if the builder byte array and length is not 
> in agreement (theoretically possible if you clear/remove and call 
> `trimToSize()` concurrently). Adding an explicit check here seem to be the 
> right thing to do regardless of this RFE.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  fix StringUTF16.trim bug, rename locals to reduce confusion

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/2d1f8e37..5b637e09

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=00-01

  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String [v3]

2023-02-07 Thread Claes Redestad
> This adds a local, specialized `copyBytes` method to `String` that avoids 
> certain redundant range checks and clamping that JIT has issues removing 
> fully.
> 
> This has a small but statistically significant effect on `String` 
> microbenchmarks, eg.:
> 
> Baseline
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 16.817 ± 0.369  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 16.866 ± 0.449  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 22.198 ± 0.396  ns/op
> 
> 
> Patch:
> 
> Benchmark(size)  Mode  Cnt   
> Score   Error  Units
> StringConstructor.newStringFromArray  7  avgt   15  
> 15.477 ± 0.342  ns/op
> StringConstructor.newStringFromArrayWithCharset   7  avgt   15  
> 15.557 ± 0.352  ns/op
> StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  
> 21.272 ± 0.398  ns/op
> 
> 
> Care has to be taken to ensure preconditions have been checked when using 
> `checkBytes`. In the case of `String(AbstractStringBuilder)` there's a 
> possible pre-existing issue where the constructor might either throw an 
> exception or truncate the buffer if the builder byte array and length is not 
> in agreement (theoretically possible if you clear/remove and call 
> `trimToSize()` concurrently). Adding an explicit check here seem to be the 
> right thing to do regardless of this RFE.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Update StringLatin1.trim for consistency

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12453/files
  - new: https://git.openjdk.org/jdk/pull/12453/files/5b637e09..a5e494bc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12453=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=12453=01-02

  Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


RFR: 8301958: Avoid Arrays.copyOfRange overhead in java.lang.String

2023-02-07 Thread Claes Redestad
This adds a local, specialized `copyBytes` method to `String` that avoids 
certain redundant range checks and clamping that JIT has issues removing fully.

This has a small but statistically significant effect on `String` 
microbenchmarks, eg.:

Baseline

Benchmark(size)  Mode  Cnt   Score  
 Error  Units
StringConstructor.newStringFromArray  7  avgt   15  16.817 
± 0.369  ns/op
StringConstructor.newStringFromArrayWithCharset   7  avgt   15  16.866 
± 0.449  ns/op
StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  22.198 
± 0.396  ns/op


Patch:

Benchmark(size)  Mode  Cnt   Score  
 Error  Units
StringConstructor.newStringFromArray  7  avgt   15  15.477 
± 0.342  ns/op
StringConstructor.newStringFromArrayWithCharset   7  avgt   15  15.557 
± 0.352  ns/op
StringConstructor.newStringFromArrayWithCharsetName   7  avgt   15  21.272 
± 0.398  ns/op


Care has to be taken to ensure preconditions have been checked when using 
`checkBytes`. In the case of `String(AbstractStringBuilder)` there's a possible 
pre-existing issue where the constructor might either throw an exception or 
truncate the buffer if the builder byte array and length is not in agreement 
(theoretically possible if you clear/remove and call `trimToSize()` 
concurrently). Adding an explicit check here seem to be the right thing to do 
regardless of this RFE.

-

Commit messages:
 - Include some callsites in StringLatin1/UTF16, rename to copyBytes to 
disambiguate from generic utility methods, tune microbenchmark
 - 8301958: Avoid overhead of Arrays.copyOfRange in String

Changes: https://git.openjdk.org/jdk/pull/12453/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12453=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301958
  Stats: 36 lines in 4 files changed: 14 ins; 2 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/12453.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12453/head:pull/12453

PR: https://git.openjdk.org/jdk/pull/12453


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v3]

2023-02-07 Thread Claes Redestad
On Mon, 6 Feb 2023 16:14:14 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix incorrect offset for the CEN "length of extra field". Fixed spelling of 
> "invalid".

Some GHA testing seem to have failed due to infrastructure issues - try merging 
in master to trigger testing again.

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v2]

2023-02-06 Thread Claes Redestad
On Mon, 6 Feb 2023 15:21:10 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Spelling fix in test data for non-ascii latin1 string

I think this looks good. Glad we can get this done without adding even more 
ways to peek into `String` internals.

test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 158:

> 156: int coff = off + CEN_HDR + nlen + elen;
> 157: 
> 158: // Write 

Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v2]

2023-02-06 Thread Claes Redestad
On Mon, 6 Feb 2023 15:21:10 GMT, Eirik Bjorsnos  wrote:

>> After finding a hash match, getEntryPos needs to compare the lookup name up 
>> to the encoded entry name in the CEN. This comparison is done by decoding 
>> the entry name into a String. The names can then be compared using the 
>> String API. This decoding step adds a significat cost to this method.
>> 
>> This PR suggest to update the string comparison such that in the common case 
>> where  both the lookup name and the entry name are encoded in 
>> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
>> instead be compared direcly. 
>> 
>> ZipCoder is updated with a new method to compare a string with an encoded 
>> byte array range.  The default implementation decodes to string (like the 
>> current code), while the UTF-8 implementation uses 
>> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
>> Arrays.mismatch for comparison with or without matching trailing slashes. 
>> 
>> Additionally, this PR suggest to make the following updates to getEntryPos:
>> 
>> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
>> already checks this and will throws IAE for invalid UTF-8). This seems to 
>> give a 3-4% speedup on micros)
>> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
>> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
>> found no existing test coverage for this)
>> - The recursion when looking for "name/" matches is replaced with iteration. 
>> We keep track of any "name/" match and return it at the end of the search. 
>> (I feel this is easier to follow and it also gives a ~30% speedup for 
>> addSlash lookups with no regression on regular lookups)
>> 
>> (My though is that including these additional updates in this PR might 
>> reduce reviewer overhead given that it touches the exact same code. I might 
>> be wrong on this, please advise :)
>> 
>> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
>> to use xalan.jar):
>> 
>> Baseline:
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
>> ns/op
>> 
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark (size)  Mode  CntScore   Error  
>> Units
>> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
>> ns/op
>> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
>> ns/op
>> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
>> ns/op
>> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
>> ns/op
>> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
>> ns/op
>> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
>> ns/op
>> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
>> ns/op
>> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
>> ns/op
>> 
>> 
>> To assess the impact on startup/warmup, I made a main method class which 
>> measures the total time of calling ZipFile.getEntry for all entries in the 
>> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
>> (time in micros):
>> 
>> 
>> Percentile Baseline Patch
>> 50 %  23155 21149
>> 75 %  23598 21454
>> 90 %  23989 21691
>> 95 %  24238 21973
>> 99 %  25270 22446
>> STDEV   792   549
>> Count   500   500
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Spelling fix in test data for non-ascii latin1 string

test/jdk/java/util/zip/ZipFile/InvalidBytesInEntryNameOrComment.java line 153:

> 151: // Need to skip past the length of the name and extra fields
> 152: int nlen = buffer.getShort(off + NLEN);
> 153: int elen = buffer.getShort(off + NLEN);

Is this supposed to 

Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos [v2]

2023-02-06 Thread Claes Redestad
On Mon, 6 Feb 2023 15:14:37 GMT, Eirik Bjorsnos  wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Spelling fix in test data for non-ascii latin1 string
>
> test/jdk/java/util/zip/ZipFile/TestZipFileEncodings.java line 126:
> 
>> 124: // latin1, but not ASCII
>> 125: String entryName = "smörgårdsbord";
>> 126: 
> 
> @cl4es Are we ok with non-ASCII in source files? I'd hate to escape this ;-)

As long as the file is UTF-8 encoded then I think it should be fine. Thanks for 
fixing the spelling before I could remark on it! :D

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos

2023-02-06 Thread Claes Redestad
On Mon, 6 Feb 2023 12:01:19 GMT, Eirik Bjorsnos  wrote:

>> Nice, I have updated the PR such that the new shared secret is replaced with 
>> using getBytesNoRepl instead. If there is a performance difference, it seems 
>> to hide in the noise.
>> 
>> I had expected such a regression to be caught by existing tests, which seems 
>> not to be the case. I added TestZipFileEncodings.latin1NotAscii to adress 
>> this.
>
> getBytesNoRepl throws CharacterCodingException "for malformed input or 
> unmappable characters".
> 
> This should never happen since initCEN should already reject it. If it should 
> happen anyway, I return NO_MATCH which will ignore the match just like the 
> catch in getEntryPos currently does.

Yes, this should be fine.

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos

2023-02-06 Thread Claes Redestad
On Mon, 30 Jan 2023 10:32:38 GMT, Eirik Bjorsnos  wrote:

> After finding a hash match, getEntryPos needs to compare the lookup name up 
> to the encoded entry name in the CEN. This comparison is done by decoding the 
> entry name into a String. The names can then be compared using the String 
> API. This decoding step adds a significat cost to this method.
> 
> This PR suggest to update the string comparison such that in the common case 
> where  both the lookup name and the entry name are encoded in 
> ASCII-compatible UTF-8,  decoding can be avoided and the byte arrays can 
> instead be compared direcly. 
> 
> ZipCoder is updated with a new method to compare a string with an encoded 
> byte array range.  The default implementation decodes to string (like the 
> current code), while the UTF-8 implementation uses 
> JavaLangAccess.getBytesNoRepl to get the  bytes. Both methods thes uses 
> Arrays.mismatch for comparison with or without matching trailing slashes. 
> 
> Additionally, this PR suggest to make the following updates to getEntryPos:
> 
> - The try/catch for IAE is redundant and can be safely removed. (initCEN 
> already checks this and will throws IAE for invalid UTF-8). This seems to 
> give a 3-4% speedup on micros)
> - A new test InvalidBytesInEntryNameOrComment is a added to verify that 
> initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I 
> found no existing test coverage for this)
> - The recursion when looking for "name/" matches is replaced with iteration. 
> We keep track of any "name/" match and return it at the end of the search. (I 
> feel this is easier to follow and it also gives a ~30% speedup for addSlash 
> lookups with no regression on regular lookups)
> 
> (My though is that including these additional updates in this PR might reduce 
> reviewer overhead given that it touches the exact same code. I might be wrong 
> on this, please advise :)
> 
> I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified 
> to use xalan.jar):
> 
> Baseline:
> 
> Benchmark (size)  Mode  CntScore   Error  
> Units
> ZipFileGetEntry.getEntryHit  512  avgt   15   74.941 ± 1.004  
> ns/op
> ZipFileGetEntry.getEntryHit 1024  avgt   15   84.943 ± 1.320  
> ns/op
> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  120.371 ± 2.386  
> ns/op
> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  126.128 ± 1.075  
> ns/op
> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.818 ± 0.838  
> ns/op
> ZipFileGetEntry.getEntryMiss1024  avgt   15   29.762 ± 5.998  
> ns/op
> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   59.405 ± 0.545  
> ns/op
> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   71.840 ± 2.455  
> ns/op
> ZipFileGetEntry.getEntrySlash512  avgt   15  135.621 ± 4.341  
> ns/op
> ZipFileGetEntry.getEntrySlash   1024  avgt   15  134.190 ± 2.141  
> ns/op
> 
> 
> 
> PR:
> 
> 
> Benchmark (size)  Mode  CntScore   Error  
> Units
> ZipFileGetEntry.getEntryHit  512  avgt   15   62.267 ± 1.329  
> ns/op
> ZipFileGetEntry.getEntryHit 1024  avgt   15   72.916 ± 2.428  
> ns/op
> ZipFileGetEntry.getEntryHitUncached  512  avgt   15  101.630 ± 1.154  
> ns/op
> ZipFileGetEntry.getEntryHitUncached 1024  avgt   15  113.161 ± 0.502  
> ns/op
> ZipFileGetEntry.getEntryMiss 512  avgt   15   23.003 ± 1.191  
> ns/op
> ZipFileGetEntry.getEntryMiss1024  avgt   15   23.236 ± 1.114  
> ns/op
> ZipFileGetEntry.getEntryMissUncached 512  avgt   15   56.781 ± 1.505  
> ns/op
> ZipFileGetEntry.getEntryMissUncached1024  avgt   15   67.767 ± 1.963  
> ns/op
> ZipFileGetEntry.getEntrySlash512  avgt   15   73.745 ± 2.717  
> ns/op
> ZipFileGetEntry.getEntrySlash   1024  avgt   15   75.784 ± 1.051  
> ns/op
> 
> 
> To assess the impact on startup/warmup, I made a main method class which 
> measures the total time of calling ZipFile.getEntry for all entries in the 
> 109 JAR file dependenies of spring-petclinic. The shows a nice improvement 
> (time in micros):
> 
> 
> Percentile Baseline Patch
> 50 %  23155 21149
> 75 %  23598 21454
> 90 %  23989 21691
> 95 %  24238 21973
> 99 %  25270 22446
> STDEV   792   549
> Count   500   500

Filed JDK-8301873 for this, update PR title when you're ready.

src/java.base/share/classes/java/lang/System.java line 2668:

> 2666: @Override
> 2667: public int mismatchUTF8(String str, byte[] b, int 
> fromIndex, int toIndex) {
> 2668: byte[] encoded = str.isLatin1() ? str.value() : 
> str.getBytes(UTF_8.INSTANCE);

I think this is incorrect: latin-1 characters above codepoint 127 (non-ascii) 
would be represented by 2 bytes in UTF-8. What you want here is probably 
`str.isAscii() ? ...`. The ASCII check will have to 

Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos

2023-02-06 Thread Claes Redestad
On Mon, 6 Feb 2023 14:27:36 GMT, Claes Redestad  wrote:

>> Interesting. Would be nice to solve this in the JIT!
>> 
>> This disabled code got deleted in my last commit, but it seems like you have 
>> a good analysis so we can let it go now.
>
> Right. I might have fumbled this experiment a bit, and perhaps your setup 
> would inline and then eliminate some of the known-in-range checks already. 
> 
> Though we have intrinsified some of the `Preconditions.check*` methods in the 
> past to help improve range checks, but the `checkFromToIndex` method that 
> would be applicable here has not been intrinsified. It might be a reasonable 
> path forward to replace `Arrays.rangeCheck` with 
> `Preconditions.checkFromToIndex` and then look at intrinsifying that method 
> to help eliminating or optimizing some of the checks.

Nevermind, I had a flaw in my experiment and seems the first range check in a 
call like `Arrays.mismatch(encoded, 0, encoded.length, b, off, off+len);` 
should be eliminated. So perhaps you're seeing the cost of the second range 
check, which might be unavoidable to be safe (zip meta data could otherwise be 
doctored to try and perform out of bounds reads via intrinsified code)

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: 8301873: Avoid string decoding in ZipFile.Source.getEntryPos

2023-02-06 Thread Claes Redestad
On Mon, 6 Feb 2023 11:56:22 GMT, Eirik Bjorsnos  wrote:

>> src/java.base/share/classes/java/lang/System.java line 2671:
>> 
>>> 2669: if (false) {
>>> 2670: // Arrays.mismatch without the range checks (~5% 
>>> faster micro getEntryHit)
>>> 2671: int aLength = encoded.length;
>> 
>> Part of the difference you're seeing is due to knowing that you'll be 
>> matching the entire length of the first array (`encoded, 0, encoded.length`).
>> 
>> As an experiment I added `Arrays.mismatch(byte[], byte[], int, int)` to 
>> mismatch the entire range of the first array argument vs the range of the 
>> second and can spot an improvement in affected micros:
>> 
>> Benchmark (size)  Mode  Cnt   Score   
>> Error  Units
>> ArraysMismatch.Char.differentSubrangeMatches  90  avgt   10  12.165 ± 
>> 0.074  ns/op # mismatch(a, aFrom, aTo, b, bFrom, bTo)
>> ArraysMismatch.Char.subrangeMatches   90  avgt   10  10.748 ± 
>> 0.006  ns/op # mismatch(a, b, bFrom, bTo)
>> 
>> This might be something we can solve in the JITs without having to add new 
>> methods to `java.util.Arrays` to deal as efficiently as possible with the 
>> case when we're matching against the entirety of one of the arrays.
>
> Interesting. Would be nice to solve this in the JIT!
> 
> This disabled code got deleted in my last commit, but it seems like you have 
> a good analysis so we can let it go now.

Right. I might have fumbled this experiment a bit, and perhaps your setup would 
inline and then eliminate some of the known-in-range checks already. 

Though we have intrinsified some of the `Preconditions.check*` methods in the 
past to help improve range checks, but the `checkFromToIndex` method that would 
be applicable here has not been intrinsified. It might be a reasonable path 
forward to replace `Arrays.rangeCheck` with `Preconditions.checkFromToIndex` 
and then look at intrinsifying that method to help eliminating or optimizing 
some of the checks.

-

PR: https://git.openjdk.org/jdk/pull/12290


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v7]

2023-02-05 Thread Claes Redestad
On Thu, 2 Feb 2023 15:33:29 GMT, Scott Gibbons  wrote:

>> Names are important, but always hard to get right. At the very least they 
>> need to be correct. Maybe call it something like 
>> `..parameterized_decode_tables..` and the other `..shared_decode_tables..`?
>
> I prefer leaving them the way they are.  I don't think the names, along with 
> the associated comments within the tables, causes any undue confusion as to 
> their function.  However I will implement the name change if that's all it 
> takes to procure a review approval.  Please provide the specific names you'd 
> like me to use and I'll change them.  Or just approve as-is :-).

I meant no disrespect here. By your own words 
`base64_AVX2_decode_URL_tables_addr` is "essentially incorrect", so I suggested 
some alternatives that I thought would make the code slightly more 
approachable. Regardless of whether you fix this detail I am not ready to 
approve this PR since I would need time to digest the latest changes. I'll ask 
a more senior engineer to review and give final approval (these changes need 2 
reviewers approval anyhow).

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v7]

2023-02-01 Thread Claes Redestad
On Wed, 1 Feb 2023 20:59:24 GMT, Scott Gibbons  wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2202:
>> 
>>> 2200: }
>>> 2201: 
>>> 2202: address StubGenerator::base64_AVX2_decode_URL_tables_addr() {
>> 
>> Shouldn't this be `decode_lut_tables`? As it's used for URL and non-URL 
>> decoding alike.
>
> These tables are used for both URL and non-URL based on the parameter, and 
> they are only two of the three lut tables used (the other is in 
> `base64_AVX2_decode_tables_addr` ).  Both names are essentially incorrect.  
> Does the name really matter that much?  It's the same as 
> `base64_AVX2_decode_tables_addr` with the addition of URL tables.

Names are important, but always hard to get right. At the very least they need 
to be correct. Maybe call it something like `..parameterized_decode_tables..` 
and the other `..shared_decode_tables..`?

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v7]

2023-02-01 Thread Claes Redestad
On Wed, 1 Feb 2023 18:28:25 GMT, Scott Gibbons  wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate 
>> ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt Score   Error   
>> Units
>> Base64Encode.testBase64Encode   1024  thrpt3  4309.439 ± 2.632  
>> ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt  Score 
>> Error   Units
>> Base64Encode.testBase64Encode   1024  thrpt3  24211.397 ± 
>> 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt  ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Handle AVX2 URL; address review comments

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2202:

> 2200: }
> 2201: 
> 2202: address StubGenerator::base64_AVX2_decode_URL_tables_addr() {

Shouldn't this be `decode_lut_tables`? As it's used for URL and non-URL 
decoding alike.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v6]

2023-01-27 Thread Claes Redestad
On Fri, 27 Jan 2023 21:36:29 GMT, Claes Redestad  wrote:

>> Scott Gibbons has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 13 additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'openjdk:master' into Base64-AVX2
>>  - Merge branch 'openjdk:master' into Base64-AVX2
>>  - Merge branch 'openjdk:master' into Base64-AVX2
>>  - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into 
>> Base64-AVX2
>>  - Merge branch 'openjdk:master' into Base64-AVX2
>>  - Address review comment
>>  - Remove whitespace
>>  - Fix wrong register usage
>>  - Working version of Base64 decode with AVX2 (4x perf improvement). No URL 
>> support
>>  - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into 
>> Base64-AVX2
>>  - ... and 3 more: https://git.openjdk.org/jdk/compare/50dc1196...3e66f7be
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2643:
> 
>> 2641: // Handle isURL / MIME?!?!  r12 is used for length calculation 
>> (from out)
>> 2642: //
>> 2643: // rbx is out, r12 is saved out, rdx is size, rsi is src
> 
> It seems that on windows `r12` is in use, see line 2323. GHA seem to be 
> having some trouble finishing Windows testing on time - could there be some 
> issue here?

Nevermind, you're not using r12 and GHA Windows testing is green now.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v6]

2023-01-27 Thread Claes Redestad
On Fri, 27 Jan 2023 18:31:50 GMT, Scott Gibbons  wrote:

>> Added code for Base64 acceleration (encode and decode) which will accelerate 
>> ~4x for AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt Score   Error   
>> Units
>> Base64Encode.testBase64Encode   1024  thrpt3  4309.439 ± 2.632  
>> ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark  (maxNumBytes)   Mode  Cnt  Score 
>> Error   Units
>> Base64Encode.testBase64Encode   1024  thrpt3  24211.397 ± 
>> 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   
>> Mode  Cnt  ScoreError   Units
>> Base64Decode.testBase64Decode   144   4   1024  
>> thrpt3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 13 additional 
> commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into Base64-AVX2
>  - Merge branch 'openjdk:master' into Base64-AVX2
>  - Merge branch 'openjdk:master' into Base64-AVX2
>  - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into 
> Base64-AVX2
>  - Merge branch 'openjdk:master' into Base64-AVX2
>  - Address review comment
>  - Remove whitespace
>  - Fix wrong register usage
>  - Working version of Base64 decode with AVX2 (4x perf improvement). No URL 
> support
>  - Merge branch 'Base64-AVX2' of https://github.com/asgibbons/jdk into 
> Base64-AVX2
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/50dc1196...3e66f7be

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2159:

> 2157: }
> 2158: 
> 2159: address StubGenerator::base64_AVX2_tables_addr() {

A casual reader might wonder why there's 3 other avx2-tables and then this, so 
for readability it'd be nice to add "decode" to the name of this new table.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2643:

> 2641: // Handle isURL / MIME?!?!  r12 is used for length calculation 
> (from out)
> 2642: //
> 2643: // rbx is out, r12 is saved out, rdx is size, rsi is src

It seems that on windows `r12` is in use, see line 2323. GHA seem to be having 
some trouble finishing Windows testing on time - could there be some issue here?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2647:

> 2645: 
> 2646: 
> 2647: // *** NEEDS TO BE FIXED 

While it's fine to leave implementation of `getMimeDecoder` and `getUrlDecoder` 
for a follow-up, I think these comments needs to be cleaned up. E.g. a 
follow-up RFE filed and referenced.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2651:

> 2649: __ jcc(Assembler::notZero, L_tailProc);
> 2650: 
> 2651: __ cmpl(length, 44);

Perform `length` checks first to avoid unnecessary branches on small inputs?

Ideal might be to move this length check up just before the `_cmpl(length, 
128)` in the AVX-512 block, so that if `AVX=3` short inputs branch directly to 
the scalar tail procedure without jumping around. This might also apply to the 
encode stub, though that's pre-existing.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2673:

> 2671: __ vmovdqu(xmm13, Address(r13, 0xc8));
> 2672: __ vmovdqu(xmm12, Address(r13, 0x08));
> 2673: __ jmp(L_enterLoop);

This got me curious (sorry!) and maybe there's something going on here that I'm 
not getting... But why have you split the loop apart and jump into the middle 
of it? It'd seem not doing this would be more straightforward, with no 
difference semantically and one less jump? (align32 should just translate to a 
narrow nop instruction, if anything)

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2

2023-01-23 Thread Claes Redestad
On Mon, 23 Jan 2023 18:14:16 GMT, Scott Gibbons  wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661:
>> 
>>> 2659: __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit);
>>> 2660: __ vmovdqu(xmm11, Address(r13, 0x28));
>>> 2661: __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit);
>> 
>> Sorry in advance since I'm probably reading this wrong: the data that `r13` 
>> is pointing to appears to be a repeated byte pattern (`0x2f2f2f...`), does 
>> this mean this `vpbroadcastb` and the `vpbroadcastq` above end up filling up 
>> their respective registers with the exact same bits? If so, and since 
>> neither of them is mutated in the code below, then perhaps this can be 
>> simplified a bit.
>
> You're reading it correctly - this is redundant and could be handled 
> differently, as the same value is being loaded into ymm4 and ymm10.  I don't 
> think there will be any significant performance gain either way.  This was 
> done in this manner to allow easier transition to URL acceleration when it is 
> implemented, as URLs require handling '-' and '_' instead of '+' and '/' ('/' 
> = 0x2f).

I was mainly curious if there was some obscure detail or difference that eluded 
me. It wouldn't be the first time!

As it's outside of the loop you're probably right about it not being very 
important to overall performance, though we should be mindful of setup 
overheads of transitioning into AVX code. Especially since inputs likely are 
skewed towards the smallest applicable lengths. I think it would be prudent to 
run and check the microbenchmark with values near the AVX2 threshold, such as 
`maxNumBytes = 48`. 

If you choose to keep the code as-is would you mind documenting the rationale 
behind the redundancy? (Is there WIP on more generalized URL acceleration that 
could be referenced?)

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2

2023-01-23 Thread Claes Redestad
On Sat, 21 Jan 2023 00:15:10 GMT, Scott Gibbons  wrote:

> Added code for Base64 acceleration (encode and decode) which will accelerate 
> ~4x for AVX2 platforms.
> 
> Encode performance:
> **Old:**
> 
> Benchmark  (maxNumBytes)   Mode  Cnt Score   Error   
> Units
> Base64Encode.testBase64Encode   1024  thrpt3  4309.439 ± 2.632  
> ops/ms
> 
> 
> **New:**
> 
> Benchmark  (maxNumBytes)   Mode  Cnt  Score Error 
>   Units
> Base64Encode.testBase64Encode   1024  thrpt3  24211.397 ± 102.026 
>  ops/ms
> 
> 
> Decode performance:
> **Old:**
> 
> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   Mode 
>  Cnt ScoreError   Units
> Base64Decode.testBase64Decode   144   4   1024  thrpt 
>3  3961.768 ± 93.409  ops/ms
> 
> **New:**
> Benchmark  (errorIndex)  (lineSize)  (maxNumBytes)   Mode 
>  Cnt  ScoreError   Units
> Base64Decode.testBase64Decode   144   4   1024  thrpt 
>3  14738.051 ± 24.383  ops/ms

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661:

> 2659: __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit);
> 2660: __ vmovdqu(xmm11, Address(r13, 0x28));
> 2661: __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit);

Sorry in advance since I'm probably reading this wrong: the data that `r13` is 
pointing to appears to be a repeated byte pattern (`0x2f2f2f...`), does this 
mean this `vpbroadcastb` and the `vpbroadcastq` above end up filling up their 
respective registers with the exact same bits? If so, and since neither of them 
is mutated in the code below, then perhaps this can be simplified a bit.

-

PR: https://git.openjdk.org/jdk/pull/12126


Re: RFR: 8300818: Reduce complexity of padding with DateTimeFormatter [v2]

2023-01-23 Thread Claes Redestad
On Sun, 22 Jan 2023 13:28:06 GMT, Sergey Tsypanov  wrote:

>> src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java 
>> line 2611:
>> 
>>> 2609: "Cannot print as output of " + len + " characters 
>>> exceeds pad width of " + padWidth);
>>> 2610: }
>>> 2611: buf.insert(preLen, 
>>> String.valueOf(padChar).repeat(padWidth - len));
>> 
>> Have you checked with a microbenchmark that this added allocation can be 
>> elided by JITs and that there's a significant speed-up with your changes? I 
>> don't have the necessary domain expertise to assert anything here but I 
>> suspect padding widths are typically short. Such as 2 or 4 (for day/month 
>> and year fields) so a micro should examine there's no regression for little 
>> to no padding. Unlike the original code you call `insert` even if `padWidth 
>> - len == 0`. This might be optimized away by JITs, but it'd be good to 
>> verify which is best.
>
> The modified code is called only when a user explicitly calls `padNext(int, 
> char)`, i.e. if I modified the example snippet as
> 
> DateTimeFormatter dtf = new DateTimeFormatterBuilder()
>   .appendLiteral("Date:")
> //.padNext(20, ' ')
>   .append(DateTimeFormatter.ISO_DATE)
>   .toFormatter();
> String text = dtf.format(LocalDateTime.now());
> 
> there's no regression.
> 
> Right now I cannot build ad-hoc JDK with my changes and check it with JMH, so 
> I'd convert this to draft until I can verify it

Meant that you should verify that something like this, which just add a little 
padding, doesn't regress with your changes: 

DateTimeFormatter dtf = new DateTimeFormatterBuilder()
.appendLiteral("Year:")
.padNext(5)
.appendValue(ChronoField.YEAR)
.toFormatter();
... 
dtf.format(LocalDateTime.now());

And similar for effectively no padding (`.padNext(4)` in the above example). As 
this API might often be used to ensure short 2-4 char fields are correctly 
padded I think it's important that we're not adding overhead to such use cases.

-

PR: https://git.openjdk.org/jdk/pull/12131


Re: RFR: 8300818: Reduce complexity of padding with DateTimeFormatter [v2]

2023-01-22 Thread Claes Redestad
On Sun, 22 Jan 2023 09:50:21 GMT, Sergey Tsypanov  wrote:

>> Currently it's O(n) - we do `n` shifts of bytes within `StringBuilder`. This 
>> can be reduced to O(1) improving the code like:
>> 
>> DateTimeFormatter dtf = new DateTimeFormatterBuilder()
>>   .appendLiteral("Date:")
>>   .padNext(20, ' ')
>>   .append(DateTimeFormatter.ISO_DATE)
>>   .toFormatter();
>> String text = dtf.format(LocalDateTime.now());
>
> Sergey Tsypanov has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into dtfb
>  - Improve padding of DateTimeFormatter

Changes requested by redestad (Reviewer).

src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java line 
2603:

> 2601: public boolean format(DateTimePrintContext context, 
> StringBuilder buf) {
> 2602: int preLen = buf.length();
> 2603: if (!printerParser.format(context, buf)) {

Non-standard as it may be, this style of using `expr == false` instead of 
`!expr` is a style choice by the original author. I would advice against 
changing these piecemeal without discussion and agreement that we should 
consistently enforce the `!expr` style.

src/java.base/share/classes/java/time/format/DateTimeFormatterBuilder.java line 
2611:

> 2609: "Cannot print as output of " + len + " characters 
> exceeds pad width of " + padWidth);
> 2610: }
> 2611: buf.insert(preLen, String.valueOf(padChar).repeat(padWidth 
> - len));

Have you checked with a microbenchmark that this added allocation can be elided 
by JITs and that there's a significant speed-up with your changes? I don't have 
the necessary domain expertise to assert anything here but I suspect padding 
widths are typically short. Such as 2 or 4 (for day/month and year fields) so a 
micro should examine there's no regression for little to no padding. Unlike the 
original code you call `insert` even if `padWidth - len == 0`. This might be 
optimized away by JITs, but it'd be good to verify which is best.

-

PR: https://git.openjdk.org/jdk/pull/12131


Integrated: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-21 Thread Claes Redestad
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

This pull request has now been integrated.

Changeset: bb42e61a
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/bb42e61a6176a7f4f9485efa47a248b23b09a16d
Stats: 120 lines in 4 files changed: 103 ins; 8 del; 9 mod

8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

Reviewed-by: alanb, lancea

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]

2023-01-21 Thread Claes Redestad
On Fri, 20 Jan 2023 16:42:47 GMT, Lance Andersen  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Vary dir and entry name lengths across a wider spread, keeping most 
>> entries short but making the longest paths longer
>
> Thank you Claes

Thanks @LanceAndersen and @AlanBateman for reviewing!

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Claes Redestad
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

Updated micro to vary entry sizes more to ensure we exercise the different code 
paths through the `hashCode` intrinsic. The new setup generates both longer and 
shorter entries than before, weighting up the average length a bit by 
increasing the spread. The longer entries see a proportionately larger 
speed-up, as expected since they benefit from vectorization. Removed some 
pointless randomness.

Baseline:

Benchmark (size)  Mode  Cnt   Score  Error  Units
ZipFileOpen.openCloseZipFile 512  avgt   15   98832.801 ± 2155.928  ns/op
ZipFileOpen.openCloseZipFile1024  avgt   15  187373.545 ± 2296.779  ns/op


Patched:

Benchmark (size)  Mode  Cnt   Score  Error  Units
ZipFileOpen.openCloseZipFile 512  avgt   15   85574.648 ±  920.477  ns/op
ZipFileOpen.openCloseZipFile1024  avgt   15  160493.277 ± 3450.928  ns/op

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]

2023-01-20 Thread Claes Redestad
> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile    1024  avgt   15  147892.522 ± 2744.017  ns/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Vary dir and entry name lengths across a wider spread, keeping most entries 
short but making the longest paths longer

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12077/files
  - new: https://git.openjdk.org/jdk/pull/12077/files/0c7f0f4f..369537c5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12077=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=12077=00-01

  Stats: 9 lines in 1 file changed: 5 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12077.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12077/head:pull/12077

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Claes Redestad
On Fri, 20 Jan 2023 11:25:22 GMT, Lance Andersen  wrote:

>> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates 
>> on a `byte[]` subrange. It can profitably use the recently introduced 
>> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
>> translates to a small but significant speed-up on `ZipFile` creation.
>> 
>> Before:
>> 
>> Benchmark (size)  Mode  Cnt   Score  Error  Units
>> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
>> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
>> 
>> After:
>> 
>> Benchmark (size)  Mode  Cnt   Score  Error  Units
>> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
>> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op
>
> test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java line 69:
> 
>> 67: 
>> 68: ename += "long-entry-name-"  + (random.nextInt(9) + 
>> 1)  + "-" + i;
>> 69: zos.putNextEntry(new ZipEntry(ename));
> 
> Would it be worth to add some random sized data when the entries are created 
> to allow for getting a bit more insight, or  perhaps do that in a separate 
> benchmark>

I was experimenting with varying the entry names length to see what - if any - 
impact it had and saw a small effect on the micro. It does make more sense to 
vary lengths now that very long names will take different paths in the 
vectorized intrinsic. I'll see what I can do without overengineering this.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Claes Redestad
On Fri, 20 Jan 2023 11:10:13 GMT, Alan Bateman  wrote:

> > FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the 
> > same directory. It's not exactly necessary for this use case to add such a 
> > benchmark, but I think there's value in verifying that optimizing 
> > `checkedHash` improves `ZipFile` setup and adding the micro might allow us 
> > to find further opportunities down the line.
> 
> I've no doubt it improves checkedHash, it's just that open the zip file and 
> reading in the CEN probably dominates when not in the file system cache.

Right, the micro is a poor proxy for real-world implications since time to open 
a zip file very much depends on the filesystem speed but this is sort of by 
design. We have separate startup tests that tries to emulate more "cold start" 
scenarios, which micros like this are complementary to and not a substitute for.

-

PR: https://git.openjdk.org/jdk/pull/12077


Integrated: 8300647: Miscellaneous hashCode improvements in java.base

2023-01-19 Thread Claes Redestad
On Thu, 19 Jan 2023 11:45:12 GMT, Claes Redestad  wrote:

> Went through the jdk and found a few more places where 
> `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc 
> methods could be replaced with a plain call to `java.util.Arrays` 
> equivalents. This patch addresses that.
> 
> After this, #12068, and #12077 I think we're reaching the limit of sensible 
> direct use of `AS::vectorizedHashCode`. I've found a few places outside of 
> java.base where `vectorizedHashCode` might be applicable, but none that seem 
> important enough to warrant an export of this method outside of java.base.

This pull request has now been integrated.

Changeset: d85243f0
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/d85243f02b34d03bd7af63a5bcbc73f500f720df
Stats: 37 lines in 2 files changed: 1 ins; 31 del; 5 mod

8300647: Miscellaneous hashCode improvements in java.base

Reviewed-by: stsypanov, rriggs

-

PR: https://git.openjdk.org/jdk/pull/12093


Re: RFR: 8300647: Miscellaneous hashCode improvements in java.base [v2]

2023-01-19 Thread Claes Redestad
On Thu, 19 Jan 2023 13:46:26 GMT, Claes Redestad  wrote:

>> Went through the jdk and found a few more places where 
>> `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc 
>> methods could be replaced with a plain call to `java.util.Arrays` 
>> equivalents. This patch addresses that.
>> 
>> After this, #12068, and #12077 I think we're reaching the limit of sensible 
>> direct use of `AS::vectorizedHashCode`. I've found a few places outside of 
>> java.base where `vectorizedHashCode` might be applicable, but none that seem 
>> important enough to warrant an export of this method outside of java.base.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Revert ZipFileSystem changes since it's additionally built as a library 
> dependency and can't rely on jdk.internal

Thanks for reviewing!

-

PR: https://git.openjdk.org/jdk/pull/12093


Re: RFR: 8300647: Miscellaneous hashCode improvements in java.base [v2]

2023-01-19 Thread Claes Redestad
> Went through the jdk and found a few more places where 
> `ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc 
> methods could be replaced with a plain call to `java.util.Arrays` 
> equivalents. This patch addresses that.
> 
> After this, #12068, and #12077 I think we're reaching the limit of sensible 
> direct use of `AS::vectorizedHashCode`. I've found a few places outside of 
> java.base where `vectorizedHashCode` might be applicable, but none that seem 
> important enough to warrant an export of this method outside of java.base.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Revert ZipFileSystem changes since it's additionally built as a library 
dependency and can't rely on jdk.internal

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12093/files
  - new: https://git.openjdk.org/jdk/pull/12093/files/5f36ebd7..b1507e69

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12093=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=12093=00-01

  Stats: 7 lines in 1 file changed: 2 ins; 2 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12093.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12093/head:pull/12093

PR: https://git.openjdk.org/jdk/pull/12093


RFR: 8300647: Miscellaneous hashCode improvements in java.base

2023-01-19 Thread Claes Redestad
Went through the jdk and found a few more places where 
`ArraysSupport::vectorizedHashCode` can be used, and a few where adhoc methods 
could be replaced with a plain call to `java.util.Arrays` equivalents. This 
patch addresses that.

After this, #12068, and #12077 I think we're reaching the limit of sensible 
direct use of `AS::vectorizedHashCode`. I've found a few places outside of 
java.base where `vectorizedHashCode` might be applicable, but none that seem 
important enough to warrant an export of this method outside of java.base.

-

Commit messages:
 - Miscellaneous hashCode improvements in java.base

Changes: https://git.openjdk.org/jdk/pull/12093/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12093=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300647
  Stats: 44 lines in 3 files changed: 3 ins; 33 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/12093.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12093/head:pull/12093

PR: https://git.openjdk.org/jdk/pull/12093


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-19 Thread Claes Redestad
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the same 
directory. It's not exactly necessary for this use case to add such a 
benchmark, but I think there's value in verifying that optimizing `checkedHash` 
improves `ZipFile` setup and adding the micro might allow us to find further 
opportunities down the line.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300489: Use ArraysSupport.vectorizedHashCode in j.l.CharacterName

2023-01-18 Thread Claes Redestad
On Wed, 18 Jan 2023 11:04:57 GMT, Claes Redestad  wrote:

> This patch makes use of `ArraysSupport.vectorizedHashCode` and helps verify 
> the performance win also for range-based hash calculations. Also modernized 
> and cleaned up the surrounding code somewhat. 
> 
> 
> Benchmark  Mode  CntScore   Error  Units
> Characters.CodePoints.codePointOf  avgt   15  153.753 ± 6.119  ns/op
> Characters.CodePoints.codePointOf  avgt   15  134.753 ± 4.386  ns/op # after, 
> 1.14x
> 
> 
> For C1 and C2 on non-x86 platforms the performance is neutral. Interpreter 
> sees a 1-2% regression due a few added calls.

Thanks for reviewing!

-

PR: https://git.openjdk.org/jdk/pull/12068


Integrated: 8300489: Use ArraysSupport.vectorizedHashCode in j.l.CharacterName

2023-01-18 Thread Claes Redestad
On Wed, 18 Jan 2023 11:04:57 GMT, Claes Redestad  wrote:

> This patch makes use of `ArraysSupport.vectorizedHashCode` and helps verify 
> the performance win also for range-based hash calculations. Also modernized 
> and cleaned up the surrounding code somewhat. 
> 
> 
> Benchmark  Mode  CntScore   Error  Units
> Characters.CodePoints.codePointOf  avgt   15  153.753 ± 6.119  ns/op
> Characters.CodePoints.codePointOf  avgt   15  134.753 ± 4.386  ns/op # after, 
> 1.14x
> 
> 
> For C1 and C2 on non-x86 platforms the performance is neutral. Interpreter 
> sees a 1-2% regression due a few added calls.

This pull request has now been integrated.

Changeset: 3ea0b8bc
Author:Claes Redestad 
URL:   
https://git.openjdk.org/jdk/commit/3ea0b8bc253f1498895a879681406ecc15f37afb
Stats: 30 lines in 2 files changed: 15 ins; 5 del; 10 mod

8300489: Use ArraysSupport.vectorizedHashCode in j.l.CharacterName

Reviewed-by: alanb, naoto

-

PR: https://git.openjdk.org/jdk/pull/12068


<    1   2   3   4   5   6   7   8   >