RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-18 Thread Claes Redestad
`ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on a 
`byte[]` subrange. It can profitably use the recently introduced 
`ArraysSupport::vectorizedHashCode` method to see a speed-up, which translates 
to a small but significant speed-up on `ZipFile` creation.

Before:

Benchmark (size)  Mode  Cnt   Score  Error  Units
ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op

After:

Benchmark (size)  Mode  Cnt   Score  Error  Units
ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

-

Commit messages:
 - Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

Changes: https://git.openjdk.org/jdk/pull/12077/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12077&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300493
  Stats: 116 lines in 4 files changed: 99 ins; 8 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/12077.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12077/head:pull/12077

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-18 Thread Alan Bateman
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

Using countPositives, and vectorizedHashCode(T_BOOLEAN) for unsigned bytes, 
make sense here. I don't have time to study the micro right now.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-19 Thread Claes Redestad
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the same 
directory. It's not exactly necessary for this use case to add such a 
benchmark, but I think there's value in verifying that optimizing `checkedHash` 
improves `ZipFile` setup and adding the micro might allow us to find further 
opportunities down the line.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Alan Bateman
On Thu, 19 Jan 2023 09:46:12 GMT, Claes Redestad  wrote:

> FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the 
> same directory. It's not exactly necessary for this use case to add such a 
> benchmark, but I think there's value in verifying that optimizing 
> `checkedHash` improves `ZipFile` setup and adding the micro might allow us to 
> find further opportunities down the line.

I've no doubt it improves checkedHash, it's just that open the zip file and 
reading in the CEN probably dominates when not in the file system cache.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Alan Bateman
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

Marked as reviewed by alanb (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Lance Andersen
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java line 69:

> 67: 
> 68: ename += "long-entry-name-"  + (random.nextInt(9) + 
> 1)  + "-" + i;
> 69: zos.putNextEntry(new ZipEntry(ename));

Would it be worth to add some random sized data when the entries are created to 
allow for getting a bit more insight, or  perhaps do that in a separate 
benchmark>

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Claes Redestad
On Fri, 20 Jan 2023 11:10:13 GMT, Alan Bateman  wrote:

> > FWIW the micro is derived from the sibling `ZipFileGetEntry` micro in the 
> > same directory. It's not exactly necessary for this use case to add such a 
> > benchmark, but I think there's value in verifying that optimizing 
> > `checkedHash` improves `ZipFile` setup and adding the micro might allow us 
> > to find further opportunities down the line.
> 
> I've no doubt it improves checkedHash, it's just that open the zip file and 
> reading in the CEN probably dominates when not in the file system cache.

Right, the micro is a poor proxy for real-world implications since time to open 
a zip file very much depends on the filesystem speed but this is sort of by 
design. We have separate startup tests that tries to emulate more "cold start" 
scenarios, which micros like this are complementary to and not a substitute for.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Claes Redestad
On Fri, 20 Jan 2023 11:25:22 GMT, Lance Andersen  wrote:

>> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates 
>> on a `byte[]` subrange. It can profitably use the recently introduced 
>> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
>> translates to a small but significant speed-up on `ZipFile` creation.
>> 
>> Before:
>> 
>> Benchmark (size)  Mode  Cnt   Score  Error  Units
>> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
>> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
>> 
>> After:
>> 
>> Benchmark (size)  Mode  Cnt   Score  Error  Units
>> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
>> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op
>
> test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java line 69:
> 
>> 67: 
>> 68: ename += "long-entry-name-"  + (random.nextInt(9) + 
>> 1)  + "-" + i;
>> 69: zos.putNextEntry(new ZipEntry(ename));
> 
> Would it be worth to add some random sized data when the entries are created 
> to allow for getting a bit more insight, or  perhaps do that in a separate 
> benchmark>

I was experimenting with varying the entry names length to see what - if any - 
impact it had and saw a small effect on the micro. It does make more sense to 
vary lengths now that very long names will take different paths in the 
vectorized intrinsic. I'll see what I can do without overengineering this.

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder

2023-01-20 Thread Claes Redestad
On Wed, 18 Jan 2023 16:53:04 GMT, Claes Redestad  wrote:

> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

Updated micro to vary entry sizes more to ensure we exercise the different code 
paths through the `hashCode` intrinsic. The new setup generates both longer and 
shorter entries than before, weighting up the average length a bit by 
increasing the spread. The longer entries see a proportionately larger 
speed-up, as expected since they benefit from vectorization. Removed some 
pointless randomness.

Baseline:

Benchmark (size)  Mode  Cnt   Score  Error  Units
ZipFileOpen.openCloseZipFile 512  avgt   15   98832.801 ± 2155.928  ns/op
ZipFileOpen.openCloseZipFile1024  avgt   15  187373.545 ± 2296.779  ns/op


Patched:

Benchmark (size)  Mode  Cnt   Score  Error  Units
ZipFileOpen.openCloseZipFile 512  avgt   15   85574.648 ±  920.477  ns/op
ZipFileOpen.openCloseZipFile1024  avgt   15  160493.277 ± 3450.928  ns/op

-

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]

2023-01-20 Thread Claes Redestad
> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates on 
> a `byte[]` subrange. It can profitably use the recently introduced 
> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
> translates to a small but significant speed-up on `ZipFile` creation.
> 
> Before:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
> 
> After:
> 
> Benchmark (size)  Mode  Cnt   Score  Error  Units
> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Vary dir and entry name lengths across a wider spread, keeping most entries 
short but making the longest paths longer

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12077/files
  - new: https://git.openjdk.org/jdk/pull/12077/files/0c7f0f4f..369537c5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12077&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12077&range=00-01

  Stats: 9 lines in 1 file changed: 5 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12077.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12077/head:pull/12077

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]

2023-01-20 Thread Lance Andersen
On Fri, 20 Jan 2023 12:19:50 GMT, Claes Redestad  wrote:

>> `ZipCoder::checkedHashCode` emulates `StringLatin1::hashCode` but operates 
>> on a `byte[]` subrange. It can profitably use the recently introduced 
>> `ArraysSupport::vectorizedHashCode` method to see a speed-up, which 
>> translates to a small but significant speed-up on `ZipFile` creation.
>> 
>> Before:
>> 
>> Benchmark (size)  Mode  Cnt   Score  Error  Units
>> ZipFileOpen.openCloseZipFile 512  avgt   15   83007.325 ± 1446.716  ns/op
>> ZipFileOpen.openCloseZipFile1024  avgt   15  154550.631 ± 2166.673  ns/op
>> 
>> After:
>> 
>> Benchmark (size)  Mode  Cnt   Score  Error  Units
>> ZipFileOpen.openCloseZipFile 512  avgt   15   79512.902 ±  814.449  ns/op
>> ZipFileOpen.openCloseZipFile1024  avgt   15  147892.522 ± 2744.017  ns/op
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Vary dir and entry name lengths across a wider spread, keeping most entries 
> short but making the longest paths longer

Thank you Claes

-

Marked as reviewed by lancea (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12077


Re: RFR: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder [v2]

2023-01-21 Thread Claes Redestad
On Fri, 20 Jan 2023 16:42:47 GMT, Lance Andersen  wrote:

>> Claes Redestad has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Vary dir and entry name lengths across a wider spread, keeping most 
>> entries short but making the longest paths longer
>
> Thank you Claes

Thanks @LanceAndersen and @AlanBateman for reviewing!

-

PR: https://git.openjdk.org/jdk/pull/12077