Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-10 Thread Maurizio Cimadamore
On Tue, 9 Apr 2024 23:45:39 GMT, Scott Gibbons  wrote:

> Is there any way to disable some of the optimizations C2 will attempt on the 
> IR? We need to maintain atomicity, so vectorization shouldn't occur, for 
> instance. This seems like a rat-hole that would need constant maintenance as 
> C2 optimizations get better.

Sorry, I do not know that (I'm not a C2 engineer :-) ). One small observation: 
how important is atomicity in the "full off-heap case" ? E.g. if a `setMemory` 
is occurring at a location that is provably off-heap (and we should have ways 
to detect that, we do that also for other unsafe memory access routines), then 
perhaps the atomicity requirement can go (as I suppose that requirement is 
there for the Java Memory Model) ?

So, perhaps, while we might not be able to fully optimize for on-heap access, 
we might be able to do so for off-heap access (which is an important case for 
FFM).

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2047288498


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-09 Thread Scott Gibbons
On Sun, 7 Apr 2024 05:14:08 GMT, Francesco Nigro  wrote:

>> I went ahead and tried a pure-Java implementation, and it is faster for 
>> small sizes (up to 8) and only about 1.5x slower for larger sizes, so that 
>> might make for an interesting fallback if there is no customized assembler 
>> implementation available or if the size is known to me small.
>> 
>> Ideally, I think we would want C2 to be more aware of setMemory stores, so 
>> that it can remove redundant stores, like it does with InitializeNode.
>
> @dean-long in my old PR I have done the same, choosing a (not yet) 
> configurable cutoff value. 
> 
> See https://github.com/openjdk/jdk/pull/16760

As an experiment I added the java code that @franz1981 supplied and ran 
performance vs. the intrinsic stub.  I used 128 bytes as the cutoff value as in 
that code.  I saw about 0.75 to 1ns improvement for sizes of 1 or 2 bytes only. 
 Anything larger and the stub performed better.

@mcimadamore Is there any way to disable some of the optimizations C2 will 
attempt on the IR?  We need to maintain atomicity, so vectorization shouldn't 
occur, for instance.  This seems like a rat-hole that would need constant 
maintenance as C2 optimizations get better.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2046208254


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-06 Thread Francesco Nigro
On Sun, 7 Apr 2024 01:49:01 GMT, Dean Long  wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Oops
>
> I went ahead and tried a pure-Java implementation, and it is faster for small 
> sizes (up to 8) and only about 1.5x slower for larger sizes, so that might 
> make for an interesting fallback if there is no customized assembler 
> implementation available or if the size is known to me small.
> 
> Ideally, I think we would want C2 to be more aware of setMemory stores, so 
> that it can remove redundant stores, like it does with InitializeNode.

@dean-long in my old PR I have done the same, choosing a (not yet) configurable 
cutoff value. 

See https://github.com/openjdk/jdk/pull/16760

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2041314429


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-06 Thread Dean Long
On Sat, 6 Apr 2024 00:13:26 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Oops

I went ahead and tried a pure-Java implementation, and it is faster for small 
sizes (up to 8) and only about 1.5x slower for larger sizes, so that might make 
for an interesting fallback if there is no customized assembler implementation 
available or if the size is known to me small.

Ideally, I think we would want C2 to be more aware of setMemory stores, so that 
it can remove redundant stores, like it does with InitializeNode.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2041271472


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-05 Thread Scott Gibbons
> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
> this change.
> 
> Overall, making this an intrinsic improves overall performance of 
> `Unsafe::setMemory` by up to 4x for all buffer sizes.
> 
> Tested with tier-1 (and full CI).  I've added a table of the before and after 
> numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
> 
> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)

Scott Gibbons has updated the pull request incrementally with one additional 
commit since the last revision:

  Oops

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18555/files
  - new: https://git.openjdk.org/jdk/pull/18555/files/b025318f..fd6f04f7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=18555=05
 - incr: https://webrevs.openjdk.org/?repo=jdk=18555=04-05

  Stats: 18 lines in 2 files changed: 2 ins; 0 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/18555.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555

PR: https://git.openjdk.org/jdk/pull/18555