Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]
On Tue, 9 Apr 2024 23:45:39 GMT, Scott Gibbons wrote: > Is there any way to disable some of the optimizations C2 will attempt on the > IR? We need to maintain atomicity, so vectorization shouldn't occur, for > instance. This seems like a rat-hole that would need constant maintenance as > C2 optimizations get better. Sorry, I do not know that (I'm not a C2 engineer :-) ). One small observation: how important is atomicity in the "full off-heap case" ? E.g. if a `setMemory` is occurring at a location that is provably off-heap (and we should have ways to detect that, we do that also for other unsafe memory access routines), then perhaps the atomicity requirement can go (as I suppose that requirement is there for the Java Memory Model) ? So, perhaps, while we might not be able to fully optimize for on-heap access, we might be able to do so for off-heap access (which is an important case for FFM). - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2047288498
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]
On Sun, 7 Apr 2024 05:14:08 GMT, Francesco Nigro wrote: >> I went ahead and tried a pure-Java implementation, and it is faster for >> small sizes (up to 8) and only about 1.5x slower for larger sizes, so that >> might make for an interesting fallback if there is no customized assembler >> implementation available or if the size is known to me small. >> >> Ideally, I think we would want C2 to be more aware of setMemory stores, so >> that it can remove redundant stores, like it does with InitializeNode. > > @dean-long in my old PR I have done the same, choosing a (not yet) > configurable cutoff value. > > See https://github.com/openjdk/jdk/pull/16760 As an experiment I added the java code that @franz1981 supplied and ran performance vs. the intrinsic stub. I used 128 bytes as the cutoff value as in that code. I saw about 0.75 to 1ns improvement for sizes of 1 or 2 bytes only. Anything larger and the stub performed better. @mcimadamore Is there any way to disable some of the optimizations C2 will attempt on the IR? We need to maintain atomicity, so vectorization shouldn't occur, for instance. This seems like a rat-hole that would need constant maintenance as C2 optimizations get better. - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2046208254
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]
On Sun, 7 Apr 2024 01:49:01 GMT, Dean Long wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Oops > > I went ahead and tried a pure-Java implementation, and it is faster for small > sizes (up to 8) and only about 1.5x slower for larger sizes, so that might > make for an interesting fallback if there is no customized assembler > implementation available or if the size is known to me small. > > Ideally, I think we would want C2 to be more aware of setMemory stores, so > that it can remove redundant stores, like it does with InitializeNode. @dean-long in my old PR I have done the same, choosing a (not yet) configurable cutoff value. See https://github.com/openjdk/jdk/pull/16760 - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2041314429
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]
On Sat, 6 Apr 2024 00:13:26 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall performance of >> `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and >> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Oops I went ahead and tried a pure-Java implementation, and it is faster for small sizes (up to 8) and only about 1.5x slower for larger sizes, so that might make for an interesting fallback if there is no customized assembler implementation available or if the size is known to me small. Ideally, I think we would want C2 to be more aware of setMemory stores, so that it can remove redundant stores, like it does with InitializeNode. - PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2041271472
Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]
> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See > [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around > this change. > > Overall, making this an intrinsic improves overall performance of > `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after > numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). > > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Oops - Changes: - all: https://git.openjdk.org/jdk/pull/18555/files - new: https://git.openjdk.org/jdk/pull/18555/files/b025318f..fd6f04f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=18555=05 - incr: https://webrevs.openjdk.org/?repo=jdk=18555=04-05 Stats: 18 lines in 2 files changed: 2 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555