Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6]

2024-05-22 Thread Dean Long
On Wed, 22 May 2024 14:28:41 GMT, Yudi Zheng  wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693:
>> 
>>> 4691: const Register xlen  = r1;
>>> 4692: const Register z = r2;
>>> 4693: const Register zlen  = r3;
>> 
>> LibraryCallKit::inline_squareToLen() is still computing zlen and passing it 
>> as the 4th arg, even though the value is unused.
>
> ppc and x86 are not using `multiply_to_len` for `generate_squareToLen`. I think 
> we still need to pass zlen for these platforms.

OK.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1610580527


Re: RFR: 8330171: Lazy W^X switch implementation

2024-05-12 Thread Dean Long
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative to preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtest*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios?

I think there is a sweet-spot middle-ground between the two extremes: 
full-lazy, ideal for performance, and fine-grained execute-by-default, ideal 
for security.  I don't think we should change to full-lazy and remove all the 
guard rails at this time.  I am investigating execute-by-default, and it looks 
promising.

-

Changes requested by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18762#pullrequestreview-2051465621


Re: RFR: 8331087: Move immutable nmethod data from CodeCache [v2]

2024-04-29 Thread Dean Long
On Sun, 28 Apr 2024 23:37:22 GMT, Vladimir Kozlov  wrote:

>> Move immutable nmethod's data from CodeCache to C heap. It includes 
>> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
>> speculations`. It accounts for about 30% (optimized VM) of space in CodeCache.
>> 
>> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
>> nmethod's data. Call `vm_exit_out_of_memory()` if allocation failed.
>> 
>> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
>> nmethod's header size increase.
>> 
>> Tested tier1-5, stress,xcomp
>> 
>> Our performance testing does not show difference.
>> 
>> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Address comments. Moved jvmci_data back to mutable data section.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18984#pullrequestreview-2027786061


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-29 Thread Dean Long
On Sun, 28 Apr 2024 07:02:40 GMT, Dean Long  wrote:

>> Move immutable nmethod's data from CodeCache to C heap. It includes 
>> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
>> speculations`. It accounts for about 30% (optimized VM) of space in CodeCache.
>> 
>> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
>> nmethod's data. Call `vm_exit_out_of_memory()` if allocation failed.
>> 
>> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
>> nmethod's header size increase.
>> 
>> Tested tier1-5, stress,xcomp
>> 
>> Our performance testing does not show difference.
>> 
>> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.
>
> It only makes sense if the immutable data heap is not also used for other 
> critical resources.  If malloc or metaspace were used as the immutable data 
> heap, normally failures in those heaps are fatal, because other critical 
> resources (monitors, classes, etc) are allocated from there, so any failure 
> means the JVM is about to die.  There's no reason to find a fall-back method 
> to allocate a new nmethod in that case.

> Just to be clear @dean-long , you're saying failure to allocate immutable 
> data in the C heap should result in a fatal error? Makes sense to me as the 
> VM must indeed be very close to crashing anyway in that case. It also 
> obviates the need for propagating `out_of_memory_error` to JVMCI code.

I hadn't thought it through that far, actually.  I was only pointing out that 
the proposed fall-back:

> increase nmethod size and put immutable data there (as before).

isn't worth the trouble.  But making the C heap failure fatal immediately is 
reasonable, especially if it simplifies JVMCI error reporting.

-

PR Comment: https://git.openjdk.org/jdk/pull/18984#issuecomment-2082083104


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-28 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It accounts for about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

It only makes sense if the immutable data heap is not also used for other 
critical resources.  If malloc or metaspace were used as the immutable data 
heap, normally failures in those heaps are fatal, because other critical 
resources (monitors, classes, etc) are allocated from there, so any failure 
means the JVM is about to die.  There's no reason to find a fall-back method to 
allocate a new nmethod in that case.

-

PR Comment: https://git.openjdk.org/jdk/pull/18984#issuecomment-2081364009


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It accounts for about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/NMethod.java line 528:

> 526:   private int getScopesDataOffset()   { return (int) 
> scopesDataOffsetField  .getValue(addr); }
> 527:   private int getScopesPCsOffset(){ return (int) 
> scopesPCsOffsetField   .getValue(addr); }
> 528:   private int getDependenciesOffset() { return (int) 0; }

Suggestion:


No longer used.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581671710


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It accounts for about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

src/hotspot/share/code/nmethod.cpp line 1484:

> 1482:   // Calculate positive offset as distance between the start of 
> stubs section
> 1483:   // (which is also the end of instructions section) and the start 
> of the handler.
> 1484:   CHECKED_CAST(_unwind_handler_offset, int16_t, (_stub_offset - 
> code_offset() - offsets->value(CodeOffsets::UnwindHandler)));

Suggestion:

  int unwind_handler_offset = code_offset() + 
offsets->value(CodeOffsets::UnwindHandler);
  CHECKED_CAST(_unwind_handler_offset, int16_t, _stub_offset - 
unwind_handler_offset);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581644356


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It accounts for about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

src/hotspot/share/code/nmethod.cpp line 1332:

> 1330: #if INCLUDE_JVMCI
> 1331: _speculations_offset = _scopes_data_offset;
> 1332: _jvmci_data_offset   = _speculations_offset;

Why not use 0 for all these?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581642931


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:36:50 GMT, Vladimir Kozlov  wrote:

>> Move immutable nmethod's data from CodeCache to C heap. It includes 
>> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
>> speculations, jvmci_data`. It accounts for about 30% (optimized VM) of space 
>> in CodeCache.
>> 
>> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
>> nmethod's data. Bail out compilation if allocation failed.
>> 
>> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
>> nmethod's header size increase.
>> 
>> Tested tier1-5, stress,xcomp
>> 
>> Our performance testing does not show difference.
>> 
>> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.
>
> src/hotspot/share/code/nmethod.cpp line 117:
> 
>> 115:   result = static_cast<T>(thing); \
>> 116:   assert(static_cast<int>(result) == thing, "failed: %d != %d", 
>> static_cast<int>(result), thing);
>> 117: 
> 
> I replaced `checked_cast<>()` with this macro because of next issues:
>  - The existing assert points to `utilities/checkedCast.hpp` file where this 
> method is located and not where failed cast. It does not help when it is used 
> several times in one method (for example, in `nmethod()` constructors).
>  - The existing assert does not print values

I thought @kimbarrett had a draft PR to address the error reporting issue, but 
I can't seem to find it.  To solve the general problem, I think we need a 
version of vmassert() that takes `char* file, int lineno` as arguments, and a 
macro wrapper for checked_cast() that passes `__FILE__` and `__LINE__` from 
the caller.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581628786


Re: RFR: 8329433: Reduce nmethod header size [v8]

2024-04-17 Thread Dean Long
On Thu, 18 Apr 2024 00:41:03 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Address comment

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18768#pullrequestreview-2007632424


Re: RFR: 8330388: Remove invokedynamic cache index encoding

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 15:26:52 GMT, Matias Saavedra Silva  
wrote:

> Before [JDK-8307190](https://bugs.openjdk.org/browse/JDK-8307190), 
> [JDK-8309673](https://bugs.openjdk.org/browse/JDK-8309673), and 
> [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), invokedynamic 
> operands needed to be rewritten to encoded values to better distinguish indy 
> entries from other cp cache entries. The above changes now distinguish 
> between entries with `to_cp_index()` using the bytecode, which is now 
> propagated by the callers.
> 
> The encoding flips the bits of the index so the encoded index is always 
> negative, leading to access errors if there is no matching decode call. These 
> calls are removed with some methods adjusted to distinguish between indices 
> with the bytecode. Verified with tier 1-5 tests.

@dougxc should check JVMCI changes.

-

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18819#pullrequestreview-2007498087


Re: RFR: 8330388: Remove invokedynamic cache index encoding

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 15:26:52 GMT, Matias Saavedra Silva  
wrote:

> Before [JDK-8307190](https://bugs.openjdk.org/browse/JDK-8307190), 
> [JDK-8309673](https://bugs.openjdk.org/browse/JDK-8309673), and 
> [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), invokedynamic 
> operands needed to be rewritten to encoded values to better distinguish indy 
> entries from other cp cache entries. The above changes now distinguish 
> between entries with `to_cp_index()` using the bytecode, which is now 
> propagated by the callers.
> 
> The encoding flips the bits of the index so the encoded index is always 
> negative, leading to access errors if there is no matching decode call. These 
> calls are removed with some methods adjusted to distinguish between indices 
> with the bytecode. Verified with tier 1-5 tests.

Did you consider minimizing changes by leaving 
decode_invokedynamic_index/encode_invokedynamic_index calls in place, but 
having the implementations not change the value?

-

PR Comment: https://git.openjdk.org/jdk/pull/18819#issuecomment-2062609288


Re: RFR: 8330388: Remove invokedynamic cache index encoding

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 15:26:52 GMT, Matias Saavedra Silva  
wrote:

> Before [JDK-8307190](https://bugs.openjdk.org/browse/JDK-8307190), 
> [JDK-8309673](https://bugs.openjdk.org/browse/JDK-8309673), and 
> [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), invokedynamic 
> operands needed to be rewritten to encoded values to better distinguish indy 
> entries from other cp cache entries. The above changes now distinguish 
> between entries with `to_cp_index()` using the bytecode, which is now 
> propagated by the callers.
> 
> The encoding flips the bits of the index so the encoded index is always 
> negative, leading to access errors if there is no matching decode call. These 
> calls are removed with some methods adjusted to distinguish between indices 
> with the bytecode. Verified with tier 1-5 tests.

src/hotspot/share/ci/ciEnv.cpp line 1513:

> 1511: // process the BSM
> 1512: int pool_index = indy_info->constant_pool_index();
> 1513: BootstrapInfo bootstrap_specifier(cp, pool_index, indy_index);

Why not just change the incoming parameter name to `index`?

src/hotspot/share/classfile/resolutionErrors.hpp line 60:

> 58: 
> 59:   // This function is used to encode an invokedynamic index to 
> differentiate it from a
> 60:   // constant pool index.  It assumes it is being called with a index 
> that is less than 0

Is this comment still correct?

src/hotspot/share/interpreter/bootstrapInfo.cpp line 77:

> 75: return true;
> 76:   } else if (indy_entry->resolution_failed()) {
> 77: int encoded_index = 
> ResolutionErrorTable::encode_indy_index(_indy_index);

That's an improvement, from two levels of encoding to only one!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18819#discussion_r1569625123
PR Review Comment: https://git.openjdk.org/jdk/pull/18819#discussion_r1569626519
PR Review Comment: https://git.openjdk.org/jdk/pull/18819#discussion_r1569627219


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-17 Thread Dean Long
On Tue, 16 Apr 2024 16:09:21 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/share/code/nmethod.cpp line 1441:
>> 
>>> 1439: int deps_size = align_up((int)dependencies->size_in_bytes(), 
>>> oopSize);
>>> 1440: int sum_size  = oops_size + metadata_size + deps_size;
>>> 1441: assert((sum_size >> 16) == 0, "data size is bigger than 64Kb: 
>>> %d", sum_size);
>> 
>> I suggest using checked_cast for the assignment below, rather than 
>> special-purpose checks here.
>
> Okay. But I will put above code under `#ifdef ASSERT` then.

The ASSERT block above looks unnecessary, now that field assignments below are 
using checked_cast.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1569496753


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

I think you'll want to ask port maintainers for aarch64/arm/ppc/riscv/s390 to 
review and test those changes.
There may be some opportunities for minor improvements, but those could be done 
later.  For example, we are computing `zlen` for the squareToLen stub even 
though the value is unused.  And both x86 and aarch64 seem to have unneeded 
save/restore code, even though I think all these registers are killed when 
called by a C2 runtime stub.

-

PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2062138149


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6662:

> 6660:   push(tmp5);
> 6661: 
> 6662:   push(xlen);

There may be an opportunity here (separate RFE?) to get rid of the save/restore 
for these.  I don't think it's necessary if this is called as part of a C2 stub.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569452818


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 19:45:02 GMT, Dean Long  wrote:

>> Yudi Zheng has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   address comment.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702:
> 
>> 4700: const Register tmp5  = r15;
>> 4701: const Register tmp6  = r16;
>> 4702: const Register tmp7  = r17;
> 
> No need for r17 or sorting tmps.  Make tmp0 r3, or r6, r7, etc.

Also, I don't see why the code below saves and restores r4/r5.  Maybe 
@theRealAph knows?  Aren't all these registers killed across a runtime call?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569427241


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693:

> 4691: const Register xlen  = r1;
> 4692: const Register z = r2;
> 4693: const Register zlen  = r3;

LibraryCallKit::inline_squareToLen() is still computing zlen and passing it as 
the 4th arg, even though the value is unused.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702:

> 4700: const Register tmp5  = r15;
> 4701: const Register tmp6  = r16;
> 4702: const Register tmp7  = r17;

No need for r17 or sorting tmps.  Make tmp0 r3, or r6, r7, etc.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569419199
PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569420732


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:

> 4668: const Register tmp5  = r15;
> 4669: const Register tmp6  = r16;
> 4670: const Register tmp7  = r17;

Why not minimize changes and continue to use r5 for tmp0?  I see no need for 
r17 or to reassign all the other tmp registers.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569401544


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-16 Thread Dean Long
On Tue, 16 Apr 2024 03:12:48 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/share/code/nmethod.hpp line 282:
>> 
>>> 280:   _has_flushed_dependencies:1, // Used for maintenance of 
>>> dependencies (under CodeCache_lock)
>>> 281:   _is_unlinked:1,  // mark during class unloading
>>> 282:   _load_reported:1;// used by jvmti to track if an 
>>> event has been posted for this nmethod
>> 
>> It seems like the type could be changed from uint8_t to bool.
>
> Is there difference in generated code when you use bool instead of uint8_t?
> I used uint8_t to easy change to uint16_t in a future if needed.

I don't think uint8_t vs uint16_t matters, only if it is signed, unsigned, or 
bool.  So if we have more than 8 individual :1 fields, it will expand to a 2nd 
byte.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566814258


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-16 Thread Dean Long
On Tue, 16 Apr 2024 03:06:13 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/share/code/nmethod.hpp line 205:
>> 
>>> 203:   // offsets to find the receiver for non-static native wrapper 
>>> frames.
>>> 204:   ByteSize _native_receiver_sp_offset;
>>> 205:   ByteSize _native_basic_lock_sp_offset;
>> 
>> Don't we need an assert in the accessor functions to make sure nmethod is 
>> native or not?
>
> I thought about that but in both places where these accessors are called 
> (`frame::get_native_monitor()` and `frame::get_native_receiver()`) there are 
> such asserts already:
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L1085

OK, but I'd rather see it in the accessors too.  Some users are checking for 
method()->is_native() and others are checking for is_osr_method(), so we need 
to make sure those are always mutually exclusive: method()->is_native() != 
is_osr_method().

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566806754


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-16 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/memory/heap.hpp line 58:

> 56:   void set_length(size_t length) {
> 57: LP64_ONLY( assert(((length >> 32) == 0), "sanity"); )
> 58: _header._length = (uint32_t)length;

Suggestion:

_header._length = checked_cast<uint32_t>(length);

src/hotspot/share/memory/heap.hpp line 63:

> 61:   // Accessors
> 62:   void* allocated_space() const  { return (void*)(this + 
> 1); }
> 63:   size_t length() const  { return 
> (size_t)_header._length; }

This cast looks unnecessary.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566784458
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566784587


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-16 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.cpp line 1441:

> 1439: int deps_size = align_up((int)dependencies->size_in_bytes(), 
> oopSize);
> 1440: int sum_size  = oops_size + metadata_size + deps_size;
> 1441: assert((sum_size >> 16) == 0, "data size is bigger than 64Kb: %d", 
> sum_size);

I suggest using checked_cast for the assignment below, rather than 
special-purpose checks here.

src/hotspot/share/code/nmethod.cpp line 1445:

> 1443: _metadata_offset  = (uint16_t)oops_size;
> 1444: _dependencies_offset  = _metadata_offset  + 
> (uint16_t)metadata_size;
> 1445: _scopes_pcs_offset= _dependencies_offset  + (uint16_t)deps_size;

Use checked_cast instead of raw casts.

src/hotspot/share/code/nmethod.cpp line 1459:

> 1457: assert((data_offset() + data_end_offset) <= nmethod_size, "wrong 
> nmethod's size: %d < %d", nmethod_size, (data_offset() + data_end_offset));
> 1458: 
> 1459: _entry_offset  = 
> (uint16_t)offsets->value(CodeOffsets::Entry);

Use checked_cast.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566771026
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566772567
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566773477


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-16 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.cpp line 1235:

> 1233: int skipped_insts_size   = 
> code_buffer->total_skipped_instructions_size();
> 1234: #ifdef ASSERT
> 1235: assert(((skipped_insts_size >> 16) == 0), "size is bigger than 
> 64Kb: %d", skipped_insts_size);

Suggestion:


I think it's simpler just to use checked_cast below.

src/hotspot/share/code/nmethod.cpp line 1240:

> 1238: int consts_offset = 
> code_buffer->total_offset_of(code_buffer->consts());
> 1239: assert(consts_offset == 0, "const_offset: %d", consts_offset);
> 1240: #endif

Suggestion:

src/hotspot/share/code/nmethod.cpp line 1241:

> 1239: assert(consts_offset == 0, "const_offset: %d", consts_offset);
> 1240: #endif
> 1241: _skipped_instructions_size = (uint16_t)skipped_insts_size;

Suggestion:

_skipped_instructions_size = 
checked_cast<uint16_t>(code_buffer->total_skipped_instructions_size());

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566764300
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566765068
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566759786


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.hpp line 205:

> 203:   // offsets to find the receiver for non-static native wrapper 
> frames.
> 204:   ByteSize _native_receiver_sp_offset;
> 205:   ByteSize _native_basic_lock_sp_offset;

Don't we need an assert in the accessor functions to make sure nmethod is 
native or not?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566634384


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.hpp line 282:

> 280:   _has_flushed_dependencies:1, // Used for maintenance of 
> dependencies (under CodeCache_lock)
> 281:   _is_unlinked:1,  // mark during class unloading
> 282:   _load_reported:1;// used by jvmti to track if an 
> event has been posted for this nmethod

It seems like the type could be changed from uint8_t to bool.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566631312


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/codeBlob.cpp line 120:

> 118: 
> 119:   _header_size((uint16_t)header_size),
> 120:   _frame_complete_offset((int16_t)CodeOffsets::frame_never_safe),

See above.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566604310


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/codeBlob.cpp line 88:

> 86:   S390_ONLY(_ctable_offset(0) COMMA)
> 87:   _header_size((uint16_t)header_size),
> 88:   _frame_complete_offset((int16_t)frame_complete_offset),

Rather than a raw cast, it would be better to use checked_cast here, or better 
yet, change the incoming parameter types to match the field type.  That way, if 
the caller is passing a constant, the compiler can check it at compile time.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566601934


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v7]

2024-04-08 Thread Dean Long
On Mon, 8 Apr 2024 19:11:19 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add movq to locate_operand

Thanks, I see that my ideas have pretty much already been discussed in 
https://github.com/openjdk/jdk/pull/16760.  I might have missed it, but has the 
possibility of always setting the aligned interior region with 8 byte stores 
been discussed?  A literal reading of the javadoc seems to disallow it, but it 
seems like it should be allowed based on memory coherence.  Only the unaligned 
head and tail would need special treatment.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2043732829


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v6]

2024-04-06 Thread Dean Long
On Sat, 6 Apr 2024 00:13:26 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Oops

I went ahead and tried a pure-Java implementation, and it is faster for small 
sizes (up to 8) and only about 1.5x slower for larger sizes, so that might make 
for an interesting fallback if there is no customized assembler implementation 
available or if the size is known to be small.

Ideally, I think we would want C2 to be more aware of setMemory stores, so that 
it can remove redundant stores, like it does with InitializeNode.

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2041271472


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v4]

2024-04-04 Thread Dean Long
On Wed, 3 Apr 2024 15:15:24 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix Windows

As an experiment, couldn't you have the C2 intrinsic redirect to a Java helper 
that calls putByte() in a loop?

-

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2038994043


Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v4]

2024-04-04 Thread Dean Long
On Wed, 3 Apr 2024 15:15:24 GMT, Scott Gibbons  wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64.  See 
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around 
>> this change.
>> 
>> Overall, making this an intrinsic improves overall performance of 
>> `Unsafe::setMemory` by up to 4x for all buffer sizes.
>> 
>> Tested with tier-1 (and full CI).  I've added a table of the before and 
>> after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>> 
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix Windows

I think the right approach is to turn it into a loop in the IR, which I think 
is what Doug was implying.  That way C2 can do all its usual optimizations, 
like unrolling, vectorization, and redundant store elimination (if it is an 
on-heap primitive array that was just allocated, then there is no need to zero 
the parts that are being "set").

-

Changes requested by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-1981533209


Re: RFR: 8329332: Remove CompiledMethod and CodeBlobLayout classes

2024-04-01 Thread Dean Long
On Fri, 29 Mar 2024 19:35:45 GMT, Vladimir Kozlov  wrote:

> Revert [JDK-8152664](https://bugs.openjdk.org/browse/JDK-8152664) RFE 
> [changes](https://github.com/openjdk/jdk/commit/b853eb7f5ca24eeeda18acbb14287f706499c365)
>  which was used for AOT [JEP 295](https://openjdk.org/jeps/295) 
> implementation in JDK 9. The code was left in HotSpot assuming it will help 
> in a future. But during work on Leyden we decided to not use it. In Leyden 
> cached compiled code will be restored in CodeCache as normal nmethods: no 
> need to change VM's runtime and GC code to process them.
> 
> I may work on optimizing `CodeBlob` and `nmethod` fields layout to reduce 
> header size in separate changes. In these changes I did simple fields 
> reordering to keep small (1 byte) fields together.
> 
> I do not see (and not expected) performance difference with these changes.
> 
> Tested tier1-5, xcomp, stress. Running performance testing.
> 
> I need help with testing on platforms which Oracle does not support.

The `not_used` state was introduced for AOT.  It can go away now.

-

PR Comment: https://git.openjdk.org/jdk/pull/18554#issuecomment-2030282409


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]

2024-03-19 Thread Dean Long
On Tue, 19 Mar 2024 19:06:36 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:

> 4668: const Register tmp6  = r15;
> 4669: const Register tmp7  = r16;
> 4670: const Register tmp8  = r17;

It looks like tmp8 is never used.  The call to multiply_to_len() below needs to 
be adjusted.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1530986811


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]

2024-03-19 Thread Dean Long
On Tue, 19 Mar 2024 19:06:36 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3559:

> 3557:  Register tmp5, Register tmp6, 
> Register product_hi, Register tmp8) {
> 3558: 
> 3559:   assert_different_registers(x, xlen, y, ylen, z, tmp1, tmp2, tmp3, 
> tmp4, tmp5, tmp6, tmp8);

Suggestion:

  assert_different_registers(x, xlen, y, ylen, z, tmp1, tmp2, tmp3, tmp4, tmp5, 
tmp6, tmp8, product_hi);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1530980566


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic

2024-03-18 Thread Dean Long
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng  wrote:

> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

src/hotspot/share/opto/library_call.cpp line 5934:

> 5932: // 'y_start' points to y array + scaled ylen
> 5933: 
> 5934: Node* zlen = _gvn.transform(new AddINode(xlen, ylen));

We could generate one less instruction in the code cache if we did this 
`add` in the native runtime function.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1529102070


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock [v2]

2024-03-06 Thread Dean Long
On Thu, 7 Mar 2024 01:38:56 GMT, Coleen Phillimore  wrote:

>> This change creates a new sort of native recursive lock that can be held 
>> during JNI and Java calls, which can be used for synchronization while 
>> creating objArrayKlasses at runtime.
>> 
>> Passes tier1-7.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Dean's comments.

> Does a PlatformMutex handle priority-inheritance?

It would depend on the OS and the mutex implementation.  You can ignore my 
comment.  It was from long ago trying to put Java on top of a real-time OS.

-

PR Comment: https://git.openjdk.org/jdk/pull/17739#issuecomment-1982207873


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/17739#pullrequestreview-1921100707


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

src/hotspot/share/oops/instanceKlass.cpp line 2765:

> 2763: // To get a consistent list of classes we need MultiArray_lock to 
> ensure
> 2764: // array classes aren't observed while they are being restored.
> 2765: MutexLocker ml(MultiArray_lock);

Doesn't this revert JDK-8274338?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515330292


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

src/hotspot/share/oops/arrayKlass.cpp line 135:

> 133: ResourceMark rm(THREAD);
> 134: {
> 135:   // Ensure atomic creation of higher dimensions

Isn't this comment still useful?

src/hotspot/share/oops/arrayKlass.cpp line 144:

> 142: ObjArrayKlass* ak =
> 143:   ObjArrayKlass::allocate_objArray_klass(class_loader_data(), 
> dim + 1, this, CHECK_NULL);
> 144: ak->set_lower_dimension(this);

Would it be useful to assert that the lower dimension has been set?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515322667
PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515323404


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Wed, 6 Mar 2024 23:47:01 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/runtime/mutex.cpp line 537:
>> 
>>> 535: // can be called by jvmti by VMThread.
>>> 536: if (current->is_Java_thread()) {
>>> 537:   _sem.wait_with_safepoint_check(JavaThread::cast(current));
>> 
>> Why not use PlatformMutex + OSThreadWaitState instead of a semaphore?
>
> Semaphore seemed simpler (?)

OK.  It's a good thing HotSpot doesn't need to worry about priority-inheritance 
for mutexes.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515314037


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

OK, that makes sense about loom and JOM.

src/hotspot/share/runtime/mutex.cpp line 537:

> 535: // can be called by jvmti by VMThread.
> 536: if (current->is_Java_thread()) {
> 537:   _sem.wait_with_safepoint_check(JavaThread::cast(current));

Why not use PlatformMutex + OSThreadWaitState instead of a semaphore?

-

PR Comment: https://git.openjdk.org/jdk/pull/17739#issuecomment-1982008253
PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515269443


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-05 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

Is the caller always a JavaThread?  I'm wondering if your new recursive lock 
class could use the existing ObjectMonitor.  I thought David asked the same 
question, but I can't find it.

-

PR Comment: https://git.openjdk.org/jdk/pull/17739#issuecomment-1979796523


Re: RFR: 8326524: Rename agent_common.h

2024-02-23 Thread Dean Long
On Thu, 22 Feb 2024 19:38:26 GMT, Kim Barrett  wrote:

> Please review this trivial change that renames the file
> test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_common/agent_common.h to
> agent_common.hpp.
> 
> The #include updates were performed mechanically, and builds would fail if
> there were mistakes.
> 
> The copyright updates were similarly performed mechanically. All but a handful
> of the including files have already had their copyrights updated recently,
> likely as part of JDK-8324681. The only change to the renamed file is a
> copyright update, since no code changes were required.
> 
> Testing: mach5 tier1

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/17970#pullrequestreview-1898767675


Re: RFR: 8326524: Rename agent_common.h

2024-02-22 Thread Dean Long
On Thu, 22 Feb 2024 19:38:26 GMT, Kim Barrett  wrote:

> Please review this trivial change that renames the file
> test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_common/agent_common.h to
> agent_common.hpp.
> 
> The #include updates were performed mechanically, and builds would fail if
> there were mistakes.
> 
> The copyright updates were similarly performed mechanically. All but a handful
> of the including files have already had their copyrights updated recently,
> likely as part of JDK-8324681. The only change to the renamed file is a
> copyright update, since no code changes were required.
> 
> Testing: mach5 tier1

If we wanted to minimize changes, we could have agent_common.h include 
agent_common.hpp.  Then we don't have to change all the .cpp files, which have 
other problems, like the includes not being sorted.

-

PR Comment: https://git.openjdk.org/jdk/pull/17970#issuecomment-1960638558


Re: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4]

2024-01-25 Thread Dean Long
On Wed, 24 Jan 2024 14:48:52 GMT, Volker Simonis  wrote:

>> Currently we don't record dependencies on redefined methods (i.e. 
>> `evol_method` dependencies) in JIT compiled methods if none of the 
>> `can_redefine_classes`, `can_retransform_classes` or 
>> `can_generate_breakpoint_events` JVMTI capabilities is set. This means that 
>> if a JVMTI agent which requests one of these capabilities is dynamically 
>> attached, all the methods which have been JIT compiled until that point, 
>> will be marked for deoptimization and flushed from the code cache. For 
>> large, warmed-up applications this means deoptimization and instant 
>> recompilation of thousands if not tens of thousands of methods, which can lead 
>> to dramatic performance/latency drop-downs for several minutes.
>> 
>> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 
>> 451: Prepare to Disallow the Dynamic Loading of 
>> Agents](https://openjdk.org/jeps/451)) and this problem could be solved by 
>> making the recording of `evol_method` dependencies dependent on the new 
>> `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI 
>> capabilities (because the presence of the flag indicates that an agent will 
>> be loaded eventually).
>> 
>> But there is a single, however important, exception to this rule and that's JFR. 
>> JFR is advertised as low overhead profiler which can be enabled in 
>> production at any time. However, when JFR is started dynamically (e.g. 
>> through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent 
>> which requests the `can_retransform_classes` and retransforms some classes. 
>> This will inevitably trigger the deoptimization of all compiled methods as 
>> described above.
>> 
>> I'd therefore like to propose to *always* and unconditionally record 
>> `evol_method` dependencies in JIT compiled code by exporting the relevant 
>> properties right at startup in `init_globals()`:
>> ```c++
>>  jint init_globals() {
>>management_init();
>>JvmtiExport::initialize_oop_storage();
>> +#if INCLUDE_JVMTI
>> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>> +  JvmtiExport::set_all_dependencies_are_recorded(true);
>> +#endif
>> 
>> 
>> My measurements indicate that the overhead of doing so is minimal (around 1% 
>> increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic 
>> application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions 
>> -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 
>> with C2) resulting in an aggregated nmethod size of around ~40mb. 
>> Additionally recording `evol_method` dependencies only increases this size 
>> by about 400kb
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Made the flag experimental and added an assertion to 
> set_can_hotswap_or_post_breakpoint()

Marked as reviewed by dlong (Reviewer).

No, go ahead.

-

PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1844939013
PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1911240540


Re: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2]

2024-01-23 Thread Dean Long
On Tue, 23 Jan 2024 16:37:35 GMT, Volker Simonis  wrote:

>> src/hotspot/share/runtime/init.cpp line 121:
>> 
>>> 119:   if (AlwaysRecordEvolDependencies) {
>>> 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>>> 121: JvmtiExport::set_all_dependencies_are_recorded(true);
>> 
>> I think the check for AlwaysRecordEvolDependencies needs to be moved into 
>> set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, 
>> otherwise don't we risk the value being accidentally reset to false when 
>> set_can_hotswap_or_post_breakpoint() is called again by 
>> JvmtiManageCapabilities::update()?
>
> A good question, but after deep digging (it took me quite some time to figure 
> this out myself :) I don't think that 
> `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` can ever be 
> reset. Here's how it works:
> 
> - there's a global set of `always_capabilities` which is initialized when the 
> first JVMTI environment is created:
> 
> // capabilities which are always potentially available
> jvmtiCapabilities JvmtiManageCapabilities::always_capabilities;
> 
> void JvmtiManageCapabilities::initialize() {
>   _capabilities_lock = new Mutex(Mutex::nosafepoint, "Capabilities_lock");
>   always_capabilities = init_always_capabilities();
> 
> 
> - as the name implies, this set of capabilities contains all generally 
> available capabilities (except for the `onload` and `solo` capabilites, which 
> are maintained in separate sets).
> - `JvmtiManageCapabilities::update()` only operates on this global sets of 
> capabilites (and *not* on the concrete capabilities of a specific JVMTI 
> environment):
> 
> void JvmtiManageCapabilities::update() {
>   jvmtiCapabilities avail;
>   // all capabilities
>   either(_capabilities, _solo_capabilities, &avail);
> ...
>   // If can_redefine_classes is enabled in the onload phase then we know that 
> the
>   // dependency information recorded by the compiler is complete.
>   if ((avail.can_redefine_classes || avail.can_retransform_classes) &&
>   JvmtiEnv::get_phase() == JVMTI_PHASE_ONLOAD) {
> JvmtiExport::set_all_dependencies_are_recorded(true);
>   }
> ...
>   JvmtiExport::set_can_hotswap_or_post_breakpoint(
> avail.can_generate_breakpoint_events ||
> avail.can_redefine_classes ||
> avail.can_retransform_classes);
> 
> - This means that `JvmtiManageCapabilities::update()` is always conservative 
> regarding its exports to `JvmtiExport` as it always exports *all* potentially 
> available capabilites once a JVMTI environment is created and not only the 
> specific capabilities requested by concrete JVMTI environment. This means 
> that even if we create a JVMTI agent with an empty set of capabilities, 
> `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` will still 
> be set in `JvmtiExport`.
> - There's no code path (at least I couldn't find one) which takes 
> capabilities away from the `always_capabilities` set. Only if an environment 
> requests `on_load` capabilities, they will be transferred from the global 
> `onload_capabilities` to the `always_capabilities` set:
> 
> jvmtiError JvmtiManageCapabilities::add_capabilities(const jvmtiCapabilities 
> *current,
>  ...

An assert works for me.  Thanks.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1464120039


Re: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2]

2024-01-22 Thread Dean Long
On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis  wrote:

>> Currently we don't record dependencies on redefined methods (i.e. 
>> `evol_method` dependencies) in JIT compiled methods if none of the 
>> `can_redefine_classes`, `can_retransform_classes` or 
>> `can_generate_breakpoint_events` JVMTI capabilities is set. This means that 
>> if a JVMTI agent which requests one of these capabilities is dynamically 
>> attached, all the methods which have been JIT compiled until that point, 
>> will be marked for deoptimization and flushed from the code cache. For 
>> large, warmed-up applications this means deoptimization and instant 
>> recompilation of thousands if not tens of thousands of methods, which can lead 
>> to dramatic performance/latency drop-downs for several minutes.
>> 
>> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 
>> 451: Prepare to Disallow the Dynamic Loading of 
>> Agents](https://openjdk.org/jeps/451)) and this problem could be solved by 
>> making the recording of `evol_method` dependencies dependent on the new 
>> `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI 
>> capabilities (because the presence of the flag indicates that an agent will 
>> be loaded eventually).
>> 
>> But there is a single, however important, exception to this rule and that's JFR. 
>> JFR is advertised as low overhead profiler which can be enabled in 
>> production at any time. However, when JFR is started dynamically (e.g. 
>> through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent 
>> which requests the `can_retransform_classes` and retransforms some classes. 
>> This will inevitably trigger the deoptimization of all compiled methods as 
>> described above.
>> 
>> I'd therefore like to propose to *always* and unconditionally record 
>> `evol_method` dependencies in JIT compiled code by exporting the relevant 
>> properties right at startup in `init_globals()`:
>> ```c++
>>  jint init_globals() {
>>management_init();
>>JvmtiExport::initialize_oop_storage();
>> +#if INCLUDE_JVMTI
>> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>> +  JvmtiExport::set_all_dependencies_are_recorded(true);
>> +#endif
>> 
>> 
>> My measurements indicate that the overhead of doing so is minimal (around 1% 
>> increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic 
>> application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions 
>> -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 
>> with C2) resulting in an aggregated nmethod size of around ~40mb. 
>> Additionally recording `evol_method` dependencies only increases this size 
>> by about 400kb
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Guard the feature with a diagnostic option and update the comments in the 
> code

src/hotspot/share/runtime/init.cpp line 121:

> 119:   if (AlwaysRecordEvolDependencies) {
> 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
> 121: JvmtiExport::set_all_dependencies_are_recorded(true);

I think the check for AlwaysRecordEvolDependencies needs to be moved into 
set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, 
otherwise don't we risk the value being accidentally reset to false when 
set_can_hotswap_or_post_breakpoint() is called again by 
JvmtiManageCapabilities::update()?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462685351


Re: RFR: 8314029: Add file name parameter to Compiler.perfmap [v8]

2023-12-13 Thread Dean Long
On Mon, 11 Dec 2023 22:41:56 GMT, Yi-Fan Tsai  wrote:

>> `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: 
>> `/tmp/perf-%d.map`. This change adds an optional argument for specifying a 
>> file name.
>> 
>> `jcmd PID help Compiler.perfmap` shows the following usage.
>> 
>> 
>> Compiler.perfmap
>> Write map file for Linux perf tool.
>> 
>> Impact: Low
>> 
>> Syntax : Compiler.perfmap  []
>> 
>> Arguments:
>> filename : [optional] Name of the map file (STRING, no default value)
>> 
>> 
>> The following section of man page is also updated. (`man -l 
>> src/jdk.jcmd/share/man/jcmd.1`)
>> 
>> 
>>Compiler.perfmap [arguments] (Linux only)
>>   Write map file for Linux perf tool.
>> 
>>   Impact: Low
>> 
>>   arguments:
>> 
>>   · filename: (Optional) Name of the map file (STRING, no 
>> default value)
>> 
>>   If filename is not specified, a default file name is chosen 
>> using the pid of the target JVM process.  For example, if the pid is 12345,  
>> then
>>   the default filename will be /tmp/perf-12345.map.
>
> Yi-Fan Tsai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update copyright of PerfMapTest

The man page says "no default value" but then right below describes the default 
value, which is confusing.  I would remove "no default value".
The code already deals with patterns, so why not allow a pattern like 
/dir/perf-%x.map and document that the platform-specific process id will be 
passed to String.format() to expand any formatting tokens in the string?

-

PR Comment: https://git.openjdk.org/jdk/pull/15871#issuecomment-1854683037


Re: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack

2023-11-08 Thread Dean Long
On Wed, 8 Nov 2023 19:00:53 GMT, Roman Kennke  wrote:

> See JBS issue for details.
> 
> I basically:
>  - took the test-modification and turned it into its own test-case
>  - added test runners for lightweight- and legacy-locking, so that we keep 
> testing both, no matter what is the default
>  - added Axels fix (mentioned in the JBS issue) with the modification to only 
> inflate when exec_mode == Unpack_none, as explained by Richard.
> 
> Testing:
>  - [x] EATests.java
>  - [x] tier1
>  - [ ] tier2

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16568#pullrequestreview-1721539402


Re: RFR: 8307160: Fix AWT/2D/A11Y to support the permissive- flag on the Microsoft Visual C compiler [v14]

2023-11-02 Thread Dean Long
On Thu, 2 Nov 2023 11:07:27 GMT, Julian Waters  wrote:

>> We should set the -permissive- flag for the Microsoft Visual C compiler, as 
>> was requested by the now backed out 
>> [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). Doing so makes 
>> the Visual C compiler much less accepting of ill formed code, which will 
>> improve code quality on Windows in the future.
>
> Julian Waters has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Changes awt_Frame.cpp

src/java.desktop/windows/native/libawt/windows/awt_Canvas.cpp line 220:

> 218: c = (AwtCanvas*) pData;
> 219: c->m_eraseBackground = doErase;
> 220: c->m_eraseBackgroundOnResize = doEraseOnResize;

Suggestion:

{
AwtCanvas *c = (AwtCanvas*) pData;
c->m_eraseBackground = doErase;
c->m_eraseBackgroundOnResize = doEraseOnResize;
}

Does wrapping in {} work?  I think it looks better, and that's how we handle 
case labels in switch statements, isn't it?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15096#discussion_r1380668853


Re: RFR: 8314265: Fix -Wconversion warnings in miscellaneous runtime code [v6]

2023-08-16 Thread Dean Long
On Wed, 16 Aug 2023 23:56:58 GMT, Coleen Phillimore  wrote:

>> Fix MaxElementPrintSize and casts. Also fixed miscellaneous -Wconversion 
>> warnings in runtime code.  This is the last one I'm going to do for runtime 
>> for a while.
>> Tested with tier1-4.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change size of op_index back.

src/hotspot/share/utilities/elfFile.cpp line 1454:

> 1452: return false;
> 1453:   }
> 1454:   uint8_t operation_advance = checked_cast<uint8_t>(adv);

My reading of the spec is that operation_advance can be large.  I suggest 
reverting operation_advance changes and just do a checked_cast when assigning 
op_index.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15233#discussion_r1296561040


Re: RFR: 8314265: Fix -Wconversion warnings in miscellaneous runtime code [v4]

2023-08-16 Thread Dean Long
On Wed, 16 Aug 2023 19:16:28 GMT, Coleen Phillimore  wrote:

>> Fix MaxElementPrintSize and casts. Also fixed miscellaneous -Wconversion 
>> warnings in runtime code.  This is the last one I'm going to do for runtime 
>> for a while.
>> Tested with tier1-4.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove wrong comment.

src/hotspot/share/utilities/elfFile.cpp line 1723:

> 1721: void 
> DwarfFile::LineNumberProgram::LineNumberProgramState::set_index_register(const
>  uint64_t operation_advance,
> 1722: 
>   const LineNumberProgramHeader& header) {
> 1723:   _op_index = (_op_index + operation_advance) % 
> header._maximum_operations_per_instruction;

This should be narrowed by a byte.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15233#discussion_r1296456394


Re: RFR: 8314265: Fix -Wconversion warnings in miscellaneous runtime code [v4]

2023-08-16 Thread Dean Long
On Wed, 16 Aug 2023 19:16:28 GMT, Coleen Phillimore  wrote:

>> Fix MaxElementPrintSize and casts. Also fixed miscellaneous -Wconversion 
>> warnings in runtime code.  This is the last one I'm going to do for runtime 
>> for a while.
>> Tested with tier1-4.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove wrong comment.

src/hotspot/share/utilities/elfFile.hpp line 790:

> 788:   // The address and op_index registers, taken together, form an 
> operation pointer that can reference any
> 789:   // individual operation with the instruction stream. This field 
> was introduced with DWARF 4.
> 790:   uint64_t _op_index;

This can be uint8_t, because it can never be larger than 
_maximum_operations_per_instruction, which is also a byte.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15233#discussion_r1296452122


Re: RFR: 8314265: Fix -Wconversion warnings in miscellaneous runtime code [v4]

2023-08-16 Thread Dean Long
On Wed, 16 Aug 2023 19:16:28 GMT, Coleen Phillimore  wrote:

>> Fix MaxElementPrintSize and casts. Also fixed miscellaneous -Wconversion 
>> warnings in runtime code.  This is the last one I'm going to do for runtime 
>> for a while.
>> Tested with tier1-4.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove wrong comment.

src/hotspot/share/utilities/elfFile.hpp line 486:

> 484: DwarfFile* _dwarf_file;
> 485: MarkedDwarfFileReader _reader;
> 486: uint64_t _section_start_address;

This is platform-specific.
Suggestion:

uintptr_t _section_start_address;

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15233#discussion_r1296443060


Re: RFR: 8307160: [REDO] Enable the permissive- flag on the Microsoft Visual C compiler [v3]

2023-08-14 Thread Dean Long
On Thu, 10 Aug 2023 04:04:58 GMT, Julian Waters  wrote:

>> We should set the -permissive- flag for the Microsoft Visual C compiler, as 
>> was requested by the now backed out 
>> [JDK-8241499](https://bugs.openjdk.org/browse/JDK-8241499). It can be done 
>> with some effort, given that the significantly stricter gcc can now compile 
>> an experimental Windows JDK as of 2023, and will serve to significantly cut 
>> down on monstrosities in ancient Windows code
>
> Julian Waters has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove negation in os_windows.cpp

I had the same confusion, reading "-permissive-" the same as /permissive, but 
the trailing "-" means disable.

-

PR Comment: https://git.openjdk.org/jdk/pull/15096#issuecomment-1678081977


Re: RFR: 8313785: Fix -Wconversion warnings in prims code [v2]

2023-08-07 Thread Dean Long
On Fri, 4 Aug 2023 23:24:45 GMT, Coleen Phillimore  wrote:

> Why not? It's the same and casts the int where it's needed to be an int.
To me the checked_cast looks less ugly in the initialization vs at the uses 
sites, but in this case with only one use it doesn't really matter.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1286536069


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15160#pullrequestreview-1563625051


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/methodHandles.cpp line 910:

> 908:   InstanceKlass* defc = 
> InstanceKlass::cast(java_lang_Class::as_Klass(clazz));
> 909:   DEBUG_ONLY(clazz = nullptr);  // safety
> 910:   intptr_t vmindex  = java_lang_invoke_MemberName::vmindex(mname());

Why not do the checked_cast here instead of below?
Ideally, the vmindex field would be changed to int (field offsets, 
itable/vtable indexes are all int), but that's more work.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284883812


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/jvm.cpp line 612:

> 610:   // as implemented in the classic virtual machine; return 0 if object 
> is null
> 611:   return handle == nullptr ? 0 :
> 612:  checked_cast<jint>(ObjectSynchronizer::FastHashCode (THREAD, 
> JNIHandles::resolve_non_null(handle)));

What about making FastHashCode return jint?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284876719


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/jni.cpp line 1906:

> 1904: o = JvmtiExport::jni_SetField_probe(thread, obj, o, k, fieldID, 
> false, SigType, (jvalue *)_value); \
> 1905:   } \
> 1906:   if (SigType == JVM_SIGNATURE_BOOLEAN) { value = 
> (Argument)(((jboolean)value) & 1); } \

This is already done in bool_field_put(), so it is redundant here.  I think it 
can be safely removed.

src/hotspot/share/prims/jni.cpp line 2099:

> 2097: JvmtiExport::jni_SetField_probe(thread, nullptr, nullptr, 
> id->holder(), fieldID, true, SigType, (jvalue *)_value); \
> 2098:   } \
> 2099:   if (SigType == JVM_SIGNATURE_BOOLEAN) { value = 
> (Argument)(((jboolean)value) & 1); } \

Same as above.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284873690
PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284873793


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/jni.cpp line 209:

> 207: 
> 208: 
> 209: intptr_t jfieldIDWorkaround::encode_klass_hash(Klass* k, intptr_t 
> offset) {

The caller always passes an int, so why not change the parameter to int?
Suggestion:

intptr_t jfieldIDWorkaround::encode_klass_hash(Klass* k, int offset) {

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284851860


Re: RFR: 8313554: Fix -Wconversion warnings for ResolvedFieldEntry [v4]

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 17:05:55 GMT, Matias Saavedra Silva  
wrote:

>> The recent change in 
>> [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996) added more 
>> -Wconversion warnings that are addressed in this patch. The aforementioned 
>> change has overlooked inconsistencies with the types used by 
>> `ResolvedFieldEntry` and the method `fill_in()`. Verified with tier 1-4 
>> tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Changed field names to resolve error

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15126#pullrequestreview-1563476373


Re: RFR: 8313554: Fix -Wconversion warnings for ResolvedFieldEntry [v2]

2023-08-03 Thread Dean Long
On Thu, 3 Aug 2023 20:53:56 GMT, Matias Saavedra Silva  
wrote:

>> The recent change in 
>> [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996) added more 
>> -Wconversion warnings that are addressed in this patch. The aforementioned 
>> change has overlooked inconsistencies with the types used by 
>> `ResolvedFieldEntry` and the method `fill_in()`. Verified with tier 1-4 
>> tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change to checked_cast

src/hotspot/share/interpreter/rewriter.cpp line 188:

> 186: int cp_index = Bytes::get_Java_u2(p);
> 187: int field_entry_index = _cp_map.at(cp_index);
> 188: Bytes::put_native_u2(p, checked_cast<u2>(field_entry_index));

I would have put the checked_cast on the initialization and not the use, but 
this is OK too.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15126#discussion_r1283789662


Re: RFR: 8313554: Fix -Wconversion warnings for ResolvedFieldEntry [v2]

2023-08-03 Thread Dean Long
On Thu, 3 Aug 2023 19:15:30 GMT, Coleen Phillimore  wrote:

>> As far as I've designed it, only the last two bits of the flags are ever set 
>> or used. C++ booleans are either 0 or 1 so the higher bits should never be 
>> set.
>
> It is only two bits and the only variable here "is_final_shift" is 1.  This 
> could use 1 to make it obvious but the variable is used in the 
> templateInterpreter code.  Again, I think this is good.  This is never going 
> to be a bug.

It's fine unless the shift values get changed so the value no longer fits in 
u1.  Then it fails silently.
As an alternative to checked_cast, how about something like:

assert(is_final() == is_final, "");
assert(is_volatile() == is_volatile, "");

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15126#discussion_r1283787162


Re: RFR: 8313554: Fix -Wconversion warnings for ResolvedFieldEntry

2023-08-03 Thread Dean Long
On Wed, 2 Aug 2023 19:15:24 GMT, Matias Saavedra Silva  
wrote:

> The recent change in 
> [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996) added more 
> -Wconversion warnings that are addressed in this patch. The aforementioned 
> change has overlooked inconsistencies with the types used by 
> `ResolvedFieldEntry` and the method `fill_in()`. Verified with tier 1-4 tests.

src/hotspot/share/interpreter/rewriter.cpp line 192:

> 190: int field_entry_index = Bytes::get_native_u2(p);
> 191: int pool_index = 
> _initialized_field_entries.at(field_entry_index).constant_pool_index();
> 192: Bytes::put_Java_u2(p, (u2)pool_index);

I guess this isn't using checked_cast because similar code elsewhere in this 
file uses this style, but it means we can silently do the wrong thing.  Is 
there a separate RFE to remove as many of these unchecked casts as possible?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15126#discussion_r1282886338


Re: RFR: 8313554: Fix -Wconversion warnings for ResolvedFieldEntry

2023-08-03 Thread Dean Long
On Wed, 2 Aug 2023 19:15:24 GMT, Matias Saavedra Silva  
wrote:

> The recent change in 
> [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996) added more 
> -Wconversion warnings that are addressed in this patch. The aforementioned 
> change has overlooked inconsistencies with the types used by 
> `ResolvedFieldEntry` and the method `fill_in()`. Verified with tier 1-4 tests.

src/hotspot/share/oops/resolvedFieldEntry.hpp line 106:

> 104:   void set_flags(bool is_final, bool is_volatile) {
> 105: int new_flags = (is_final << is_final_shift) | 
> static_cast<int>(is_volatile);
> 106: _flags = (u1)new_flags;

Why isn't this using checked_cast?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15126#discussion_r1282875385


Re: RFR: 8310948: Fix ignored-qualifiers warning in Hotspot

2023-06-28 Thread Dean Long
On Tue, 27 Jun 2023 12:22:43 GMT, Daniel Jeliński  wrote:

> Please review this attempt to fix ignored-qualifiers warning.
> 
> Example warnings:
> 
> src/hotspot/share/oops/method.hpp:413:19: warning: 'volatile' type qualifier 
> on return type has no effect [-Wignored-qualifiers]
>CompiledMethod* volatile code() const;
>^
> 
> 
> src/hotspot/share/jfr/periodic/jfrModuleEvent.cpp:65:20: warning: type 
> qualifiers ignored on cast result type [-Wignored-qualifiers]
> 65 |   event.set_source((const ModuleEntry* const)from_module);
>|^
> 
> 
> The proposed fix removes the ignored qualifiers.
> In a few AD files I replaced `const` with `constexpr` where I noticed that 
> the method is returning a compile-time constant, and other platforms use 
> `constexpr` on the same method.
> 
> Release, debug and slowdebug builds on Aarch64 / x64 and Mac / Linux complete 
> without errors. Cross-compile GHA builds also pass.

In the example, was the intention perhaps `volatile CompiledMethod*` instead of 
`CompiledMethod* volatile`?

-

PR Comment: https://git.openjdk.org/jdk/pull/14674#issuecomment-1612090389


Re: RFR: 8291555: Implement alternative fast-locking scheme [v70]

2023-05-02 Thread Dean Long
On Tue, 2 May 2023 18:38:11 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typically remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
>> protocols.
>> 
>> The lock-stack is fixed size, currently with 8 elements. According to my 
>> experiments with various workloads, this covers the vast majority of 
>> workloads (in-fact, most workloads seem to never exceed 5 active locks per 
>> thread at a time). We check for overflow in the fast-paths and when the 
>> lock-stack is full, we take the slow-path, which would inflate the lock to a 
>> monitor. That case should be very rare.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads 

Re: RFR: 8291555: Implement alternative fast-locking scheme [v56]

2023-04-11 Thread Dean Long
On Tue, 11 Apr 2023 19:58:19 GMT, Roman Kennke  wrote:

>> The `_base` array is only initialized to nullptr in debug builds.  I don't 
>> see a release barrier in LockStack::push between the update to _base[] and 
>> the update to _top, nor a corresponding acquire barrier when reading.  
>> Doesn't this mean it is possible to racily read an uninitialized junk oop 
>> value from _base[], especially on weak memory models?
>
> Yes. The whole LockStack is not meant to be accessed cross-thread, pretty 
> much like any thread's stack is not meant to be accessed like that (including 
> current stack-locking). So what can go wrong?
> With the new locking, we could read junk and compare it to the oop that we're 
> testing against and get a wrong result. We're not going to crash though.
> With the current stack-locking, we would fetch the stack-pointer and check if 
> that address is within the foreign thread's stack. Again, because the other 
> thread is not holding still, we might get a wrong result, but we would not 
> crash.
> So I guess we need to answer the question whether or not jmm_GetThreadInfo() 
> is ok with returning wrong result and what could be the consequences of this. 
> For example, how important is it that the info about the thread(s) is correct 
> and consistent (e.g. what happens if we report two threads both holding the 
> same lock?), etc. But I don't consider this to be part of this PR.
> 
> So my proposal is: leave that code as it is, for now (being racy when 
> inspecting foreign threads, but don't crash). Open a new issue to investigate 
> and possibly fix the problem (maybe by safepointing, maybe by handshaking if 
> that is enough, or maybe we find out we don't need to do anything). Add 
> comments in relevant places to point out the problem like you and David 
> suggested earlier. Would that be ok?

That seems fine to me, as long as we don't crash.  But my understanding is that 
Generational ZGC will crash if it sees a stale oop.  Isn't it possible that the 
racing read sees junk that looks to Generational ZGC like a stale oop?  To 
avoid this, unused slots may need to be set to nullptr even in product builds.  
But I'm not a GC expert so maybe there's no problem.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163306288


Re: RFR: 8291555: Implement alternative fast-locking scheme [v56]

2023-04-11 Thread Dean Long
On Tue, 11 Apr 2023 15:29:16 GMT, Daniel D. Daugherty  
wrote:

>> OK. Given that I haven't looked at the rest of the patch, I leave it up to 
>> you and the Reviewers to figure out what to do about this code. Cheers.
>
> Given that the race with new lightweight locking is virtually the same as the 
> race
> with legacy stack locking, please do not put back the 'LockingMode == 2' check
> which would make `jmm_GetThreadInfo()` calls slower with new lightweight 
> locking
> than with legacy stack locking.
> 
> Perhaps I'm not understanding the risk of what @stefank means with:
> 
> It looks to me like the code could racingly read the element just above 
> _top,
> which could contain a stale oop. If the address of the stale oop matches the
> address of o then contains would incorrectly return true.

The `_base` array is only initialized to nullptr in debug builds.  I don't see 
a release barrier in LockStack::push between the update to _base[] and the 
update to _top, nor a corresponding acquire barrier when reading.  Doesn't this 
mean it is possible to racily read an uninitialized junk oop value from 
_base[], especially on weak memory models?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163243827


Re: RFR: 8291555: Implement alternative fast-locking scheme [v47]

2023-03-31 Thread Dean Long
On Fri, 31 Mar 2023 07:25:48 GMT, Thomas Stuefe  wrote:

>> Roman Kennke has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Use int instead of size_t for cached offsets, to match the uncached offset 
>> type and avoid build failures
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:
> 
>> 6232:   orr(hdr, hdr, markWord::unlocked_value);
>> 6233:   // Clear lock-bits, into t2
>> 6234:   eor(t2, hdr, markWord::unlocked_value);
> 
> In arm, I use a combination of bic and orr instead. That gives me, with just 
> two instructions, added safety against someone handing in a "11" marked MW. I 
> know, should never happen, but better safe.
> 
> 
>   ldr(new_hdr, Address(obj, oopDesc::mark_offset_in_bytes()));
>   bic(new_hdr, new_hdr, markWord::lock_mask_in_place);  // new header (00)
>   orr(old_hdr, new_hdr, markWord::unlocked_value);  // old header (01)
> 
> (note that I moved MW loading down into MA::fast_lock for unrelated reasons).
> 
> Unfortunately, on aarch64 there seem to be no bic variants that accept 
> immediates. So it would take one more instruction to get the same result:
> 
> 
> -  // Load (object->mark() | 1) into hdr
> -  orr(hdr, hdr, markWord::unlocked_value);
> -  // Clear lock-bits, into t2
> -  eor(t2, hdr, markWord::unlocked_value);
> +  // Prepare new and old header
> +  mov(t2, markWord::lock_mask_in_place);
> +  bic(t2, hdr, t2);
> +  orr(hdr, t2, markWord::unlocked_value);
> 
> 
> But maybe there is a better way that does not need three instructions.

There is a BFC (Bitfield Clear) pseudo-instruction that uses the BFM 
instruction.
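Whatever instruction sequence ends up being used, the bit manipulation itself is straightforward. Here is a hedged C++ model of the two headers being prepared; the constants mirror what markWord appears to use (low two bits are the lock bits, 0b01 means "unlocked"), but this is an illustration, not HotSpot code:

```cpp
#include <cstdint>

// Assumed constants modeled on markWord; not taken from HotSpot sources.
const uintptr_t lock_mask_in_place = 0b11;  // low two bits hold the lock state
const uintptr_t unlocked_value     = 0b01;  // "unlocked" encoding

// bic-equivalent: clear the lock bits -> the "new" (fast-locked, 00) header.
// Tolerates an input already marked "11", as the bic/orr variant does.
uintptr_t new_header(uintptr_t hdr) {
  return hdr & ~lock_mask_in_place;
}

// orr-equivalent: set unlocked_value -> the expected "old" (unlocked, 01) header
uintptr_t old_header(uintptr_t hdr) {
  return new_header(hdr) | unlocked_value;
}
```

The eor-based variant quoted above computes the same two values only when the incoming lock bits are already a legal state; clearing with a mask first is what buys the extra safety against a "11"-marked word.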

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154955795


Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]

2023-03-28 Thread Dean Long
On Tue, 28 Mar 2023 10:53:10 GMT, Roman Kennke  wrote:

>> src/hotspot/share/runtime/threads.cpp line 1422:
>> 
>>> 1420: }
>>> 1421: 
>>> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, 
>>> oop obj) {
>> 
>> Is this thread-safe?
>
> My last commit changed that code to only run during safepoints. It should be 
> safe now, and I added an assert that verifies that it is only run at 
> safepoint.

I see the assert in `owning_thread_from_monitor` but not 
`owning_thread_from_object`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150900864


Re: RFR: 8291555: Implement alternative fast-locking scheme [v30]

2023-03-27 Thread Dean Long
On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth adding support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation 

Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]

2023-03-27 Thread Dean Long
On Fri, 24 Mar 2023 06:15:31 GMT, David Holmes  wrote:

>> Roman Kennke has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - Set condition flags correctly after fast-lock call on aarch64
>
> src/hotspot/share/oops/oop.cpp line 126:
> 
>> 124:   // Outside of a safepoint, the header could be changing (for example,
>> 125:   // another thread could be inflating a lock on this object).
>> 126:   if (ignore_mark_word || UseFastLocking) {
> 
> Not clear why UseFastLocking appears here instead of in the return expression 
> - especially given the comment above.

I have the same question.  Based on the comment, I would expect
`return !SafepointSynchronize::is_at_safepoint() || UseFastLocking`;

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149966288


Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v4]

2023-03-08 Thread Dean Long
On Wed, 8 Mar 2023 05:17:53 GMT, Vladimir Kozlov  wrote:

>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in 
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on 
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java 
>> methods were implemented originally.
>> 
>> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls 
>> to runtime stubs which use the same HW instructions as C2 intrinsics. Only 
>> for 64-bit x64 because 32-bit x86 stub does not work: result is passed 
>> through FPU register and NaN values become different from C2 intrinsic. This 
>> runtime stub is only used to calculate constant values during C2 compilation 
>> and can be skipped.
>> 
>> I added new tests based on Tobias's `TestAll.java` And copied 
>> `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to 
>> make sure code is compiled by C1 or C2. I modified 
>> `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2.
>> 
>> Tested tier1-5, Xcomp, stress
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove RISC-V port code for float16 intrinsics

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3534:

> 3532:   __ leave(); // required for proper stackwalking of RuntimeStub frame
> 3533:   __ ret(0);
> 3534: 

Do we really need to set up a stack frame for these two?  This should be a 
leaf, and we have other leaf stubs that don't set up a frame.

-

PR: https://git.openjdk.org/jdk/pull/12869


Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-22 Thread Dean Long
On Wed, 22 Feb 2023 21:21:42 GMT, Vladimir Kozlov  wrote:

>>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have 
>>> three mechanisms for implementing this functionality:
>>> 
>>> 1. The interpreted Java code
>>> 
>>> 2. The compiled non-intrinisc sharedRuntime code
>>> 
>>> 3. The compiler intrinsic that uses a hardware instruction.
>>> 
>>> 
>>> Unless the hardware instructions for all relevant CPUs behave exactly the 
>>> same, then I don't see how we can have parity of behaviour across these 
>>> three mechanisms.
>>> 
>>> The observed behaviour may be surprising but it seems not to be a bug. And 
>>> is this even a real concern - would real programs actually need to peek at 
>>> the raw bits and so see the difference, or does it suffice to handle Nan's 
>>> opaquely?
>> 
>> From the spec 
>> (https://download.java.net/java/early_access/jdk20/docs/api/java.base/java/lang/Float.html#float16ToFloat(short))
>> 
>> "Returns the float value closest to the numerical value of the argument, a 
>> floating-point binary16 value encoded in a short. The conversion is exact; 
>> all binary16 values can be exactly represented in float. Special cases:
>> 
>> If the argument is zero, the result is a zero with the same sign as the 
>> argument.
>> If the argument is infinite, the result is an infinity with the same 
>> sign as the argument.
>> If the argument is a NaN, the result is a NaN. "
>> 
>> If the float argument is a NaN, you are supposed to get a float16 NaN as a 
>> result -- that is all the specification requires. However, the 
>> implementation makes stronger guarantees to try to preserve some non-zero 
>> NaN significand bits if they are set.
>> 
>> "NaN boxing" is a technique used to put extra information into the 
>> significand bits of a NaN and pass them around. It is consistent with the 
>> intended use of the feature by IEEE 754 and used in various language 
>> runtimes: e.g.,
>> 
>> https://piotrduperas.com/posts/nan-boxing
>> https://leonardschuetz.ch/blog/nan-boxing/ 
>> https://anniecherkaev.com/the-secret-life-of-nan
>> 
>> The Java specs are careful to avoid mentioning quiet vs signaling NaNs in 
>> general discussion.
>> 
>> That said, I think it is reasonable on a given JVM invocation if 
>> Float.floatToFloat16(f) gave the same result for input f regardless of in 
>> what context it was called.
>
>> We don't know that all HW will produce the same NaN "payload", right? 
>> Instead, we might need interpreter intrinsics. I assume that is how the trig 
>> functions are handled that @jddarcy mentioned.
> 
> Good point. We can't guarantee that all OpenJDK ports HW do the same.
> 
> If CPU has corresponding instructions we need to generate a stub during VM 
> startup with HW instructions and use it in all cases (or directly the same 
> instruction in JIT compiled code).
> If CPU does not have instruction we should use runtime C++ function in all 
> cases to be consistent.

> Thanks @vnkozlov @dean-long. One last question before I withdraw the PR: As 
> QNaN bit is supported across current architectures like x86, ARM and may be 
> others as well for conversion, couldn't we go ahead with this PR? The 
> architectures that behave differently could then follow the technique 
> suggested by Vladimir Kozlov as and when they implement the intrinsic?

No, because it's not just the SNaN vs QNaN that is different, but also the NaN 
"payload" or "boxing" that is different.  For example, the intrinsic gives me 
different results on aarch64 vs Intel with this test:


public class Foo {
  public static float hf2f(short s) {
return Float.float16ToFloat(s);
  }
  public static short f2hf(float f) {
return Float.floatToFloat16(f);
  }
  public static void main(String[] args) {
float f = Float.intBitsToFloat(0x7fc0 | 0x2000 );
System.out.println(Integer.toHexString(f2hf(f)));
f = Float.intBitsToFloat(0x7fc0 | 0x20 );
System.out.println(Integer.toHexString(f2hf(f)));
f = Float.intBitsToFloat(0x7fc0 | 0x4);
System.out.println(Integer.toHexString(f2hf(f)));
f = Float.intBitsToFloat(0x7fc0 | 0x2000 | 0x20 | 0x4);
System.out.println(Integer.toHexString(f2hf(f)));
  }
}
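
As background on the NaN-boxing technique mentioned above, a round trip might be sketched like this (class and helper names are hypothetical; `floatToRawIntBits` is used because, unlike `floatToIntBits`, it preserves the payload bits, and the quiet bit is set so mainstream hardware will not rewrite the value):

```java
public class NanBoxDemo {
    // Quiet-NaN bit pattern for float: exponent all ones, quiet bit (bit 22) set.
    static final int QNAN_BITS = 0x7fc00000;
    // Low significand bits available to carry a payload.
    static final int PAYLOAD_MASK = 0x001fffff;

    // Box a small integer into the significand of a quiet NaN.
    static float box(int payload) {
        return Float.intBitsToFloat(QNAN_BITS | (payload & PAYLOAD_MASK));
    }

    // Recover the payload; floatToRawIntBits preserves NaN significand bits.
    static int unbox(float f) {
        return Float.floatToRawIntBits(f) & PAYLOAD_MASK;
    }

    public static void main(String[] args) {
        float nan = box(0x1234);
        System.out.println(Float.isNaN(nan));     // true
        System.out.println(unbox(nan) == 0x1234); // true
    }
}
```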

-

PR: https://git.openjdk.org/jdk/pull/12704


Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-22 Thread Dean Long
On Wed, 22 Feb 2023 05:21:48 GMT, Joe Darcy  wrote:

>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have 
>> three mechanisms for implementing this functionality:
>> 
>> 1. The interpreted Java code
>> 2. The compiled non-intrinsic sharedRuntime code
>> 3. The compiler intrinsic that uses a hardware instruction.
>> 
>> Unless the hardware instructions for all relevant CPUs behave exactly the 
>> same, then I don't see how we can have parity of behaviour across these 
>> three mechanisms.
>> 
>> The observed behaviour may be surprising but it seems not to be a bug. And 
>> is this even a real concern - would real programs actually need to peek at 
>> the raw bits and so see the difference, or does it suffice to handle NaNs 
>> opaquely?
>
>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have 
>> three mechanisms for implementing this functionality:
>> 
>> 1. The interpreted Java code
>> 
>> 2. The compiled non-intrinsic sharedRuntime code
>> 
>> 3. The compiler intrinsic that uses a hardware instruction.
>> 
>> 
>> Unless the hardware instructions for all relevant CPUs behave exactly the 
>> same, then I don't see how we can have parity of behaviour across these 
>> three mechanisms.
>> 
>> The observed behaviour may be surprising but it seems not to be a bug. And 
>> is this even a real concern - would real programs actually need to peek at 
>> the raw bits and so see the difference, or does it suffice to handle NaNs 
>> opaquely?
> 
> From the spec 
> (https://download.java.net/java/early_access/jdk20/docs/api/java.base/java/lang/Float.html#float16ToFloat(short))
> 
> "Returns the float value closest to the numerical value of the argument, a 
> floating-point binary16 value encoded in a short. The conversion is exact; 
> all binary16 values can be exactly represented in float. Special cases:
> 
> If the argument is zero, the result is a zero with the same sign as the 
> argument.
> If the argument is infinite, the result is an infinity with the same sign 
> as the argument.
> If the argument is a NaN, the result is a NaN. "
> 
> If the float argument is a NaN, you are supposed to get a float16 NaN as a 
> result -- that is all the specification requires. However, the implementation 
> makes stronger guarantees to try to preserve some non-zero NaN significand 
> bits if they are set.
> 
> "NaN boxing" is a technique used to put extra information into the 
> significand bits of a NaN and pass them around. It is consistent with the 
> intended use of the feature by IEEE 754 and used in various language 
> runtimes: e.g.,
> 
> https://piotrduperas.com/posts/nan-boxing
> https://leonardschuetz.ch/blog/nan-boxing/ 
> https://anniecherkaev.com/the-secret-life-of-nan
> 
> The Java specs are careful to avoid mentioning quiet vs signaling NaNs in 
> general discussion.
> 
> That said, I think it is reasonable on a given JVM invocation if 
> Float.floatToFloat16(f) gave the same result for input f regardless of in 
> what context it was called.

We don't know that all HW will produce the same NaN "payload", right?  Instead, 
we might need interpreter intrinsics.  I assume that is how the trig functions 
are handled that @jddarcy mentioned.
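
For the specified special cases, a quick check against `Float.float16ToFloat` (available since JDK 20) might look like this sketch; the binary16 bit patterns are standard IEEE 754 encodings:

```java
public class Float16Demo {
    public static void main(String[] args) {
        // All binary16 values convert exactly to float.
        System.out.println(Float.float16ToFloat((short) 0x3c00) == 1.0f);      // 1.0
        // Signed zero keeps its sign bit.
        System.out.println(Float.floatToRawIntBits(
                Float.float16ToFloat((short) 0x8000)) == 0x80000000);          // -0.0
        // Infinities map to infinities of the same sign.
        System.out.println(Float.float16ToFloat((short) 0xfc00)
                == Float.NEGATIVE_INFINITY);                                   // -Inf
        // A NaN argument yields a NaN result.
        System.out.println(Float.isNaN(Float.float16ToFloat((short) 0x7e00))); // NaN
    }
}
```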

-

PR: https://git.openjdk.org/jdk/pull/12704


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33]

2022-11-30 Thread Dean Long
On Tue, 29 Nov 2022 11:49:10 GMT, Andrew Haley  wrote:

>> Andrew Haley has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Unused variable
>
> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/ScopedValue.java
>  line 385:
> 
>> 383: try {
>> 384: JLA.setScopedValueBindings(newSnapshot);
>> 385: JLA.ensureMaterializedForStackWalk(newSnapshot);
> 
> Question: is it necessary here to invoke `ensureMaterializedForStackWalk()` 
> It's really only there to prevent the new `Snapshot` from being scalar 
> replaced. But we know that it cannot be scalar replaced, because it really 
> does escape: a pointer to it is stored in the current `Thread`. So should we 
> simply remove the call to `ensureMaterializedForStackWalk()`, on the grounds 
> that it cannot have any effect?

It does seem unnecessary here, but I'm not an expert on current and future C2 
escape analysis.  @vnkozlov, do you agree?

-

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33]

2022-11-30 Thread Dean Long
On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley  wrote:

>> JEP 429 implementation.
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Unused variable

I made a few minor suggestions, but overall the HotSpot changes look good.  
Nice job Andrew.

-

Marked as reviewed by dlong (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33]

2022-11-30 Thread Dean Long
On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley  wrote:

>> JEP 429 implementation.
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Unused variable

src/hotspot/share/prims/jvm.cpp line 1385:

> 1383:   vframeStream vfst(thread);
> 1384:   for(; !vfst.at_end(); vfst.next()) {
> 1385: int loc = 0;

Use -1 instead (see below)?

src/hotspot/share/prims/jvm.cpp line 1400:

> 1398: }
> 1399: 
> 1400: if (loc != 0) {

As 0 is normally a valid local number, how about using -1 to mean "not found"?

-

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33]

2022-11-30 Thread Dean Long
On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley  wrote:

>> JEP 429 implementation.
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Unused variable

src/hotspot/share/classfile/javaClasses.cpp line 1731:

> 1729: }
> 1730: 
> 1731: void java_lang_Thread::clear_scopedValueBindings(oop java_thread) {

It looks like there is only one caller of this method that takes an oop.  Would 
it make sense to have the caller check for the null oop, and assert that the 
oop is not null here?

-

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator) [v33]

2022-11-30 Thread Dean Long
On Thu, 24 Nov 2022 14:05:41 GMT, Andrew Haley  wrote:

>> JEP 429 implementation.
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Unused variable

src/hotspot/cpu/aarch64/aarch64.ad line 3635:

> 3633:   }
> 3634: } else if (_method->intrinsic_id() == 
> vmIntrinsicID::_ensureMaterializedForStackWalk) {
> 3635:   __ nop();

Please add a comment explaining why the nop is needed or desirable here.

src/hotspot/cpu/x86/x86_64.ad line 2174:

> 2172:  RELOC_DISP32);
> 2173: } else if (_method->intrinsic_id() == 
> vmIntrinsicID::_ensureMaterializedForStackWalk) {
> 2174:   __ addr_nop_5();

Needs a comment.  I guess this is because of how call sizes are computed.

-

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator)

2022-11-15 Thread Dean Long
On Fri, 4 Nov 2022 09:53:39 GMT, Andrew Haley  wrote:

>> src/java.base/share/classes/java/lang/Thread.java line 1610:
>> 
>>> 1608: ensureMaterializedForStackWalk(bindings);
>>> 1609: task.run();
>>> 1610: Reference.reachabilityFence(bindings);
>> 
>> This should probably be in a `try`‑`finally` block:
>> Suggestion:
>> 
>> try {
>> task.run();
>> } finally {
>> Reference.reachabilityFence(bindings);
>> }
>
> I wonder. The pattern I'm using here is based on 
> `AccessController.executePrivileged`, which doesn't have the `finally` 
> clause. Perhaps I should add one here anyway.

I hope it doesn't matter.  There is an example in the reachabilityFence 
javadocs where it does not use finally.  For it to matter, I think the compiler 
would need to inline through run() and prove that it can throw an exception, 
but I don't think that's how the JIT compilers currently implement 
reachabilityFence.  I suppose a finally shouldn't hurt, however.
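
For reference, the try-finally form under discussion would look roughly like this standalone sketch (`runWithBindings` is an illustrative stand-in for the actual Thread internals, not the real method):

```java
import java.lang.ref.Reference;

public class FenceDemo {
    static void runWithBindings(Object bindings, Runnable task) {
        try {
            task.run();
        } finally {
            // Keep 'bindings' strongly reachable until run() completes,
            // even if run() throws.
            Reference.reachabilityFence(bindings);
        }
    }

    public static void main(String[] args) {
        runWithBindings(new Object(), () -> System.out.println("ok")); // prints "ok"
    }
}
```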

-

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: JDK-8286666: JEP 429: Implementation of Scoped Values (Incubator)

2022-11-15 Thread Dean Long
On Wed, 2 Nov 2022 16:23:34 GMT, Andrew Haley  wrote:

> JEP 429 implementation.

src/hotspot/share/prims/jvm.cpp line 1410:

> 1408:   loc = 3;
> 1409: } else if (method == resolver.thread_run_method) {
> 1410:   loc = 2;

This depends on how javac numbers locals, right?  It seems a bit fragile.  This 
is one of the reasons why doPrivileged uses the helper method 
executePrivileged, so the locals are arguments, giving them predictable offsets.
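
The point about predictable offsets can be illustrated with a sketch (hypothetical names): per the JVMS, a method's parameters always occupy the leading local-variable slots, while the slot of a block-scoped local depends on javac's allocation:

```java
public class SlotDemo {
    // Fragile: 'bindings' is a block-scoped local, so its slot number
    // depends on javac's allocation and on any locals declared before it.
    static void runDirect(Runnable task) {
        Object bindings = new Object();
        task.run();
    }

    // Robust: parameters always occupy the leading local slots
    // (slot 0 = 'bindings', slot 1 = 'task' for a static method),
    // so a VM stack walk can find them at a known offset.
    static void runViaHelper(Object bindings, Runnable task) {
        task.run();
    }

    public static void main(String[] args) {
        runViaHelper(new Object(), () -> System.out.println("ok")); // prints "ok"
    }
}
```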

src/java.base/share/classes/jdk/internal/vm/ScopedValueContainer.java line 53:

> 51: /**
> 52:  * Returns the "latest" ScopedValueContainer for the current Thread. 
> This may be on
> 53:  * the current thread's scope task or ma require walking up the tree 
> to find it.

Suggestion:

 * the current thread's scope task or may require walking up the tree to 
find it.

-

PR: https://git.openjdk.org/jdk/pull/10952


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v5]

2022-09-07 Thread Dean Long
On Wed, 7 Sep 2022 01:05:38 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   add missing "this->"

Marked as reviewed by dlong (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/10067


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v2]

2022-09-03 Thread Dean Long
On Fri, 2 Sep 2022 00:04:17 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.org/jdk into compressed-stream
>  - 8292758: put support for UNSIGNED5 format into its own header file

Wouldn't the SA stack walk fail when attaching to a core dump from an earlier 
JVM that does not exclude nulls?

-

PR: https://git.openjdk.org/jdk/pull/10067


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v2]

2022-09-02 Thread Dean Long
On Fri, 2 Sep 2022 00:04:17 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.org/jdk into compressed-stream
>  - 8292758: put support for UNSIGNED5 format into its own header file

I'm concerned about the "excluded bytes" change.  It appears to change the 
encoding, which would mean that SA would not be able to read streams from 
earlier JVMs, which may or may not be a concern.  I suggest splitting this up 
into the straight-forward refactor (without the excluded bytes change), then 
adding excluded bytes as a follow-up.

-

Changes requested by dlong (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10067


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v2]

2022-09-01 Thread Dean Long
On Fri, 2 Sep 2022 00:04:17 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.org/jdk into compressed-stream
>  - 8292758: put support for UNSIGNED5 format into its own header file

src/hotspot/share/code/compressedStream.cpp line 85:

> 83:   const int min_expansion = UNSIGNED5::MAX_LENGTH;
> 84:   if (nsize < min_expansion*2)
> 85: nsize = min_expansion*2;

It's not clear if this is needed or just an optimization.  Maybe add a comment. 
 Also, using MAX2 might be clearer.
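
The guard in the quoted diff is equivalent to clamping the new size with a maximum. A minimal sketch of the intended growth logic (in Java for brevity, mirroring the hotspot C++; names are illustrative):

```java
public class GrowDemo {
    // One UNSIGNED5-encoded value occupies at most 5 bytes.
    static final int MAX_LENGTH = 5;

    // Growth clamp: the new capacity is at least two maximum-length
    // encodings, so a single write can never overrun the grown buffer.
    static int grownSize(int nsize) {
        final int minExpansion = MAX_LENGTH;
        return Math.max(nsize, minExpansion * 2); // i.e. MAX2(nsize, min_expansion*2)
    }

    public static void main(String[] args) {
        System.out.println(grownSize(3));  // 10 (clamped up to the minimum)
        System.out.println(grownSize(64)); // 64 (large requests unchanged)
    }
}
```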

-

PR: https://git.openjdk.org/jdk/pull/10067


  1   2   3   4   >