Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-23 Thread Richard Reingruber
On Tue, 23 May 2023 12:26:52 GMT, Martin Doerr  wrote:

> I believe omitting the PSA is wrong for varargs, but we don't have this 
> information in the backend.

It is clearly wrong.

> So, I think we should simply not optimize it.

Already agreed for this version.

> Reserving 64 Byte stack space should be affordable for a downcall even if 
> it's not always needed.

It is hardly ever needed to begin with. It's just not pretty to allocate the 
PSA for little test functions. It'll confuse people that compare the downcall 
stub to what the C compiler produces. They might even think we havn't 
understood the ABI ;)

> The Java side could compute it, but there's no way to pass this information 
> to the backend.

I'm not sure about that. First idea would be to keep the allocated `stack` in 
`nextStorage` if needed and pass it to the backend.

> I've improved the comments. Please take a look.

Great! Thanks :)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1202652115


Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-23 Thread Martin Doerr
On Tue, 23 May 2023 07:46:08 GMT, Richard Reingruber  wrote:

>> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163:
>> 
>>> 161:   // The Parameter Save Area needs to be at least 8 slots for ABIv1.
>>> 162:   // ABIv2 allows omitting it when all parameters can get passed in 
>>> registers. We currently don't optimize this.
>>> 163:   // For ABIv2, we only need (_input_registers.length() > 8) ? 
>>> _input_registers.length() : 0
>> 
>> The PSA is also needed if the parameter list is variable in length. Is the 
>> expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` 
>> correct in that case too?
>> Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates 
>> that stack parameters are not expected. We currently don't optimize this.`
>
> Ok, I see now. This is not obvious though. There are a few layers of 
> abstraction at play which hide this. A comment is needed. Maybe like this:
> ```c++
> // With ABIv1 a Parameter Save Area of at least 8 double words is 
> always needed.
> // ABIv2 allows omitting it if the callee's prototype indicates 
> that stack parameters are not expected.
> // We currently don't optimize this (see DowncallStubGenerator in 
> the backend).
> if (reg == null) return stack;

I believe omitting the PSA is wrong for varargs, but we don't have this 
information in the backend. So, I think we should simply not optimize it. 
Reserving 64 Byte stack space should be affordable for a downcall even if it's 
not always needed. The Java side could compute it, but there's no way to pass 
this information to the backend. I've improved the comments. Please take a look.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1202235085


Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-23 Thread Richard Reingruber
On Mon, 22 May 2023 22:44:41 GMT, Martin Doerr  wrote:

> Thanks for publishing our discussion, here. The unnecessary PSA affects other 
> areas of hotspot much more than Panama. Yes, we should file an RFE. I think 
> one for hotspot is sufficient as the downcall stub is part of it. I don't 
> think it needs extra treatment.

That's fine. It should have a little list of areas to be revisited.

Adoc:

- Runtime calls by the interpreter, c1, and c2
- Interpreted and compiled JNI calls
- FFM API ("Panama") calls
- Runtime calls by continuation intrisics
- Runtime calls by GC barriers

Subtasks can be generated from that list.

-

PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1558725262


Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-23 Thread Richard Reingruber
On Tue, 23 May 2023 07:37:37 GMT, Richard Reingruber  wrote:

>> Martin Doerr has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Parameter Save Area is the correct name.
>
> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163:
> 
>> 161:   // The Parameter Save Area needs to be at least 8 slots for ABIv1.
>> 162:   // ABIv2 allows omitting it when all parameters can get passed in 
>> registers. We currently don't optimize this.
>> 163:   // For ABIv2, we only need (_input_registers.length() > 8) ? 
>> _input_registers.length() : 0
> 
> The PSA is also needed if the parameter list is variable in length. Is the 
> expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` 
> correct in that case too?
> Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates that 
> stack parameters are not expected. We currently don't optimize this.`

Ok, I see now. This is not obvious though. There are a few layers of 
abstraction at play which hide this. A comment is needed. Maybe like this:
```c++
// With ABIv1 a Parameter Save Area of at least 8 double words is 
always needed.
// ABIv2 allows omitting it if the callee's prototype indicates 
that stack parameters are not expected.
// We currently don't optimize this (see DowncallStubGenerator in 
the backend).
if (reg == null) return stack;

-

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201706335


Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-23 Thread Richard Reingruber
On Mon, 22 May 2023 22:29:18 GMT, Martin Doerr  wrote:

>> Implementation of "Foreign Function & Memory API" for linux on Power (Little 
>> Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>> 
>> This PR does not include code for VaList support because it's supposed to 
>> get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). 
>> I've kept the related tests disabled for this platform and throw an 
>> exception instead. Note that the ABI doesn't precisely specify variable 
>> argument lists. Instead, it refers to `` (2.2.4 Variable Argument 
>> Lists).
>> 
>> Big Endian support is implemented to some extend, but not complete. E.g. 
>> structs with size not divisible by 8 are not passed correctly (see 
>> `useABIv2` in CallArranger.java). Big Endian is excluded by selecting 
>> `ARCH.equals("ppc64le")` (CABI.java) only.
>> 
>> There's another limitation: This PR only accepts structures with size 
>> divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I 
>> think arbitrary sizes are not usable on other platforms, either, because 
>> `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: 
>> Resolved after merging of 
>> [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>> 
>> The ABI has some tricky corner cases related to HFA (Homogeneous Float 
>> Aggregate). The same argument may need to get passed in both, a FP reg and a 
>> GP reg or stack slot (see "no partial DW rule"). This cases are not covered 
>> by the existing tests.
>> 
>> I had to make changes to shared code and code for other platforms:
>> 1. Pass type information when creating `VMStorage` objects from `VMReg`. 
>> This is needed for the following reasons:
>> - PPC64 ABI requires integer types to get extended to 64 bit (also see 
>> CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to 
>> know the type or at least the bit width for that.
>> - Floating point load / store instructions need the correct width to select 
>> between the correct IEEE 754 formats. The register representation in single 
>> FP registers is always IEEE 754 double precision on PPC64.
>> - Big Endian also needs usage of the precise size. Storing 8 Bytes and 
>> loading 4 Bytes yields different values than on Little Endian!
>> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer 
>> (with byteSize() == 0) while running TestUpcallScope. Hence, existing size 
>> checks don't work (see MemorySegment.java). As a workaround, I'm just 
>> skipping the check in this particular case. Please check if this makes sense 
>> or if there's a better fix (possibly as separat...
>
> Martin Doerr has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Parameter Save Area is the correct name.

src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163:

> 161:   // The Parameter Save Area needs to be at least 8 slots for ABIv1.
> 162:   // ABIv2 allows omitting it when all parameters can get passed in 
> registers. We currently don't optimize this.
> 163:   // For ABIv2, we only need (_input_registers.length() > 8) ? 
> _input_registers.length() : 0

The PSA is also needed if the parameter list is variable in length. Is the 
expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` 
correct in that case too?
Otherwise: `ABIv2 allows omitting it if the caller's prototype indicates that 
stack parameters are not expected. We currently don't optimize this.`

-

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201693249


Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-22 Thread Martin Doerr
On Mon, 22 May 2023 22:29:18 GMT, Martin Doerr  wrote:

>> Implementation of "Foreign Function & Memory API" for linux on Power (Little 
>> Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>> 
>> This PR does not include code for VaList support because it's supposed to 
>> get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). 
>> I've kept the related tests disabled for this platform and throw an 
>> exception instead. Note that the ABI doesn't precisely specify variable 
>> argument lists. Instead, it refers to `` (2.2.4 Variable Argument 
>> Lists).
>> 
>> Big Endian support is implemented to some extend, but not complete. E.g. 
>> structs with size not divisible by 8 are not passed correctly (see 
>> `useABIv2` in CallArranger.java). Big Endian is excluded by selecting 
>> `ARCH.equals("ppc64le")` (CABI.java) only.
>> 
>> There's another limitation: This PR only accepts structures with size 
>> divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I 
>> think arbitrary sizes are not usable on other platforms, either, because 
>> `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: 
>> Resolved after merging of 
>> [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>> 
>> The ABI has some tricky corner cases related to HFA (Homogeneous Float 
>> Aggregate). The same argument may need to get passed in both, a FP reg and a 
>> GP reg or stack slot (see "no partial DW rule"). This cases are not covered 
>> by the existing tests.
>> 
>> I had to make changes to shared code and code for other platforms:
>> 1. Pass type information when creating `VMStorage` objects from `VMReg`. 
>> This is needed for the following reasons:
>> - PPC64 ABI requires integer types to get extended to 64 bit (also see 
>> CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to 
>> know the type or at least the bit width for that.
>> - Floating point load / store instructions need the correct width to select 
>> between the correct IEEE 754 formats. The register representation in single 
>> FP registers is always IEEE 754 double precision on PPC64.
>> - Big Endian also needs usage of the precise size. Storing 8 Bytes and 
>> loading 4 Bytes yields different values than on Little Endian!
>> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer 
>> (with byteSize() == 0) while running TestUpcallScope. Hence, existing size 
>> checks don't work (see MemorySegment.java). As a workaround, I'm just 
>> skipping the check in this particular case. Please check if this makes sense 
>> or if there's a better fix (possibly as separat...
>
> Martin Doerr has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Parameter Save Area is the correct name.

Thanks for publishing our discussion, here. The unnecessary PSA affects other 
areas of hotspot much more than Panama. Yes, we should file an RFE. I think one 
for hotspot is sufficient as the downcall stub is part of it. I don't think it 
needs extra treatment.

-

PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1558139323


Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35]

2023-05-22 Thread Martin Doerr
> Implementation of "Foreign Function & Memory API" for linux on Power (Little 
> Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
> 
> This PR does not include code for VaList support because it's supposed to get 
> removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've 
> kept the related tests disabled for this platform and throw an exception 
> instead. Note that the ABI doesn't precisely specify variable argument lists. 
> Instead, it refers to `` (2.2.4 Variable Argument Lists).
> 
> Big Endian support is implemented to some extend, but not complete. E.g. 
> structs with size not divisible by 8 are not passed correctly (see `useABIv2` 
> in CallArranger.java). Big Endian is excluded by selecting 
> `ARCH.equals("ppc64le")` (CABI.java) only.
> 
> There's another limitation: This PR only accepts structures with size 
> divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I 
> think arbitrary sizes are not usable on other platforms, either, because 
> `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: 
> Resolved after merging of 
> [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
> 
> The ABI has some tricky corner cases related to HFA (Homogeneous Float 
> Aggregate). The same argument may need to get passed in both, a FP reg and a 
> GP reg or stack slot (see "no partial DW rule"). This cases are not covered 
> by the existing tests.
> 
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This 
> is needed for the following reasons:
> - PPC64 ABI requires integer types to get extended to 64 bit (also see 
> CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to 
> know the type or at least the bit width for that.
> - Floating point load / store instructions need the correct width to select 
> between the correct IEEE 754 formats. The register representation in single 
> FP registers is always IEEE 754 double precision on PPC64.
> - Big Endian also needs usage of the precise size. Storing 8 Bytes and 
> loading 4 Bytes yields different values than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with 
> byteSize() == 0) while running TestUpcallScope. Hence, existing size checks 
> don't work (see MemorySegment.java). As a workaround, I'm just skipping the 
> check in this particular case. Please check if this makes sense or if there's 
> a better fix (possibly as separate RFE). Update: T...

Martin Doerr has updated the pull request incrementally with one additional 
commit since the last revision:

  Parameter Save Area is the correct name.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12708/files
  - new: https://git.openjdk.org/jdk/pull/12708/files/08a5c143..b912155b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=34
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=33-34

  Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12708.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708