Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]

Jorn Vernee Tue, 28 Feb 2023 13:07:02 -0800

On Thu, 23 Feb 2023 06:18:49 GMT, Martin Doerr <mdo...@openjdk.org> wrote:


>> Implementation of "Foreign Function & Memory API" for linux on Power (Little 
>> Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>> 
>> This PR does not include code for VaList support because it's supposed to 
>> get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). 
>> I've kept the related tests disabled for this platform and throw an 
>> exception instead. Note that the ABI doesn't precisely specify variable 
>> argument lists. Instead, it refers to `<stdarg.h>` (2.2.4 Variable Argument 
>> Lists).
>> 
>> Big Endian support is implemented to some extend, but not complete. E.g. 
>> structs with size not divisible by 8 are not passed correctly (see 
>> `useABIv2` in CallArranger.java). Big Endian is excluded by selecting 
>> `ARCH.equals("ppc64le")` (CABI.java) only.
>> 
>> There's another limitation: This PR only accepts structures with size 
>> divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I 
>> think arbitrary sizes are not usable on other platforms, either, because 
>> `SharedUtils.primitiveCarrierForSize` only accepts powers of 2.
>> 
>> The ABI has some tricky corner cases related to HFA (Homogeneous Float 
>> Aggregate). The same argument may need to get passed in both, a FP reg and a 
>> GP reg or stack slot (see "no partial DW rule"). This cases are not covered 
>> by the existing tests.
>> 
>> I had to make changes to shared code and code for other platforms:
>> 1. Pass type information when creating `VMStorage` objects from `VMReg`. 
>> This is needed for the following reasons:
>> - PPC64 ABI requires integer types to get extended to 64 bit (also see 
>> CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to 
>> know the type or at least the bit width for that.
>> - Floating point load / store instructions need the correct width to select 
>> between the correct IEEE 754 formats. The register representation in single 
>> FP registers is always IEEE 754 double precision on PPC64.
>> - Big Endian also needs usage of the precise size. Storing 8 Bytes and 
>> loading 4 Bytes yields different values than on Little Endian!
>> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer 
>> (with byteSize() == 0) while running TestUpcallScope. Hence, existing size 
>> checks don't work (see MemorySegment.java). As a workaround, I'm just 
>> skipping the check in this particular case. Please check if this makes sense 
>> or if there's a better fix (possibly as separate RFE).
>
> Martin Doerr has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove size restriction for structs. Add TODO for Big Endian.

This looks good overall I think, though I'll stick to my previous suggestion to 
try and move more logic into Java.

Also, I recommend adding a test in `java/foreign/callarranger` similar to the 
tests already found there. They call the CallGenerator directly and then check 
the binding recipe. This could prove useful if we need to refactor shared code 
for whatever reason, since those tests run on (almost) every platform, so can 
be used to do some basic sanity checking.

src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 133:

> 131:   Register callerSP            = R2, // C/C++ uses R2 as TOC, but we can 
> reuse it here
> 132:            tmp                 = R11_scratch1, // same as shuffle_reg
> 133:            call_target_address = R12_scratch2; // same as _abi._scratch2 
> (ABIv2 requires this reg!)

Do I understand correctly that the ABI requires the register to be used for the 
call to be `R12`? How does that make a difference? I guess in some cases the 
callee might want to know the address through which it is called? (so it looks 
at `R12`)

src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 154:

> 152:   // (abi_reg_args is abi_minframe plus space for 8 argument register 
> spill slots)
> 153:   assert(_abi._shadow_space_bytes == frame::abi_minframe_size, "expected 
> space according to ABI");
> 154:   int allocated_frame_size = frame::abi_minframe_size + 
> MAX2(_input_registers.length(), 8) * BytesPerWord;

This is hard-coding an assumption about the ABI that's being called. Ok for now.

If it needs to be addressed in the future, it could be done by adding another 
field to `ABIDescriptor` like `min_stack_arg_bytes`, or something like that 
(which is set to zero for other ABIs). It seems to be different from 
`shadow_space` since it's also used by the caller to put stack arguments.

src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 343:

> 341: 
> 342:   __ flush();
> 343:   // Disassembler::decode((u_char*)start, (u_char*)__ pc(), tty);

Leftover commented code?

(note that the stub can also be disassembled with 
`-Xlog:foreign+downcall=trace` now)

src/hotspot/cpu/ppc/foreignGlobals_ppc.cpp line 229:

> 227: 
> 228: void ArgumentShuffle::pd_generate(MacroAssembler* masm, VMStorage tmp, 
> int in_stk_bias, int out_stk_bias, const StubLocations& locs) const {
> 229:   Register callerSP = as_Register(tmp); // preset

It looks like `tmp` is being used to hold the caller's SP. I'm guessing this 
can not be computed the same way as we do on x86 and aarch64? (based on `RBP`, 
`RFP_BIAS`)

If you want, you could add another register argument to `pd_generate` that is 
just invalid/unused on other platforms. That way you could use `tmp` for the 
shuffling instead of having to go through the stack. (looks like `R0` is 
already used in some cases as a temp register)

src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 137:

> 135:   ArgumentShuffle arg_shuffle(in_sig_bt, total_in_args, out_sig_bt, 
> total_out_args, &in_conv, &out_conv, shuffle_reg);
> 136:   // The Java call uses the JIT ABI, but we also call C.
> 137:   int out_arg_area = MAX2(frame::jit_out_preserve_size + 
> arg_shuffle.out_arg_bytes(), (int)frame::abi_reg_args_size);

We need `frame::abi_reg_args_size` since we call `on_entry`/`on_exit` which 
require the stack space I guess?

src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 240:

> 238:   __ ld(call_target_address, in_bytes(Method::from_compiled_offset()), 
> R19_method);
> 239:   __ mtctr(call_target_address);
> 240:   __ bctrl();

Ok, I see. I guess there is some special purpose register called `CTR` which we 
are moving to for `bctrl` here. Does ABIv2 require that move to always come 
from `R12`? (from the comment in downcallLinker).

(I'm trying to understand the requirements for possibly tweaking shared code).

src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 347:

> 345:   FunctionDescriptor* fd = (FunctionDescriptor*)fd_addr;
> 346:   fd->set_entry(fd_addr + sizeof(FunctionDescriptor));
> 347: #endif

Had to do a double take. Looks like we're not the only one who are using the 
name `FunctionDescriptor` :)

src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 356:

> 354:   }
> 355: #endif
> 356:   //blob->print_on(tty);

Leftover commented code?

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java 
line 68:

> 66: public abstract class CallArranger {
> 67:     // Linux PPC64 Little Endian uses ABI v2.
> 68:     private static final boolean useABIv2 = ByteOrder.nativeOrder() == 
> ByteOrder.LITTLE_ENDIAN;

Now that I'm here. This could be a potentially interesting case for having 2 
subclasses of CallArranger: one for `useABIv2 == true` and one for `false`.

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java 
line 81:

> 79:         new VMStorage[] { f1, f2, f3, f4, f5, f6, f7, f8 }, // FP output
> 80:         new VMStorage[] { r0, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, 
> r12 }, // volatile GP
> 81:         new VMStorage[] { f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, 
> f11, f12, f13 }, // volatile FP

Note that argument registers are assumed volatile, so they don't have to be 
duplicated here.

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java 
line 286:

> 284:                 // "no partial DW rule": Mark first stack slot to get 
> filled.
> 285:                 // Note: Can only happen with forArguments = true.
> 286:                 VMStorage overlappingReg = null;

`overlappingReg` is initialized along all branches, so it's not needed to 
assign `null` here.

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java 
line 293:

> 291:                 } else {
> 292:                     overlappingReg = new 
> VMStorage(StorageType.STACK_AND_FLOAT,
> 293:                                                    (short) 
> STACK_SLOT_SIZE, (int) stackOffset - 4);

I think you could remove the mixed VMStorage types here relatively easily by 
returning a `VMStorage[][]`, where each element is a single element array, but 
then for the `needOverlapping` case add another element to the array for the 
extra store.

Then when unboxing a `STRUCT_HFA`,  `dup` the result of the `bufferLoad` and 
then do 2 `vmStore`s (one for each element).

For boxing, you could just ignore the extra storage, and just `vmLoad` the 
first one (or, whichever one you like :))

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/TypeClass.java line 
66:

> 64:     }
> 65: 
> 66:     static boolean isHomogeneousFloatAggregate(MemoryLayout type, boolean 
> useABIv2) {

Note that we had to make some changes to this routine on AArch64, since it 
didn't properly account for nested structs/unions and arrays. See: 
https://github.com/openjdk/panama-foreign/pull/780

Just as a heads up, in case PPC needs changes too.

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/linux/LinuxPPC64CallArranger.java
 line 33:

> 31:  * PPC64 CallArranger specialized for Linux ABI.
> 32:  */
> 33: public class LinuxPPC64CallArranger extends CallArranger {

I don't really see the point in having a separate subclass with `CallArranger` 
being abstract, unless you are planning to add other implementations later?

(edit: see also later comment in CallArranger 
https://github.com/openjdk/jdk/pull/12708#discussion_r1120753657)

-------------

PR: https://git.openjdk.org/jdk/pull/12708

Re: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]

Reply via email to