On Tue, 7 Mar 2023 01:04:00 GMT, Sandhya Viswanathan <[email protected]>
wrote:
>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
>> methods were implemented originally.
>>
>> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls
>> to runtime stubs which use the same HW instructions as C2 intrinsics. Only
>> for 64-bit x64 because 32-bit x86 stub does not work: result is passed
>> through FPU register and NaN values become different from C2 intrinsic. This
>> runtime stub is only used to calculate constant values during C2 compilation
>> and can be skipped.
>>
>> I added new tests based on Tobias's `TestAll.java` And copied
>> `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to
>> make sure code is compiled by C1 or C2. I modified
>> `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2.
>>
>> Tested tier1-5, Xcomp, stress
>
> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 199:
>
>> 197: void flt_to_flt16(Register dst, XMMRegister src, XMMRegister tmp) {
>> 198: // Instruction requires different XMM registers
>> 199: vcvtps2ph(tmp, src, 0x04, Assembler::AVX_128bit);
>
> vcvtps2ph can have source and destination as same. Did you mean to say here
> in the comment that "Instruction requires XMM register as destination"?
`flt_to_flt16` is used in `x86.ad` instruction which requires preserving `src`
register.
I did not want to add an other macroassembler instruction for src->src case.
I will add this to this comment.
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3928:
>
>> 3926: }
>> 3927:
>> 3928: if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
>
> We could check for VM_Version::supports_float16() here instead.
Yes.
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931:
>
>> 3929: // For results consistency both intrinsics should be enabled.
>> 3930: if
>> (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
>> 3931:
>> vmIntrinsics::is_intrinsic_available(vmIntrinsics::_floatToFloat16)) {
>
> Should this also check for InlineIntrinsics?
`vmIntrinsics::disabled_by_jvm_flags()` checks `InlineIntrinsics`. See
`vmIntrinsics.cpp` changes.
> src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 346:
>
>> 344: }
>> 345: // For AVX CPUs only. f16c support is disabled if UseAVX == 0.
>> 346: if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
>
> We could check for VM_Version::supports_float16() here instead.
Yes. And I need to remove `!InlineIntrinsics` check at line 340.
-------------
PR: https://git.openjdk.org/jdk/pull/12869