Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

Vladimir Kozlov Mon, 06 Mar 2023 18:08:06 -0800

On Tue, 7 Mar 2023 01:04:00 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> 
wrote:


>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in 
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on 
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java 
>> methods were implemented originally.
>> 
>> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls 
>> to runtime stubs which use the same HW instructions as C2 intrinsics. This is 
>> done only for 64-bit x64 because the 32-bit x86 stub does not work: the result 
>> is passed through an FPU register, so NaN values differ from the C2 intrinsic. 
>> This runtime stub is only used to calculate constant values during C2 
>> compilation and can be skipped.
>> 
>> I added new tests based on Tobias's `TestAll.java` and copied 
>> `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to 
>> make sure code is compiled by C1 or C2. I modified 
>> `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2.
>> 
>> Tested tier1-5, Xcomp, stress
>
> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 199:
> 
>> 197:   void flt_to_flt16(Register dst, XMMRegister src, XMMRegister tmp) {
>> 198:     // Instruction requires different XMM registers
>> 199:     vcvtps2ph(tmp, src, 0x04, Assembler::AVX_128bit);
> 
> vcvtps2ph can have source and destination as same. Did you mean to say here 
> in the comment that "Instruction requires XMM register as destination"?

`flt_to_flt16` is used in an `x86.ad` instruction which requires preserving the 
`src` register.
I did not want to add another macroassembler instruction for the src->src case.
I will add this to the comment.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3928:
> 
>> 3926:   }
>> 3927: 
>> 3928:   if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
> 
> We could check for VM_Version::supports_float16() here instead.

Yes.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931:
> 
>> 3929:     // For results consistency both intrinsics should be enabled.
>> 3930:     if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
>> 3931:         vmIntrinsics::is_intrinsic_available(vmIntrinsics::_floatToFloat16)) {
> 
> Should this also check for InlineIntrinsics?

`vmIntrinsics::disabled_by_jvm_flags()` checks `InlineIntrinsics`. See 
`vmIntrinsics.cpp` changes.

> src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 346:
> 
>> 344:   }
>> 345:   // For AVX CPUs only. f16c support is disabled if UseAVX == 0.
>> 346:   if (VM_Version::supports_f16c() || VM_Version::supports_avx512vl()) {
> 
> We could check for VM_Version::supports_float16() here instead.

Yes. And I need to remove the `!InlineIntrinsics` check at line 340.
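
For anyone following along, the consistency property this PR is about can be poked at from plain Java. The following is only a minimal sketch, not the test code from the PR: the class name `Float16RoundTrip` and its helper methods are hypothetical, and it assumes JDK 20 or later, where `Float.floatToFloat16` and `Float.float16ToFloat` exist. It round-trips every binary16 bit pattern and counts non-NaN mismatches; the NaN sample is printed rather than checked, since NaN payloads are exactly where the modes disagreed.

```java
// Minimal sketch (hypothetical class, not part of the PR). Requires JDK 20+.
final class Float16RoundTrip {

    // Convert a binary16 bit pattern to float and back again.
    static short roundTrip(short half) {
        return Float.floatToFloat16(Float.float16ToFloat(half));
    }

    // True if the bit pattern encodes a binary16 NaN:
    // exponent field all ones and a nonzero significand.
    static boolean isNaN16(short half) {
        return (half & 0x7C00) == 0x7C00 && (half & 0x03FF) != 0;
    }

    // Count non-NaN binary16 bit patterns that do not survive a round trip.
    static int countNonNaNMismatches() {
        int mismatches = 0;
        for (int bits = 0; bits <= 0xFFFF; bits++) {
            short half = (short) bits;
            if (!isNaN16(half) && roundTrip(half) != half) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("non-NaN mismatches: " + countNonNaNMismatches());
        // Sample signaling NaN pattern (chosen for illustration only).
        // Its round-trip bits are printed, not asserted.
        short sNaN = (short) 0x7D00;
        System.out.printf("0x%04x -> 0x%04x%n", sNaN & 0xFFFF, roundTrip(sNaN) & 0xFFFF);
    }
}
```

Running this once with `-Xint` and once with `-Xcomp` and comparing the printed NaN line is a rough way to see whether the interpreter and the compiled code agree on a given build.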

-------------

PR: https://git.openjdk.org/jdk/pull/12869
