On Mon, 13 Jan 2025 16:51:02 GMT, Paul Sandoz <[email protected]> wrote:
>> Hi @PaulSandoz , In the current scheme we are passing unboxed carriers to
>> intrinsic entry point, in the fallback implementation carrier type is first
>> converted to floating point value using Float.float16ToFloat API which
>> expects to receive a short type argument, after the operation we again
>> convert float value to carrier type (short) using Float.floatToFloat16 API
>> which expects a float argument, thus our intent here is to perform unboxing
>> and boxing outside the intrinsic thereby avoiding all complexities around
>> boxing by compiler. Even if we pass 3 additional parameters we still need to
>> use Float16.floatValue which invokes Float.float16ToFloat underneath, thus
>> this minor modification on Java side is on account of optimizing the
>> intrinsic interface.
>
> Yes, i understand the approach. It's about clarity of the fallback
> implementation retaining what was expressed in the original code:
>
> short res = Float16Math.fma(fa, fb, fc, a, b, c,
> (a_, b_, c_) -> {
> double product = (double)(a_.floatValue() *
> b_.floatValue());
> return valueOf(product + c_.doubleValue());
> });
Hi @PaulSandoz ,
In the above code snippet, the return type 'short' of the intrinsic call does
not match the value being returned, which is of box type, thereby mandating
additional glue code.
Regular primitive type boxing APIs are lazily intrinsified, thereby generating
an intrinsifiable Call IR during parsing.
LoadNode’s idealization can fetch a boxed value from the input of boxing call
IR and directly forward it to users.
Q1. What is the problem in directly passing Float16 boxes to FMA and SQRT
intrinsic entry points?
A. The compiler will have to unbox them before the actual operation. There are
multiple schemes to perform unboxing, such as name-based, offset-based, and
index-based field lookup.
Vector API unbox expansion uses an offset-based payload field lookup; for this,
the VM bookkeeps the payload’s offset in the runtime representation of the
VectorPayload class created as part of VM initialization.
However, the VM can only bookkeep this information for classes that are part of
the java.base module; Float16, being part of an incubating module, cannot use
offset-based field lookup. Thus the only viable alternative is to unbox using
field name/index based lookup.
For this, the compiler will first verify that the incoming oop is of Float16
type and then use a hardcoded name-based lookup to load the field value. This
looks fragile as it establishes an unwanted dependency between Float16 field
names and the compiler implementation. The same applies to index-based lookup,
as index values depend on the combined field count of class and
instance-specific fields; thus any addition or deletion of a class-level static
helper field before the field of interest can invalidate any hardcoded index
value used by the compiler.
All in all, for safe and reliable unboxing by compiler, it's necessary to
create an upfront VM representation like vector_VectorPayload.
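To illustrate the name-based fragility in plain Java (only an analogy, not the
compiler code): the sketch below uses reflection with a hardcoded field name.
BoxV1, BoxV2 and unboxByName are hypothetical names invented for this example;
BoxV2 stands for a later revision of the box class in which the payload field
was renamed.

```java
import java.lang.reflect.Field;
import java.util.Optional;

// Analogy only: hypothetical box classes, not Float16 and not HotSpot code.
class NameLookupSketch {
    // "Version 1" of a box class: the payload field is called "value".
    static final class BoxV1 {
        private final short value;
        BoxV1(short v) { value = v; }
    }

    // "Version 2": same payload, but the field was renamed to "bits".
    static final class BoxV2 {
        private final short bits;
        BoxV2(short v) { bits = v; }
    }

    // Unboxes via a hardcoded field name, the way a name-based lookup would.
    // Returns empty when the name no longer matches the class layout.
    static Optional<Short> unboxByName(Object box) {
        try {
            Field f = box.getClass().getDeclaredField("value");
            f.setAccessible(true);
            return Optional.of(f.getShort(box));
        } catch (ReflectiveOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxByName(new BoxV1((short) 0x3C00)).isPresent());
        System.out.println(unboxByName(new BoxV2((short) 0x3C00)).isPresent());
    }
}
```

The lookup works for BoxV1 and silently stops working for BoxV2, although
nothing changed on the caller side.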
Q2. What are the pros and cons of passing both the unboxed value and boxed
values to the intrinsic entry point?
A.
Pros:
- This will avoid an unsafe unboxing implementation when the holder class is not
part of the java.base module.
- We can leverage existing box intrinsification infrastructure which directly
forwards the embedded values to its users.
- Also, it will minimize the changes in the Java side implementation.
Cons:
- It's suboptimal in case the call is neither intrinsified nor inlined, as it
will introduce additional spills before the call.
Q3. Primitive box class boxing API “valueOf” accepts an argument of the
corresponding primitive type. How different are Float16 boxing APIs?
A. Unlike primitive box classes, Float16 has multiple boxing APIs and none of
them accept a short type argument.
public static Float16 valueOf(int value)
public static Float16 valueOf(long value)
public static Float16 valueOf(float f)
public static Float16 valueOf(double d)
public static Float16 valueOf(String s) throws NumberFormatException
public static Float16 valueOf(BigDecimal v)
public static Float16 valueOf(BigInteger bi)
Thus, we need to add special handling to first downcast the parameter value to
the short type carrier; otherwise it will pose problems in forwarding the boxed
values. Existing LoadNode idealization directly forwards the input of unboxed
Call IR to its users. To use the existing idealization, we need to massage the
input of unboxed Call IR to the exact carrier size, so enabling seamless
intrinsification of Float16 boxing APIs is not a mere one-line change in the
following methods.
bool ciMethod::is_boxing_method() const
bool ciMethod::is_unboxing_method() const
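A small Java sketch of why the carrier cannot simply be routed through one of
the numeric valueOf overloads: the short carrier holds binary16 bits, whereas a
numeric overload treats its argument as a number to be rounded. boxFromInt is a
hypothetical stand-in for a valueOf(int)-style conversion (assumed here to be a
numeric conversion); Float.float16ToFloat and Float.floatToFloat16 need JDK 20
or later.

```java
// Sketch: raw carrier bits vs. numeric conversion of the same 16-bit pattern.
class CarrierVsValueSketch {
    // Interprets the short as IEEE 754 binary16 bits.
    static float fromCarrier(short bits) {
        return Float.float16ToFloat(bits);
    }

    // Hypothetical stand-in for a numeric valueOf(int)-style boxing API:
    // the int is a number, which gets rounded to binary16.
    static short boxFromInt(int value) {
        return Float.floatToFloat16((float) value);
    }

    public static void main(String[] args) {
        short one = (short) 0x3C00; // binary16 bit pattern of 1.0
        System.out.println(fromCarrier(one));
        // Passing the carrier as a number converts 15360, not 1.0.
        System.out.println(Integer.toHexString(boxFromInt(one) & 0xFFFF));
    }
}
```

The two results differ, so the carrier bits are not preserved when they pass
through a numeric overload.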
Given the above observations, passing 3 additional box arguments to the
intrinsic and returning a box value needs additional changes in the compiler,
while a minor restructuring of the Java implementation, packed within the glue
logic, looks like a reasonable approach.
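Roughly, the shape of that glue logic is sketched below. This is not the code
in the PR: Half, TernaryShortOp and fmaEntry are hypothetical stand-ins (for
Float16, the lambda type, and the intrinsic entry point respectively), and the
fallback narrows through float for brevity, which the real implementation
avoids by rounding the double result directly.

```java
// Sketch only: hypothetical stand-ins, not jdk.incubator.vector.Float16
// or the actual intrinsic entry point. Requires JDK 20+ for the
// Float.float16ToFloat / Float.floatToFloat16 conversions.
class Float16GlueSketch {
    // Stand-in for a Float16-like box around a 16-bit carrier.
    record Half(short bits) {
        static Half valueOf(float f) { return new Half(Float.floatToFloat16(f)); }
        float floatValue() { return Float.float16ToFloat(bits); }
    }

    @FunctionalInterface
    interface TernaryShortOp {
        short apply(short a, short b, short c);
    }

    // Stand-in for the intrinsic entry point: carriers in, carrier out.
    // Without intrinsification it just runs the fallback lambda.
    static short fmaEntry(short a, short b, short c, TernaryShortOp fallback) {
        return fallback.apply(a, b, c);
    }

    // Glue: unbox before the call, box after it, so the entry point
    // itself never sees a box.
    static Half fma(Half a, Half b, Half c) {
        short res = fmaEntry(a.bits(), b.bits(), c.bits(), (a_, b_, c_) -> {
            double product = (double) (Float.float16ToFloat(a_) * Float.float16ToFloat(b_));
            double sum = product + Float.float16ToFloat(c_);
            // Simplification: double -> float -> binary16 may round twice.
            return Float.floatToFloat16((float) sum);
        });
        return new Half(res);
    }

    public static void main(String[] args) {
        Half r = fma(Half.valueOf(1.0f), Half.valueOf(2.0f), Half.valueOf(3.0f));
        System.out.println(r.floatValue());
    }
}
```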
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1914782512