On Sun, 13 Feb 2022 13:08:41 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4066:
>> 
>>> 4064: }
>>> 4065: 
>>> 4066: void 
>>> C2_MacroAssembler::vector_cast_double_special_cases_evex(XMMRegister dst, 
>>> XMMRegister src, XMMRegister xtmp1,
>> 
>> What does this do? Comment, even pseudo code, would be nice.
>
>> Hi, IIRC for evex encoding you can embed the RC control bit directly in the 
>> evex prefix, removing the need to rely on global MXCSR register. Thanks.
> 
> Hi @merykitty, you are correct: we can embed the RC mode in the instruction 
> encoding of the round instruction (towards -inf, +inf, zero). But to match the 
> semantics of the Math.round API, one needs to add 0.5[f] to the input value and 
> then round the resulting value, which is why @sviswa7 suggested using a 
> global rounding mode driven by MXCSR.RC, so that inexact intermediate 
> floating-point values are also resolved as desired. However, out-of-order 
> execution may misplace LDMXCSR, which may have undesired side effects.
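
To make the 0.5[f] point concrete, here is a small sketch (not taken from the PR; the class and method names are hypothetical) of why a plain floor(x + 0.5f) under the default round-to-nearest mode does not match Math.round. For the largest float below 0.5 the intermediate sum is inexact and rounds up to 1.0f before the floor is taken.

```java
public class NaiveRoundSketch {
    // Hypothetical helper: the "add 0.5f, then round toward -inf" recipe,
    // with the addition done in the default round-to-nearest-even mode.
    static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    public static void main(String[] args) {
        float x = 0.49999997f; // largest float below 0.5f
        System.out.println("naive      = " + naiveRound(x));
        System.out.println("Math.round = " + Math.round(x));
    }
}
```

On this input the naive form yields 1 while Math.round yields 0, which is the kind of mismatch a round-toward-negative-infinity mode for the addition is meant to avoid.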

> What does this do? Comment, even pseudo code, would be nice.

Thanks @theRealAph, I will add comments to the routine.
BTW, the entire rounding algorithm can also be implemented using the Vector API, 
which can perform if-conversion using masked operations.

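import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

class roundf {
   public static final VectorSpecies<Integer> ISPECIES = IntVector.SPECIES_512;
   public static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;

   public static int round_vector(float[] a, int[] r, int ctr) {
      // SIGNIFICAND_WIDTH - 2 + EXP_BIAS, as in the scalar Math.round(float).
      IntVector shiftVBC = IntVector.broadcast(ISPECIES, 24 - 2 + 127);
      int bound = SPECIES.loopBound(a.length);
      int i = 0;
      for (; i < bound; i += SPECIES.length()) {
         FloatVector fv = FloatVector.fromArray(SPECIES, a, i);
         IntVector iv = fv.reinterpretAsInts();
         // Extract the biased exponent and derive the right-shift amount.
         IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000);
         biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23);
         IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB, biasedExpV);
         // cond: 0 <= shift < 32, i.e. the bit-manipulation path applies.
         VectorMask<Integer> cond = shiftV.lanewise(VectorOperators.AND, -32)
               .compare(VectorOperators.EQ, 0);
         // Significand with the implicit leading one restored.
         IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF)
               .lanewise(VectorOperators.OR, 0x007FFFFF + 1);
         // Negate the significand for negative inputs on the bit path.
         VectorMask<Integer> cond1 = iv.compare(VectorOperators.LT, 0);
         VectorMask<Integer> cond2 = cond1.and(cond);
         res = res.lanewise(VectorOperators.NEG, cond2);
         // ((r >> shift) + 1) >> 1 rounds half up.
         res = res.lanewise(VectorOperators.ASHR, shiftV)
               .lanewise(VectorOperators.ADD, 1)
               .lanewise(VectorOperators.ASHR, 1);
         // Lanes outside the bit path fall back to a plain float-to-int cast.
         res = fv.convert(VectorOperators.F2I, 0)
               .reinterpretAsInts()
               .blend(res, cond);
         res.intoArray(r, i);
      }
      // Scalar tail for lengths that are not a multiple of the vector length.
      for (; i < a.length; i++) {
         r[i] = Math.round(a[i]);
      }
      return r[ctr];
   }
}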
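
For reference, the masked operations in the vector loop above correspond lane by lane to the following scalar branches. This is a sketch only (the class and method names are hypothetical, and the constants are copied from the vector code): `cond` plays the role of the outer `if`, `cond2` the negation, and the `blend` the choice between the two return paths.

```java
public class RoundScalarSketch {
    // Scalar counterpart of one lane of the vector loop.
    static int roundScalar(float a) {
        int bits = Float.floatToRawIntBits(a);
        int biasedExp = (bits & 0x7F800000) >> 23;
        int shift = (24 - 2 + 127) - biasedExp;
        if ((shift & -32) == 0) {            // cond: 0 <= shift < 32
            int r = (bits & 0x007FFFFF) | (0x007FFFFF + 1);
            if (bits < 0) {                  // cond2: negative input on this path
                r = -r;
            }
            return ((r >> shift) + 1) >> 1;  // round half up
        }
        return (int) a;                      // F2I fallback lanes
    }

    public static void main(String[] args) {
        float[] samples = {0.49999997f, 0.5f, -0.5f, 2.5f, -2.5f, 1e10f, Float.NaN};
        for (float f : samples) {
            System.out.println(f + " -> " + roundScalar(f)
                  + " (Math.round: " + Math.round(f) + ")");
        }
    }
}
```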

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
