On Mon, 14 Feb 2022 09:12:54 GMT, Andrew Haley <a...@openjdk.org> wrote:

>>> What does this do? Comment, even pseudo code, would be nice.
>>
>> Thanks @theRealAph , I shall append the comments over the routine.
>> BTW, entire rounding algorithm can also be implemented using Vector API
>> which can perform if-conversion using masked operations.
>>
>> class roundf {
>>    public static VectorSpecies ISPECIES = IntVector.SPECIES_512;
>>    public static VectorSpecies SPECIES = FloatVector.SPECIES_512;
>>
>>    public static int round_vector(float[] a, int[] r, int ctr) {
>>       IntVector shiftVBC = (IntVector) ISPECIES.broadcast(24 - 2 + 127);
>>       for (int i = 0; i < a.length; i += SPECIES.length()) {
>>          FloatVector fv = FloatVector.fromArray(SPECIES, a, i);
>>          IntVector iv = fv.reinterpretAsInts();
>>          IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000);
>>          biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23);
>>          IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB,
>>                                               biasedExpV);
>>          VectorMask cond = shiftV.lanewise(VectorOperators.AND, -32)
>>                                  .compare(VectorOperators.EQ, 0);
>>          IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF)
>>                            .lanewise(VectorOperators.OR, 0x007FFFFF + 1);
>>          VectorMask cond1 = iv.compare(VectorOperators.LT, 0);
>>          VectorMask cond2 = cond1.and(cond);
>>          res = res.lanewise(VectorOperators.NEG, cond2);
>>          res = res.lanewise(VectorOperators.ASHR, shiftV)
>>                   .lanewise(VectorOperators.ADD, 1)
>>                   .lanewise(VectorOperators.ASHR, 1);
>>          res = fv.convert(VectorOperators.F2I, 0)
>>                  .reinterpretAsInts()
>>                  .blend(res, cond);
>>          res.intoArray(r, i);
>>       }
>>       return r[ctr];
>>    }
>> }
>
> That pseudocode would make a very useful comment too. This whole patch is
> very thinly commented.

> > Hi, IIRC for evex encoding you can embed the RC control bit directly in the
> > evex prefix, removing the need to rely on global MXCSR register. Thanks.
>
> Hi @merykitty , You are correct, we can embed RC mode in instruction encoding
> of round instruction (towards -inf,+inf, zero). But to match the semantics of
> Math.round API one needs to add 0.5[f] to input value and then perform
> rounding over resultant value, which is why @sviswa7 suggested to use a
> global rounding mode driven by MXCSR.RC so that intermediate floating inexact
> values are resolved as desired, but OOO execution may misplace LDMXCSR and
> hence may have undesired side effects.

**Just want to correct the above statement: LDMXCSR will not be re-ordered/re-scheduled early by the OOO backend.**

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
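
For reference, the bit-level steps of the quoted vector routine can be written per lane in scalar form. The following is only a sketch under assumptions: the class and method names (`RoundSketch`, `roundScalar`, `roundNaive`) are made up for illustration and are not part of the patch. `roundNaive` shows the "add 0.5f, then round down" approach under the default round-to-nearest mode, where the intermediate sum can itself be inexact, which appears to be the concern behind controlling the rounding mode for that addition.

```java
class RoundSketch {
    // Scalar form of the steps in the quoted vector routine (hypothetical helper).
    static int roundScalar(float a) {
        int bits = Float.floatToRawIntBits(a);
        // Extract the biased exponent (8 bits above the 23-bit fraction).
        int biasedExp = (bits & 0x7F800000) >> 23;
        // Shift that leaves exactly one fractional bit below the integer part.
        int shift = (24 - 2 + 127) - biasedExp;
        if ((shift & -32) == 0) { // true only when 0 <= shift < 32
            // Fraction bits with the implicit leading one restored.
            int r = (bits & 0x007FFFFF) | (0x007FFFFF + 1);
            if (bits < 0) {
                r = -r; // apply the sign before shifting
            }
            // Keep one fractional bit, add it in, then drop it:
            // this rounds half-way cases toward positive infinity.
            return ((r >> shift) + 1) >> 1;
        }
        // Very small, very large, infinite or NaN inputs: a plain cast suffices.
        return (int) a;
    }

    // Naive variant: the float addition is rounded to nearest before floor is applied.
    static int roundNaive(float a) {
        return (int) Math.floor(a + 0.5f);
    }

    public static void main(String[] args) {
        float[] samples = {2.5f, -2.5f, 0.49999997f, 1.0e10f, Float.NaN};
        for (float x : samples) {
            System.out.println(x + " -> " + roundScalar(x)
                    + " (naive: " + roundNaive(x) + ")");
        }
    }
}
```

For 0.49999997f, the largest float below 0.5, `roundScalar` returns 0 while `roundNaive` returns 1, because 0.49999997f + 0.5f rounds up to 1.0f under round-to-nearest.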