On Sun, 13 Feb 2022 13:12:35 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>>> Hi, IIRC for evex encoding you can embed the RC control bit directly in the
>>> evex prefix, removing the need to rely on global MXCSR register. Thanks.
>>
>> Hi @merykitty , You are correct, we can embed RC mode in instruction
>> encoding of round instruction (towards -inf,+inf, zero). But to match the
>> semantics of Math.round API one needs to add 0.5[f] to input value and then
>> perform rounding over resultant value, which is why @sviswa7 suggested to
>> use a global rounding mode driven by MXCSR.RC so that intermediate floating
>> inexact values are resolved as desired, but OOO execution may misplace
>> LDMXCSR and hence may have undesired side effects.
>
>> What does this do? Comment, even pseudo code, would be nice.
>
> Thanks @theRealAph , I shall append the comments over the routine.
> BTW, entire rounding algorithm can also be implemented using Vector API
> which can perform if-conversion using masked operations.
>
> class roundf {
>    public static VectorSpecies ISPECIES = IntVector.SPECIES_512;
>    public static VectorSpecies SPECIES = FloatVector.SPECIES_512;
>
>    public static int round_vector(float[] a, int[] r, int ctr) {
>       IntVector shiftVBC = (IntVector) ISPECIES.broadcast(24 - 2 + 127);
>       for (int i = 0; i < a.length; i += SPECIES.length()) {
>          FloatVector fv = FloatVector.fromArray(SPECIES, a, i);
>          IntVector iv = fv.reinterpretAsInts();
>          IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000);
>          biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23);
>          IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB, biasedExpV);
>          VectorMask cond = shiftV.lanewise(VectorOperators.AND, -32)
>                                  .compare(VectorOperators.EQ, 0);
>          IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF)
>                            .lanewise(VectorOperators.OR, 0x007FFFFF + 1);
>          VectorMask cond1 = iv.compare(VectorOperators.LT, 0);
>          VectorMask cond2 = cond1.and(cond);
>          res = res.lanewise(VectorOperators.NEG, cond2);
>          res = res.lanewise(VectorOperators.ASHR, shiftV)
>                   .lanewise(VectorOperators.ADD, 1)
>                   .lanewise(VectorOperators.ASHR, 1);
>          res = fv.convert(VectorOperators.F2I, 0)
>                  .reinterpretAsInts()
>                  .blend(res, cond);
>          res.intoArray(r, i);
>       }
>       return r[ctr];
>    }

That pseudocode would make a very useful comment too. This whole patch is very thinly commented.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
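
For readers following the thread, the quoted vector routine can be read one lane at a time as the scalar sketch below. This is only an illustration of the same bit-manipulation steps, not the code in the patch; the class name `RoundSketch` and the method name `roundFloat` are hypothetical, and the constants are copied from the quoted snippet.

```java
final class RoundSketch {
    // Hypothetical scalar reading of one lane of the quoted vector routine.
    static int roundFloat(float a) {
        int bits = Float.floatToRawIntBits(a);       // iv = fv.reinterpretAsInts()
        int biasedExp = (bits & 0x7F800000) >> 23;   // extract the 8-bit biased exponent
        int shift = (24 - 2 + 127) - biasedExp;      // shiftVBC - biasedExpV
        if ((shift & -32) == 0) {                    // cond: 0 <= shift < 32
            // Significand with the implicit leading one restored.
            int r = (bits & 0x007FFFFF) | (0x007FFFFF + 1);
            if (bits < 0) {                          // cond1: negative input
                r = -r;
            }
            // Shift down to one fraction bit, add it in, then drop it:
            // this is floor(a + 0.5) without a floating-point addition.
            return ((r >> shift) + 1) >> 1;
        }
        // Out-of-range exponent (tiny values, huge values, NaN, infinities):
        // a plain float-to-int conversion gives the required result.
        return (int) a;
    }

    public static void main(String[] args) {
        float[] samples = {0.5f, -0.5f, 2.5f, -2.5f, 0.49999997f, 1e10f, Float.NaN};
        for (float x : samples) {
            System.out.println(x + " -> " + roundFloat(x)
                    + " (Math.round: " + Math.round(x) + ")");
        }
    }
}
```

Because the half is added in integer arithmetic, no intermediate inexact floating-point value appears, which is why this formulation does not depend on the MXCSR rounding mode discussed above.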